Data analysis using SQL Server

  • Hello!

    I'm fairly new to SQL Server and am trying to work out the best way to approach some data analysis.

    I have a medium-sized data set (100,000+ rows) consisting of date-sequential financial data: historical prices over a date range.

    I intend to process these rows sequentially in C++, but I want to pre-prepare the data by adding extra columns, my assumption being that doing this pre-processing up front will speed up the sequential processing.

    I'm interested in adding columns containing averages, minimums and maximums over a set period (e.g. row x would have an average column containing the average of the previous 350 prices, or a maximum column containing the maximum price out of the previous 50, etc.).

    I'm using the following code to produce the averages, but it takes a long time (roughly 10 minutes) to select the full data set. I have indexed the table on RowNumber. I'm also a little unsure how to combine this with an UPDATE to fill the appropriate column (one possible way is sketched after the query below)...

    select t1.RowNumber, t1.[close], AVG(t2.[close]) ma_slow
    from #temp t1, #temp t2
    where t1.RowNumber between t2.RowNumber and t2.RowNumber + 349
      and t1.RowNumber between 350 and 100000
    group by t1.RowNumber, t1.[close]
    order by t1.RowNumber
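
    For reference, a minimal sketch of how the result could be written back, assuming a ma_slow column is first added to #temp (it reuses the same self-join, so it does not address the speed problem):

    -- assumption: #temp gets an extra column to hold the moving average
    alter table #temp add ma_slow float;
    go

    update t
    set t.ma_slow = x.ma_slow
    from #temp t
    join (select a.RowNumber, AVG(b.[close]) as ma_slow
          from #temp a
          join #temp b on b.RowNumber between a.RowNumber - 349 and a.RowNumber
          where a.RowNumber between 350 and 100000
          group by a.RowNumber) x
      on x.RowNumber = t.RowNumber;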

    Any ideas on speeding this up, or am I better off extracting the data into the C++ layer and doing the calculations there?

    Interestingly, if I limit RowNumber to 10,000, it processes in a few seconds... I'm unsure why having 10 times the data means 1000 times the processing time!

    Excel takes about 2 seconds to process the calculation on 100000+ rows(!), but I don't want to do it by hand, and may end up with more data than I can open in Excel...

    Thanks,

    David.

    Given that RowNumber is an integer, why couldn't you simply do it this way:

    select RowNumber/350 as ID, avg([close]) as average, max([close]) as maximum
    from your_table
    where RowNumber between 350 and 100000
    group by RowNumber/350
    order by RowNumber/350

    PS: if you need to use joins, use the explicit JOIN syntax, because it is predictable. Plus, the old-style join syntax is deprecated.
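
    For illustration, here is the query from the first post rewritten with explicit JOIN syntax (same logic, just with the join condition moved into an ON clause):

    select t1.RowNumber, t1.[close], AVG(t2.[close]) ma_slow
    from #temp t1
    join #temp t2
      on t1.RowNumber between t2.RowNumber and t2.RowNumber + 349
    where t1.RowNumber between 350 and 100000
    group by t1.RowNumber, t1.[close]
    order by t1.RowNumber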

    The probability of survival is inversely proportional to the angle of arrival.

    Thanks for the idea, but that appears to average fixed groups of values, giving one result per group, rather than a sliding average of the previous 350 values for each row... or am I missing something?

    Actually, you are probably right... my brain was working in a different direction. However, I think you could use the modulo-350 idea to generate a virtual table of row ranges to JOIN against, using RowNumber between X and Y.

    something like this:

    select Z.rownum, max([close]), avg([close])
    from maintable A
    join (select X.J as rownum, min(X.K) as [start], max(X.K) as [end]
          from (select convert(int, (RowNumber/350)*350) as J, RowNumber as K
                from maintable) X
          group by X.J) Z
      on A.RowNumber between Z.[start] and Z.[end]
    group by Z.rownum

    The probability of survival is inversely proportional to the angle of arrival.

  • Again, not sure that would work as I need every single row to have a different index...

    row 351 has an average of rows 1 to 350

    row 352 an average of rows 2 to 351

    if sets of 350 rows have the same index, I don't see how I can achieve this...

    It's just the genesis of an idea... to generate a table having columns startRowNum and endRowNum based on modulo 350.

    It may be better to create a temp table and use identity() to create a unique row ID from 1 to 100000/350, with columns holding start and end values (350 apart). The table Z might look like this:

    ID   start   end
    1    0       349
    2    350     699

    then:

    select Z.ID, avg([close]), max([close])
    from maintable A
    JOIN Z on A.RowNumber between Z.[start] and Z.[end]
    group by Z.ID
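
    A rough sketch of how that helper table might be built, reusing the RowNumber/350 idea from earlier in the thread (all names here are illustrative, and the brackets around start/end are needed because END is a reserved word):

    -- one row per 350-row block, following the 0-based boundaries shown above
    create table #Z (ID int identity(1,1) primary key, [start] int, [end] int);

    insert into #Z ([start], [end])
    select distinct (RowNumber / 350) * 350,
                    (RowNumber / 350) * 350 + 349
    from maintable
    order by 1;    -- keeps the identity IDs in block order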

    The probability of survival is inversely proportional to the angle of arrival.

  • I may have found something interesting here:

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=93911

    but yet to test...

  • mctaff (1/13/2010)


    row 351 has an average of rows 1 to 350

    row 352 an average of rows 2 to 351

    As soon as SQL Server implements the full windowing clause (including sliding windows) in its ranking functions, as specified by the SQL standard and implemented in Oracle for example 😉, it will be very easy 🙂
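
    For illustration, the sliding-window form being described looks like this; Oracle supports it, and SQL Server added it in the 2012 release (table name and ma_slow follow the first post, max_50 is just an illustrative name):

    select RowNumber, [close],
           avg([close]) over (order by RowNumber
                              rows between 349 preceding and current row) as ma_slow,
           max([close]) over (order by RowNumber
                              rows between 49 preceding and current row) as max_50
    from #temp
    order by RowNumber;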

  • hmmm... doesn't help me much though!

    I've adapted the code from the link I posted... 100,000 rows processed in 20 seconds for a moving average over 350 rows...
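
    Without knowing exactly which technique the linked thread uses, a common approach from that era is to precompute a running total in an extra column and then difference it 350 rows apart. A rough sketch, with the RunningSum column and float type as assumptions (note that the ordered-update trick relies on the clustered index on RowNumber and is not formally guaranteed):

    -- assumption: a RunningSum column to hold the cumulative total
    alter table #temp add RunningSum float;
    go

    declare @running float = 0;

    -- "quirky update": accumulate the running total in clustered-index order
    update t
    set @running = RunningSum = @running + [close]
    from #temp t with (tablockx)
    option (maxdop 1);

    -- moving average of the current row plus the previous 349 rows
    select cur.RowNumber,
           (cur.RunningSum - coalesce(prev.RunningSum, 0)) / 350.0 as ma_slow
    from #temp cur
    left join #temp prev on prev.RowNumber = cur.RowNumber - 350
    where cur.RowNumber >= 350;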
