Data analysis using SQL Server

  • Hello!

    I'm fairly new to SQL Server and am trying to work out the best way to approach some data analysis.

    I have a medium-sized data set (100,000+ rows) consisting of date-sequential financial data: historical prices over a date range.

    I intend to process these rows sequentially in C++, but I want to pre-prepare the data by adding extra columns, my assumption being that doing this pre-processing up front will speed up the sequential processing.

    I'm interested in adding columns containing averages, minimums and maximums over a set period (e.g. row x would have an average column containing the average of the previous 350 prices, or a maximum column containing the maximum price out of the previous 50, etc.).

    I'm using the following code to produce the averages, but it takes a long time (roughly 10 minutes) to select the full data set. I have indexed the table on RowNumber. I'm also a little unsure how to combine this with an UPDATE to fill the appropriate column (one possible way is sketched after the query below)...

    select t1.RowNumber, t1.[close], AVG(t2.[close]) ma_slow
    from #temp t1, #temp t2
    where t1.RowNumber between t2.RowNumber and t2.RowNumber + 349
      and t1.RowNumber between 350 and 100000
    group by t1.RowNumber, t1.[close]
    order by t1.RowNumber
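
    For reference, a minimal sketch of how the result could be written back, assuming a ma_slow column is first added to #temp (it reuses the same self-join, so it does not address the speed problem):

    -- assumption: #temp gets an extra column to hold the moving average
    alter table #temp add ma_slow float;
    go

    update t
    set t.ma_slow = x.ma_slow
    from #temp t
    join (select a.RowNumber, AVG(b.[close]) as ma_slow
          from #temp a
          join #temp b on b.RowNumber between a.RowNumber - 349 and a.RowNumber
          where a.RowNumber between 350 and 100000
          group by a.RowNumber) x
      on x.RowNumber = t.RowNumber;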

    Any ideas on speeding this up, or am I better off extracting the data into the C++ layer and doing the calculations there?

    Interestingly, if I limit RowNumber to 10,000, it processes in a few seconds... I'm unsure why having 10 times the data means 1000 times the processing time!

    Excel takes about 2 seconds to process the calculation on 100000+ rows(!), but I don't want to do it by hand, and may end up with more data than I can open in Excel...

    Thanks,

    David.

    Given that RowNumber is an integer, why couldn't you simply do it this way:

    select RowNumber/350 as ID, avg([close]) as average, max([close]) as maximum
    from your_table
    where RowNumber between 350 and 100000
    group by RowNumber/350
    order by RowNumber/350

    PS: if you need to use joins, use the explicit JOIN syntax, because it is predictable. Plus, the old-style join syntax is deprecated.
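
    For illustration, here is the query from the first post rewritten with explicit JOIN syntax (same logic, just with the join condition moved into an ON clause):

    select t1.RowNumber, t1.[close], AVG(t2.[close]) ma_slow
    from #temp t1
    join #temp t2
      on t1.RowNumber between t2.RowNumber and t2.RowNumber + 349
    where t1.RowNumber between 350 and 100000
    group by t1.RowNumber, t1.[close]
    order by t1.RowNumber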

    The probability of survival is inversely proportional to the angle of arrival.

    Thanks for the idea, but that appears to average fixed groups of values, giving one result per group, rather than a sliding average of the previous 350 values for each row... or am I missing something?

    Actually, you are probably right... my brain was working in a different direction. However, I think you could use the modulo-350 idea to generate a virtual table of row ranges to JOIN against, using RowNumber between X and Y.

    something like this:

    select Z.rownum, max([close]), avg([close])
    from maintable A
    join (select X.J as rownum, min(X.K) as [start], max(X.K) as [end]
          from (select convert(int, (RowNumber/350)*350) as J, RowNumber as K
                from maintable) X
          group by X.J) Z
      on A.RowNumber between Z.[start] and Z.[end]
    group by Z.rownum

    The probability of survival is inversely proportional to the angle of arrival.

  • Again, not sure that would work as I need every single row to have a different index...

    row 351 has an average of rows 1 to 350

    row 352 an average of rows 2 to 351

    if sets of 350 rows have the same index, I don't see how I can achieve this...

    It's just the genesis of an idea... to generate a table having columns startRowNum and endRowNum based on modulo 350.

    It may be better to create a temp table and use identity() to create a unique row ID from 1 to 100000/350, with columns holding start and end values (350 apart). The table Z might look like this:

    ID   start   end
    1    0       349
    2    350     699

    then:

    select Z.ID, avg([close]), max([close])
    from maintable A
    JOIN Z on A.RowNumber between Z.[start] and Z.[end]
    group by Z.ID
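
    A rough sketch of how that helper table might be built, reusing the RowNumber/350 idea from earlier in the thread (all names here are illustrative, and the brackets around start/end are needed because END is a reserved word):

    -- one row per 350-row block, following the 0-based boundaries shown above
    create table #Z (ID int identity(1,1) primary key, [start] int, [end] int);

    insert into #Z ([start], [end])
    select distinct (RowNumber / 350) * 350,
                    (RowNumber / 350) * 350 + 349
    from maintable
    order by 1;    -- keeps the identity IDs in block order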

    The probability of survival is inversely proportional to the angle of arrival.

  • I may have found something interesting here:

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=93911

    but yet to test...

  • mctaff (1/13/2010)


    row 351 has an average of rows 1 to 350

    row 352 an average of rows 2 to 351

    As soon as SQL Server implements the full windowing clause (including sliding windows) in its ranking functions, as specified by the SQL standard and implemented in Oracle for example 😉, it will be very easy 🙂
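
    For illustration, the sliding-window form being described looks like this; Oracle supports it, and SQL Server added it in the 2012 release (table name and ma_slow follow the first post, max_50 is just an illustrative name):

    select RowNumber, [close],
           avg([close]) over (order by RowNumber
                              rows between 349 preceding and current row) as ma_slow,
           max([close]) over (order by RowNumber
                              rows between 49 preceding and current row) as max_50
    from #temp
    order by RowNumber;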

  • hmmm... doesn't help me much though!

    I've adapted the code from the link I posted... 100,000 rows processed in 20 seconds for a moving average over 350 rows...
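
    Without knowing exactly which technique the linked thread uses, a common approach from that era is to precompute a running total in an extra column and then difference it 350 rows apart. A rough sketch, with the RunningSum column and float type as assumptions (note that the ordered-update trick relies on the clustered index on RowNumber and is not formally guaranteed):

    -- assumption: a RunningSum column to hold the cumulative total
    alter table #temp add RunningSum float;
    go

    declare @running float = 0;

    -- "quirky update": accumulate the running total in clustered-index order
    update t
    set @running = RunningSum = @running + [close]
    from #temp t with (tablockx)
    option (maxdop 1);

    -- moving average of the current row plus the previous 349 rows
    select cur.RowNumber,
           (cur.RunningSum - coalesce(prev.RunningSum, 0)) / 350.0 as ma_slow
    from #temp cur
    left join #temp prev on prev.RowNumber = cur.RowNumber - 350
    where cur.RowNumber >= 350;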
