Best Strategy for large qty of Log Data

  • We're designing an app to store alarm/event log messages from various legacy process control systems to an SQL Server database (SQL Server is a new tool to our company!). Messages are parsed into 8 or so fields (e.g. date/time, priority, tag, user, event_text etc) so the records are fairly small.

    One issue we have is the large volume of messages that will written to the database tables and in particular how this will affect the query performance. For example, two of the tables will get around 30k new records per day. We are considering partitioning our tables down to 1 month size (~900,000 records) and using unions in when the user queries by date.

    My question to the enlightened people here is do they think the partitioning is necessary to maintain query performance given the number of records? We have a month of test date in a table and this works fine but we have to consider the long term! Any advice/experiences is greatly appreciated.

    DJW

  • The smaller the width of your table the better it will work. I'd recommend using smalldatetime, tinyint instead of int, etc...if possible. How will you be querying? I'd guess a clustered index on the date will make sense since it won't be very unique but will probably be used a lot.

    I'd do some testing before deciding to partition. Partitioning is an ok solution, but it adds complexity/addl mgmt where maybe none is needed. Better to tweak the index plan if you can. Depending on the type of querying done you might be able to aggregate some of the data into a summary table or do a real OLAP solution.

    One final thought is again based on your query patterns. If you rarely use older data..you decide what older is...you can move that data to a second table with a job. That will keep your main table a manageable size but still let you query it all using a simple union, probably done in a few.

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/

  • Thanks for your time Andy.

    Unfortunately for us the application doesn't allow the use of smalldatetime due to the accuracy so we will have to put up with the extra bytes!

    Most user queries will be simple and based on date/time and tag so wrt indexing, a date index is a good idea for us. I've also found that there are max ~2500 values for the tag field in a given table, so this will be indexed too. There are only 16 possibilites for priority, of which only 4 are generally used, so I won't index on it.

    We might try OLAP for some stats work depending on timescales.

    Thanks again!

  • No problem. Hope you'll post a follow up with the results of your testing and your final solution.

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/

  • We have a similar situation. We are pumping about 100K records a day into our table and we found that a few basic design concepts were invaluable.

    First, we did not use the date field as a primary index key, this allowed us to use the small date time (up to the min information) in our table. We used the a uniqueidentifier as a key. We also did not declare a primary key. We incurred to much overhead. We avoided a clustered index because of the transactional nature of the table.

    Next we did some capacity planning. Using the information published in the MSDN on this topic we estimated the table\file growth and space utilization. We found that by partioning the table out to a separate Filegroup supported by a 36GB Raid 1+0 disk volume (you could use RAID 0 in your user count is low), we would have ample capacity and performance to last a year.

    We then looked closely at the data to determine what maintenance rules we should use with respect to records retention. We implemented a purge process in our maintenance package that extended the use of our configuration indefinetly.

    Finely, we looked at potential performance hits. We built covering non-clustered indexes for the static queries and banned all unrestricted queries. We found that the bottel neck (even using TOP or RowCount) was in the Tempdb and the allocated workspace memory. Unrestricted queries (especially those with order by clauses) grabbed all availible space and actually placed all pending accesses into a wait state.

    We have lots more configuration detail if you are interested.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply