Wide table performance issue

  • Do you control the load of the data? How is it done?

    I would start by analyzing your data; MAX and LEN will tell you the longest value a particular field actually holds (a sketch follows at the end of this post).

    Then design a table with proper datatypes: datetime, decimal, int, varchar, char, etc.

    Then try selecting data from your table using CONVERT to convert the data to your new target datatype.

    Next try to insert your old data into your new table and fix any data problems.

    If you control the load of your data, say through SSIS, you can redirect and analyze records that bounce out during the load and adjust your table design.

    I would think your performance problems are related to improper data types as well as the width of the fields.
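
    A minimal T-SQL sketch of the profiling and conversion-test steps above, assuming the Sales.HOWLI_LOAD table and SQL_POST_DATE column that appear later in the thread; the AMOUNT column and the decimal precision are purely illustrative:

        -- Longest value per candidate column in the existing all-varchar table.
        SELECT
            MAX(LEN(SQL_POST_DATE)) AS MaxLen_SQL_POST_DATE,
            MAX(LEN(AMOUNT))        AS MaxLen_AMOUNT          -- hypothetical column
        FROM Sales.HOWLI_LOAD;

        -- Test-convert to the intended target datatypes before building the new table;
        -- rows that fail to convert point at data that needs cleaning first.
        SELECT
            CONVERT(datetime,       SQL_POST_DATE) AS POST_DATE,
            CONVERT(decimal(12, 2), AMOUNT)        AS AMOUNT   -- hypothetical precision
        FROM Sales.HOWLI_LOAD;

        -- ISDATE flags the string values that will not convert cleanly to datetime.
        SELECT *
        FROM Sales.HOWLI_LOAD
        WHERE ISDATE(SQL_POST_DATE) = 0;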

  • Did you put an index on AW_POST_DATE before you changed your query? If the field is indexed and you are filtering on it, I'm surprised there wasn't an improvement.
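
    If no such index exists yet, a minimal sketch (the table is the one from the posted plan; the index name is illustrative):

        CREATE NONCLUSTERED INDEX IX_HOWLI_LOAD_AW_POST_DATE
            ON Sales.HOWLI_LOAD (AW_POST_DATE);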

  • Do you have a clustered index? If you do, then kindly post it...

  • Few Suggestions:

    • Do you need all of the columns for your application? I guess not, so name only the specific columns you need in the SELECT.

    • For instance, if you need only 3 (or fewer than 5) columns out of this table, INCLUDE them in a non-clustered index (see the sketch after this list).

    • Using TOP may be a business requirement, but if it is not, please avoid it.

    • Try the <= and >= operators instead of the BETWEEN operator (I can't justify it, but sometimes it works).
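
    A hedged sketch of the narrow-SELECT / covering-index suggestion; the included columns (INVOICE_NO, CUSTOMER_ID, AMOUNT) and the index name are assumptions, not columns confirmed in the thread:

        -- Hypothetical covering index: AW_POST_DATE as the key, with the few columns
        -- the report actually needs carried in the leaf level via INCLUDE.
        CREATE NONCLUSTERED INDEX IX_HOWLI_LOAD_PostDate_Covering
            ON Sales.HOWLI_LOAD (AW_POST_DATE)
            INCLUDE (INVOICE_NO, CUSTOMER_ID, AMOUNT);

        -- Select only the columns you need; the index above can then satisfy the query
        -- without touching the wide base table.
        SELECT AW_POST_DATE, INVOICE_NO, CUSTOMER_ID, AMOUNT
        FROM Sales.HOWLI_LOAD
        WHERE AW_POST_DATE >= '2011-10-15'
          AND AW_POST_DATE <= '2011-10-19';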

  • Dev @ +91 973 913 6683 (10/20/2011)


    Few Suggestions:

    • Do you need all of the columns for your application? I guess not, so name only the specific columns you need in the SELECT.

    • For instance, if you need only 3 (or fewer than 5) columns out of this table, INCLUDE them in a non-clustered index.

    • Using TOP may be a business requirement, but if it is not, please avoid it.

    • Try the <= and >= operators instead of the BETWEEN operator (I can't justify it, but sometimes it works).

    I agree with all of this; however, the optimizer will change a BETWEEN to a >= / <=.

    Also be wary of using a TOP clause without an ORDER BY, as you can't be sure of the order of the results.
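
    A quick illustration of that point (INVOICE_NO is a hypothetical column):

        -- Without ORDER BY, TOP returns an arbitrary 100 rows; the ORDER BY makes
        -- "which 100" deterministic.
        SELECT TOP (100) AW_POST_DATE, INVOICE_NO
        FROM Sales.HOWLI_LOAD
        ORDER BY AW_POST_DATE DESC;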

  • I agree with all of this; however, the optimizer will change a BETWEEN to a >= / <=.

    This could be another topic of discussion. It may not be relevant to this case, but I have observed different execution plans for the BETWEEN and <= / >= operators. I was in a hurry for a project delivery, so I left my analysis half done (I guess this is the right time to finish it :-)).
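
    If that analysis gets picked back up, a minimal sketch for comparing the two plans side by side (dates taken from the posted predicate):

        -- Logically equivalent predicates; capture the actual execution plan for each
        -- and compare the operators and their properties.
        SELECT COUNT(*)
        FROM Sales.HOWLI_LOAD
        WHERE AW_POST_DATE BETWEEN '2011-10-15' AND '2011-10-19';

        SELECT COUNT(*)
        FROM Sales.HOWLI_LOAD
        WHERE AW_POST_DATE >= '2011-10-15'
          AND AW_POST_DATE <= '2011-10-19';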

  • Dev @ +91 973 913 6683 (10/20/2011)


    I agree with all of this; however, the optimizer will change a BETWEEN to a >= / <=.

    This could be another topic of discussion. It may not be relevant to this case, but I have observed different execution plans for the BETWEEN and <= / >= operators. I was in a hurry for a project delivery, so I left my analysis half done (I guess this is the right time to finish it :-)).

    Thanks for the info. I haven't really looked too deeply into it, but it's interesting that you have had different execution plans, as I was always under the impression that they were the same.

  • I've only seen one execution plan posted. It had a table scan, which means no clustered index, and was predicated as follows:

    [SFDReports].[Sales].[HOWLI_LOAD].[SQL_POST_DATE]>=N'2011-10-15 00:00:00.000' AND [SFDReports].[Sales].[HOWLI_LOAD].[SQL_POST_DATE]<=N'2011-10-19 00:00:00.000'

    Based on what I'm seeing here, your dates are stored as strings? It's not doing a conversion from what I can tell.

    If you put a non-clustered index on these values, assuming the data is selective (meaning there are lots of unique values), you should see a change in the execution plan. If not, it means that the non-clustered index is completely useless (again, probably due to selectivity) and the optimizer skipped over it.

    How many rows are in the table? How many distinct values are in this column? (A quick way to check is sketched below.) When you see a plan, don't assume it's the same plan because you see the same operators. Always go into the properties sheets for the operators to see exactly what they're doing.

    ----------------------------------------------------
    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software
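
    A hedged sketch of the row-count / distinct-value check asked about above (SQL_POST_DATE is the column from the posted predicate):

        -- Rough selectivity check: total rows versus distinct values in the filtered column.
        SELECT COUNT(*)                      AS TotalRows,
               COUNT(DISTINCT SQL_POST_DATE) AS DistinctPostDates
        FROM Sales.HOWLI_LOAD;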

  • Sorry for the late reply, folks. Took a long nap to overcome my stupidity while designing this table.

    Here's the deal that I deduced from much of the discussion going on.

    1) Redesign the table with appropriate data types and lengths

    2) Normalize the table to 3rd Normal Form

    3) Put indexes (clustered/non-clustered) on the table.

    4) Create partitions by month (the month part of AW_POST_DATE); a partitioning sketch follows at the end of this post.

    5) Check for the performance improvement.

    This is my plan of action for the coming 10 hours. I will post my findings once again.

    And yes, SQL_POST_DATE is a string. But AW_POST_DATE is NOT a string. And I couldn't see any performance improvement when I used AW_POST_DATE.
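
    A minimal sketch of monthly partitioning on AW_POST_DATE for step 4 above; the function/scheme names and boundary dates are illustrative, and real boundaries would cover the full 7-year range:

        -- One partition per month, boundaries on the first of each month (RANGE RIGHT).
        CREATE PARTITION FUNCTION pf_MonthlyPostDate (datetime)
            AS RANGE RIGHT FOR VALUES ('2011-08-01', '2011-09-01', '2011-10-01', '2011-11-01');

        CREATE PARTITION SCHEME ps_MonthlyPostDate
            AS PARTITION pf_MonthlyPostDate ALL TO ([PRIMARY]);

        -- The rebuilt table (or its clustered index) is then created on the scheme, e.g.:
        -- CREATE CLUSTERED INDEX CIX_HOWLI_LOAD_PostDate
        --     ON Sales.HOWLI_LOAD_NEW (AW_POST_DATE)
        --     ON ps_MonthlyPostDate (AW_POST_DATE);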

  • Yes. The SQL_POST_DATE is indeed a string. But the AW_POST_DATE is a datetime column, and using that too did not offer any performance improvement. As of now, I have approximately 1.5 million rows per month.

    But the scenario is that I have to store 7 years' worth of historical data.

    So we are talking about 10 million+ rows in the long run.

  • rajiv.varma (10/20/2011)


    Yes. The SQL_POST_DATE is indeed a string. But the AW_POST_DATE is a datetime column, and using that too did not offer any performance improvement. As of now, I have approximately 1.5 million rows per month.

    But the scenario is that I have to store 7 years' worth of historical data.

    So we are talking about 10 million+ rows in the long run.

    By my count it's more like 126 million rows (1.5 million rows/month × 12 months × 7 years).

  • Thanks for all your suggestions people. I am really glad to learn a lot of new things from you guys. 🙂

  • Oops... bad math hits me again :D. You are right, Ninja; your count seems to be right.

  • rajiv.varma (10/20/2011)


    Oops... bad math hits me again :D. You are right, Ninja; your count seems to be right.

    Maybe, but your count can be right too.

    There's a point where you should be able to archive the older data to another table, making the more recent data easier to access.

  • That's a design issue that I need to take up with my boss. He wants all the data (yes, the entire 7 years) ready to be reported. No archival. Now I am not sure how that would work, but I will try to educate him a bit about data warehousing principles.

    Generally speaking, what would be a good amount of data to keep available for reporting?
