Wide table performance issue

  • I know these kinds of questions are going to pop up, but that's for the future.

    Right now, the immediate concern is redesigning the tables to at least showcase the capability of SQL Server and MicroStrategy combined.

  • How exactly are you going to be presenting your data? For example, if you are looking at data year over year, then you may want to partition on year, not month. However, if you are going to be doing many analyses, it seems you may be able to use this as a fact table and build some dimension tables (dimTime with yearly, quarterly, monthly, weekly, and daily grains) for SSAS. Then just cube it.
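    For illustration, a minimal sketch of what such a daily-grain dimTime table might look like (assumes SQL Server 2008+ for the date type; all table and column names here are illustrative, not from this thread):

        -- Hypothetical daily-grain date dimension for SSAS; names are illustrative.
        CREATE TABLE dbo.dimTime
        (
            DateKey       int         NOT NULL PRIMARY KEY, -- e.g. 20111020 (yyyymmdd)
            FullDate      date        NOT NULL,
            DayOfWeek     tinyint     NOT NULL,
            WeekOfYear    tinyint     NOT NULL,
            MonthNumber   tinyint     NOT NULL,
            MonthName     varchar(10) NOT NULL,
            QuarterNumber tinyint     NOT NULL,
            YearNumber    smallint    NOT NULL
        );

        -- Populate a date range with a simple loop; a set-based fill from a
        -- tally table is faster for large ranges.
        DECLARE @d date = '20090101';
        WHILE @d <= '20111231'
        BEGIN
            INSERT INTO dbo.dimTime
            VALUES (CONVERT(int, CONVERT(char(8), @d, 112)), @d,
                    DATEPART(weekday, @d), DATEPART(week, @d),
                    DATEPART(month, @d), DATENAME(month, @d),
                    DATEPART(quarter, @d), DATEPART(year, @d));
            SET @d = DATEADD(day, 1, @d);
        END;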

    Jared
    CE - Microsoft

  • Jared,

    I already built a cube to showcase how end users could use the data. But I guess my users don't have a clear vision as to what they would like to see, e.g. by month, by year, by quarter, etc. I guess that will be an evolving design as time goes by.

  • Yes, well, you can't properly analyze data efficiently in multiple ways with just a single table. Mind you, the proper indexes and query optimization will speed it up... but it is not efficient once they start asking multiple questions. Tell me... how many rows are currently in this table of yours? sp_spaceused 'tableName'

    Thanks,

    Jared
    CE - Microsoft

  • I am assuming that the goal of this is analysis only, and that it is not to be used as a transactional database? If so, don't spend as much time optimizing this as you would an OLTP database; you will need to optimize it for OLAP or data mining.

    Jared
    CE - Microsoft

  • The table as of now adds approximately 100,000 rows every day. This is not a transactional system at all. In the long run, the system is intended to replace the existing Excel-based reporting (which does not pull data from SQL). The SQL database is a completely new system being introduced into the company.

  • I'm sorry, I don't understand. How many rows are currently in the table? How do these 100,000 rows currently get inserted into SQL (from some of the posts, I assume an Excel import)? Is this done daily, or 200,000 every 2 days, or something like that? Where is the data getting into Excel from (MySQL? Oracle? People?)?

    Thanks,

    Jared
    CE - Microsoft

  • A scheduled SQL Agent job pulls this information across from another system (UniVerse) through an ODBC connection. It gets populated every day except Sunday (that's when the business is closed). Here's the result of sp_spaceused on the table.

    name        rows      reserved    data        index_size  unused
    HOWLI_LOAD  2719305   6420856 KB  4868992 KB  221064 KB   1330800 KB
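    For reference, a hedged sketch of what that nightly pull could look like if the UniVerse source is exposed to SQL Server as a linked server; the linked server name, remote table, and column list below are assumptions, not the actual job definition:

        -- Hypothetical SQL Agent job step. UNIVERSE_LNK, SOURCE_TABLE, and the
        -- column list are placeholders; the passthrough query runs in the
        -- remote system's dialect.
        INSERT INTO dbo.HOWLI_LOAD (zid, AW_POST_DATE /* , other columns */)
        SELECT zid, AW_POST_DATE /* , other columns */
        FROM OPENQUERY(UNIVERSE_LNK,
                       'SELECT zid, AW_POST_DATE FROM SOURCE_TABLE');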

  • So this is only about 27 days' worth of data? Is that correct? When you say that it grows by 100,000 each day, does that mean the main system takes on 100,000 rows each day, or that the data from years ago is migrating at that rate?

    Thanks,

    Jared
    CE - Microsoft

  • The transactional system has 100,000 transactions every day and I capture those. But I am going to roll backwards at least 2 years to be able to report on them. Consider the example of a store: every day, the store makes n transactions. At close, all the records are summed up, or all the transactions are moved over to the SQL system.

    That's the scenario.

  • Ok. Assuming this is a table for analysis and nothing else, we don't split the table. What is zid? Is this a unique ID from the main system? If so, I assume there are no duplicates and that it is incremented with each order? If so, get rid of your identity column and make zid your primary key and clustered index. Then add a nonclustered index on the date you are filtering on. Start there.
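    In T-SQL, and assuming zid really is unique and NOT NULL, that suggestion looks roughly like this (the identity column ID and constraint name PK_HOWLI_LOAD are guesses, not taken from the thread):

        -- Drop the old identity-based key; constraint and column names assumed.
        ALTER TABLE dbo.HOWLI_LOAD DROP CONSTRAINT PK_HOWLI_LOAD;
        ALTER TABLE dbo.HOWLI_LOAD DROP COLUMN ID;

        -- zid becomes the primary key and the clustered index.
        ALTER TABLE dbo.HOWLI_LOAD
            ADD CONSTRAINT PK_HOWLI_LOAD PRIMARY KEY CLUSTERED (zid);

        -- Nonclustered index to support the date-range filtering.
        CREATE NONCLUSTERED INDEX IX_HOWLI_LOAD_AW_POST_DATE
            ON dbo.HOWLI_LOAD (AW_POST_DATE);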

    Jared
    CE - Microsoft

  • rajiv.varma (10/20/2011)


    Sorry for the late reply, folks. Took a long nap to overcome my stupidity while designing this table.

    Here's what I deduced from much of the discussion so far.

    1) Redesign the table with appropriate data types and lengths

    2) Normalize the table to 3rd Normal Form

    3) Put indexes (clustered/nonclustered) on the table.

    4) Create Partitions by Month (the month part from the AW_POST_DATE)

    5) Check for the performance improvement.

    This is my plan of action for the coming 10 hours. I will post my findings once again.

    And yes, SQL_POST_DATE is a string. But AW_POST_DATE is NOT a string. And I couldn't see any performance improvement when I used the AW_POST_DATE.
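    For step 4 of the quoted plan, a sketch of monthly partitioning on AW_POST_DATE (table partitioning requires SQL Server 2005+ Enterprise Edition; the boundary dates and single-filegroup mapping are illustrative):

        -- Monthly partition function and scheme; extend the boundary values
        -- as months accrue.
        CREATE PARTITION FUNCTION pfMonthly (datetime)
        AS RANGE RIGHT FOR VALUES ('20110801', '20110901', '20111001');

        CREATE PARTITION SCHEME psMonthly
        AS PARTITION pfMonthly ALL TO ([PRIMARY]);

        -- Building the clustered index on the scheme partitions the table;
        -- the partitioning column must be part of the clustered key, and this
        -- assumes no existing clustered index (otherwise use DROP_EXISTING).
        CREATE CLUSTERED INDEX CIX_HOWLI_LOAD
            ON dbo.HOWLI_LOAD (AW_POST_DATE)
            ON psMonthly (AW_POST_DATE);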

    Can you repost your schema when this is complete? There are a few other things I'd review here as well, but we might as well work from the latest version. Amongst other things, if SQL_CATCH_WGTS really needs to be nText, you probably want to get that into its own filegroup (the TEXTIMAGE_ON property).
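    TEXTIMAGE_ON is specified when the table is created; a minimal illustration, with the LOB filegroup name assumed:

        -- Puts ntext/text/image (and, by default, max-type) data on its own
        -- filegroup; LOB_FG must already exist in the database.
        CREATE TABLE dbo.HOWLI_LOAD_Example
        (
            zid            int   NOT NULL PRIMARY KEY,
            SQL_CATCH_WGTS ntext NULL
        )
        ON [PRIMARY]
        TEXTIMAGE_ON [LOB_FG];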

    After that, you'll want to analyze the queries that would use this directly. See if you've got an 80/20 scenario (80% of the time you use 20% of the columns, to paraphrase). If so, a 1-1 split on this table to remove the excess baggage from the majority of queries would be assistive.
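    A hedged illustration of that kind of 1-to-1 split; which columns count as the hot 20% is for your workload to decide, and the table names here are made up:

        -- Frequently queried columns stay in the narrow table...
        CREATE TABLE dbo.HOWLI_LOAD_Core
        (
            zid          int      NOT NULL PRIMARY KEY CLUSTERED,
            AW_POST_DATE datetime NOT NULL
            -- ...plus the other hot columns
        );

        -- ...and rarely used columns move to a 1:1 companion with the same key.
        CREATE TABLE dbo.HOWLI_LOAD_Detail
        (
            zid            int   NOT NULL PRIMARY KEY CLUSTERED
                                 REFERENCES dbo.HOWLI_LOAD_Core (zid),
            SQL_CATCH_WGTS ntext NULL
            -- ...plus the other cold columns
        );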

    NVARCHAR is for Unicode; VARCHAR is for the standard ANSI character set. Do you actually NEED Unicode? You can always convert it back to Unicode on the way out if you're re-dumping to Excel somewhere, but if you're going to make this the system of record you might as well make it accurate.

    There are oddities in here... like numeric(4,0). Erm, INT is a bit easier. šŸ™‚

    Part of your difficulty with Datetime vs. character was the lack of indexing, hence the lack of any performance difference. Looking forward to your testing after you've included some of that.

    The less you have that's Nullable, the less work you'll have to do on the data retrieval end. Only leave columns Nullable that are supposed to be allowed that value. NOT NULL vs. NULL for a single column isn't really a HUGE deal... but this is not one column.

    If you can also provide a few standard queries you expect to use against this table, we can probably help you more once you've reposted the schema.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Craig,

    I am working on redoing the entire table, but it's a heck of a lot of work across SSMS, the SSIS package, and so on. I will report back with what I find/do.

  • Dev @ +91 973 913 6683 (10/20/2011)


    That's what Views utilizing UNION are good for

    Is it not a CLASSIC implementation in SQL Server (I believe SS2K), when we didn't have partitioning options?

    What about maintenance of these tables? Logically adding one column in the design translates to physically adding a column to ā€˜Nā€™ tables.

    If I'm reading the above correctly...

    There actually were partitioning options in SQL Server 2K called "Partitioned Views" and, when correctly implemented, they worked pretty darned good. And, guess what... they use UNION ALL or UNION depending on the data. šŸ˜‰

    So far as adding columns to "N" tables goes, such a design change doesn't happen very often (usually only once or twice during the lifetime of the table). When it does happen, a short bit of work with a little dynamic SQL knocks that problem out in very short order.
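    For anyone who hasn't seen one, a minimal SS2K-style partitioned view; the table, column, and constraint layout here is illustrative:

        -- One table per month, each with a CHECK constraint so the optimizer
        -- can skip tables that can't match a filtered query.
        CREATE TABLE dbo.Sales_201109
        (
            zid          int      NOT NULL PRIMARY KEY,
            AW_POST_DATE datetime NOT NULL
                CHECK (AW_POST_DATE >= '20110901' AND AW_POST_DATE < '20111001')
        );

        CREATE TABLE dbo.Sales_201110
        (
            zid          int      NOT NULL PRIMARY KEY,
            AW_POST_DATE datetime NOT NULL
                CHECK (AW_POST_DATE >= '20111001' AND AW_POST_DATE < '20111101')
        );
        GO

        -- Read-side view; an updatable partitioned view additionally needs
        -- the partitioning column in each member table's primary key.
        CREATE VIEW dbo.Sales AS
        SELECT zid, AW_POST_DATE FROM dbo.Sales_201109
        UNION ALL
        SELECT zid, AW_POST_DATE FROM dbo.Sales_201110;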

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions
