Best way for dynamic partitioning of table sql server

  • sjimmo (9/12/2016)


    Jeff Moden (9/12/2016)


    BI_NewBie (12/3/2014)


    Hey Jeff!!

    Thanks for your reply.

    In my database, there is a transaction table whose row count is 117690142. There are index created on column but still when we fire any query it takes long time. So i was thinking of dividing data into multiple tables based of date; that's how i have come across the concept of SQL Partition.

    Regards.

    Missed this. That's a nice size table but, if your current query is having problems with that even with the index in place, partitioning isn't the answer. Either the wrong index has been chosen or the query is not able to use it properly. Both probably need to be fixed.

    That, not withstanding, partitioning that table properly can significantly reduce maintenance and backup requirements (as well as other benefits) and is worth looking into. But, as already stated by many, performance of code isn't one of the reasons to partition because partitioning doesn't improve performance.

    Both Gail and Kevin are also correct in saying that even MS/MSDN documentation on the subject is misleading or, in some case such as claims of code performance, is incorrect. And, as Kevin stated, there are a ton of caveats with partitioning. My thought on partitioning is "If you think or thought partitioning was easy, you're either getting ready to do it wrong or already have done it wrong". 😉

    I will agree with all that partitioning a table isn't going to improve performance of a select. I wouldn't even consider a partition on something as small as 117M rows normally, without testing it to verify that it is going to give me a good bang for my buck.

    It is nice to throw out useful information, but I don't think it should be a global rule of thumb. I actually have received performance enhancement on partitioned tables (though I am on a warehouse with billions of rows per table) but it entailed more than simply partitioning the table. I also partitioned the database into various file groups across a variety of spindles.

    Did I do this for every table? No. But there was a lot of testing and headaches to get this database where it is today.

    You asked 4 rudimentary questions about partitioning, which seemed to indicate that you didn't actually do your homework on partitioning. Now your telling us that you have partitioned (billions of rows) and have gotten performance improvements. As you said, I'm damned confused. 😉

    Shifting gears and, hopefully, you can share the information, what kind of improvements did partitioning provide for you and how did you test? Also, you said it entailed more than simply partitioning the table. Can you expound on that a bit because it sounds like those things have more of a possibility of enhancing performance than partitioning did and I'd like to learn about it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • BI_NewBie (12/3/2014)


    Hello all,

    Greetings!

    I need suggestion on Dynamic Partition of SQL Table. I have done my homework on this topic but still below questions are still stuck in my mind.

    1. What is the best practice for partitioning (on date column)

    2. The project on which i am working correctly have a case where in i get the update of my status flag after few days (Say 15 - 30) in that case if my data got into partition table how to update and how to search which partition has my data

    3. Is creating partition occupies more disk space?

    4. Is every partition would need index?

    I am damn confused over this partition thing :crazy:

    Please help in understanding the same

    Thanks & Regards.

    For question 1, "It Depends" on the type of table and its usage. If it's more of an audit table or other near WORM table, such as an invoice detail table, then partitioning on an immutable date column is a very common practice, possibly the best practice.

    For question 2, if the partitioning was done correctly, you shouldn't have to search for which partition holds your data. SQL Server will know and be able to find it. Having to define which partition you want return data from would be totally contrary to what partitioning is supposed to do for you.

    For question 3, YES... partitioning takes more disk space because of the multiple B-Trees, the need for possibly a new index to support the partitioning, the fact that the partitioning key is automatically added to every unique index in the table, and since the PK key columns are automagically added to all Non-Clustered indexes, unique or not, and the fact that an extra column has been added to the unique PK, all indexes will be larger (with fewer rows per column which is why it's a global rule of thumb that partition will not help performance), extra entries must be made in the system tables for each partition, and you have to write partition maintenance code, an all of that takes extra space. Depending on the table, the indexes, row structure, and the datatypes of the columns involved, it could be a lot of extra data.

    For question 4, To be honest, I've never considered such a thing. To me, it wouldn't make sense not to even if you could and I don't know whether you can or not. If a partition had no indexes on it, performance would suffer.

    {Edit} I may have misread question 4. If you really mean "do you have to index each partition separately?", the answer is "NO". Declaring the index on the "Partition Scheme" will correctly partition the index across all partitions on the table and will be an "aligned" index.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (9/12/2016)


    You asked 4 rudimentary questions about partitioning, which seemed to indicate that you didn't actually do your homework on partitioning. Now your telling us that you have partitioned (billions of rows) and have gotten performance improvements. As you said, I'm damned confused. 😉

    The OP, two years ago, asked rudimentary questions about partitioning. Steve Jimmo (sjimmo), partitioned billion row+ tables.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (9/12/2016)


    Jeff Moden (9/12/2016)


    You asked 4 rudimentary questions about partitioning, which seemed to indicate that you didn't actually do your homework on partitioning. Now your telling us that you have partitioned (billions of rows) and have gotten performance improvements. As you said, I'm damned confused. 😉

    The OP, two years ago, asked rudimentary questions about partitioning. Steve Jimmo (sjimmo), partitioned billion row+ tables.

    In that case, I hope it helps someone else. And, the OP responded :-). Perhaps we can find out how he used partitioning to improve performance. :w00t:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (9/12/2016)


    And, the OP responded :-).

    Yes, back in 2014.

    Perhaps we can find out how he used partitioning to improve performance. :w00t:

    The person who said they used partitioning to improve billion row tables is NOT the OP.

    Hence your comment "You asked 4 rudimentary questions about partitioning, which seemed to indicate that you didn't actually do your homework on partitioning. Now your telling us that you have partitioned (billions of rows) and have gotten performance improvements." is not correct.

    The person who asked 4 rudimentary questions about partitioning is not the same person as the one who partitioned billions of rows and got performance improvements

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jeff Moden (9/12/2016)


    sjimmo (9/12/2016)


    Jeff Moden (9/12/2016)


    BI_NewBie (12/3/2014)


    Hey Jeff!!

    Thanks for your reply.

    In my database, there is a transaction table whose row count is 117690142. There are index created on column but still when we fire any query it takes long time. So i was thinking of dividing data into multiple tables based of date; that's how i have come across the concept of SQL Partition.

    Regards.

    Missed this. That's a nice size table but, if your current query is having problems with that even with the index in place, partitioning isn't the answer. Either the wrong index has been chosen or the query is not able to use it properly. Both probably need to be fixed.

    That, not withstanding, partitioning that table properly can significantly reduce maintenance and backup requirements (as well as other benefits) and is worth looking into. But, as already stated by many, performance of code isn't one of the reasons to partition because partitioning doesn't improve performance.

    Both Gail and Kevin are also correct in saying that even MS/MSDN documentation on the subject is misleading or, in some case such as claims of code performance, is incorrect. And, as Kevin stated, there are a ton of caveats with partitioning. My thought on partitioning is "If you think or thought partitioning was easy, you're either getting ready to do it wrong or already have done it wrong". 😉

    I will agree with all that partitioning a table isn't going to improve performance of a select. I wouldn't even consider a partition on something as small as 117M rows normally, without testing it to verify that it is going to give me a good bang for my buck.

    It is nice to throw out useful information, but I don't think it should be a global rule of thumb. I actually have received performance enhancement on partitioned tables (though I am on a warehouse with billions of rows per table) but it entailed more than simply partitioning the table. I also partitioned the database into various file groups across a variety of spindles.

    Did I do this for every table? No. But there was a lot of testing and headaches to get this database where it is today.

    You asked 4 rudimentary questions about partitioning, which seemed to indicate that you didn't actually do your homework on partitioning. Now your telling us that you have partitioned (billions of rows) and have gotten performance improvements. As you said, I'm damned confused. 😉

    Shifting gears and, hopefully, you can share the information, what kind of improvements did partitioning provide for you and how did you test? Also, you said it entailed more than simply partitioning the table. Can you expound on that a bit because it sounds like those things have more of a possibility of enhancing performance than partitioning did and I'd like to learn about it.

    Sorry Jeff, but I'm not the OP. I was actually, though I guess not very well, saying that I wouldn't partition a table that small though in only my case, which is all I can speak for, partitioning large tables did give more than just data management. I was able to improve performance. But it was a lot more than just partitioning the table.

    Sorry for the confusion.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • sjimmo (9/13/2016)


    was able to improve performance. But it was a lot more than just partitioning the table.

    No problem. What I'm interested in is above. At a 10,000 foot level, what else did you have to do? Thanks.

    And, my apologies... not sure why I thought you were the OP. I'm the one that was confused.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • GilaMonster (9/13/2016)


    Jeff Moden (9/12/2016)


    And, the OP responded :-).

    Yes, back in 2014.

    Perhaps we can find out how he used partitioning to improve performance. :w00t:

    The person who said they used partitioning to improve billion row tables is NOT the OP.

    Hence your comment "You asked 4 rudimentary questions about partitioning, which seemed to indicate that you didn't actually do your homework on partitioning. Now your telling us that you have partitioned (billions of rows) and have gotten performance improvements." is not correct.

    The person who asked 4 rudimentary questions about partitioning is not the same person as the one who partitioned billions of rows and got performance improvements

    Yeah... I definitely lost the bubble there. Not sure why I thought that. I'm still interested in what sjimmo did on top of partitioning to get the improvements.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (9/13/2016)


    GilaMonster (9/13/2016)


    Jeff Moden (9/12/2016)


    And, the OP responded :-).

    Yes, back in 2014.

    Perhaps we can find out how he used partitioning to improve performance. :w00t:

    The person who said they used partitioning to improve billion row tables is NOT the OP.

    Hence your comment "You asked 4 rudimentary questions about partitioning, which seemed to indicate that you didn't actually do your homework on partitioning. Now your telling us that you have partitioned (billions of rows) and have gotten performance improvements." is not correct.

    The person who asked 4 rudimentary questions about partitioning is not the same person as the one who partitioned billions of rows and got performance improvements

    Yeah... I definitely lost the bubble there. Not sure why I thought that. I'm still interested in what sjimmo did on top of partitioning to get the improvements.

    No problem. There was a LOT of testing that went on, but this DB is a DW. We store all of the raw data and do all work from there. Each of the raw data tables are on their own 2 file groups with one being data and the other for indexes. Each of the file groups is on it's own LUN on the SAN and each data segment on a RAID50 and each Index segment being on a RAID 10 LUN. (Additionally we are running fiber, 4 connection from server to SAN.)

    Each raw table is partitioned on a datetime field, with 28 days of data per partition. This is based upon the periods in our business. 1 period per partition. This has helped us greatly with loading data, aging as well as various processing of reports and cubes performed daily.

    FYI - we are a grocery food chain with 78 stores in the chain.

    In one case we partitioned a table of roughly 60M rows, based upon store number. This is our master UPC list. The raw data tables each have multiple billions of rows.

    Again, there was a lot of testing of configurations of Servers, SAN, SQL before anything was put in stone. The entire creation took almost a year, start to finish. But I have to say, it screams. The users are happy with the response time.

    Each of the disks is a mount point and there are 42 of them for this database, with each being between 500GB and 3 TB.

    Hope this helps explain.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • Thanks, Steve. I really appreciate the time you took to explain that. You folks definitely put some thought into that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (9/13/2016)


    Thanks, Steve. I really appreciate the time you took to explain that. You folks definitely put some thought into that.

    No problem Jeff. It took around 6 months to design, plan and implement. Our business relies heavily on it. We have 2 central distributors which pump literally hundreds of millions of rows per day. There were many headaches and at times dealing with MS was less than fruitful.

    If MS decides to scrap replication we are going to have a major issue. But then there are others who use it very heavily as well. These are heavy hitters like HP, Coke and Ford. These are the ones that I was told to talk to when we were designing this. So I would be surprised to see it go away.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply