Question/Advice needed: sliding window partitioning (range right), what's the efficient way to implement it, and are staging tables needed?

  • Okay, yesterday I was in a bit of a panic and made the post Partition Sliding window causing loads of... in haste without thinking it through, so that topic can be ignored; I managed to rest, calm down and think.

    Here's the situation: we have a database for our warehouse where they test machines before shipping them off to the clients; these machines are either newly built or came back for repairs. Each machine gets its own table, named with the warehouse name followed by a sequence number that's unique for that machine. The database is starting to get sluggish, so we want to implement a sliding window partition scheme containing 2 partitions, Archive & Current, with Current containing at most the past 45 days.

    The issue, like I said, is new machines: on an almost 24/7 basis new tables can be added, or an old table that hasn't been touched in years gets new data.

    On a weekly basis there is a job that checks the tables for compression levels, and every 4 hours a backup of the transaction log is taken.

    There is a 0.00 margin for downtime :crying:

    So this is a range right partition; I've done the following:

    CREATE PARTITION FUNCTION [WAREHOUSE_P_F](datetime) AS RANGE RIGHT FOR VALUES (N'2015-09-08T00:00:00.000')

    CREATE PARTITION SCHEME [WAREHOUSE_S_F] AS PARTITION [WAREHOUSE_P_F] TO ([FG_DATA_001], [FG_DATA_CURRENT])

    Now I've been moving the tables over, starting with the smallest and also a few big ones, checking locks, transaction logs and the filegrowth of FG_DATA_CURRENT.

    Now comes my problem with the implementation of the sliding window: the procedure I've got takes too long, blocks too many processes, and gets deadlocked itself after a while.

    This was not an issue during development of the setup, and my request back then for a proper, realistic test environment fell on deaf ears.

    I've seen some potential solutions, but they've all used staging tables (and they were also dealing with a single table), so has anyone got an idea what I could do?
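    For reference, on a RANGE RIGHT function the weekly slide is normally done with SPLIT and MERGE against the partition function. This is only a sketch; the boundary dates and the filegroup named in NEXT USED are illustrative, not the actual procedure from the job:

```sql
-- Sliding-window sketch (illustrative dates): widen, then shrink.
-- 1. Tell the scheme which filegroup receives the next boundary.
ALTER PARTITION SCHEME [WAREHOUSE_S_F] NEXT USED [FG_DATA_001];

-- 2. SPLIT in the new boundary so Current keeps covering the last 45 days.
ALTER PARTITION FUNCTION [WAREHOUSE_P_F]() SPLIT RANGE (N'2015-09-15T00:00:00.000');

-- 3. MERGE away the old boundary; rows older than it fold into Archive.
ALTER PARTITION FUNCTION [WAREHOUSE_P_F]() MERGE RANGE (N'2015-09-08T00:00:00.000');
```

    Note that MERGE physically moves the aged-out rows when the two partitions sit on different filegroups; performed across every machine table, that fully logged data movement alone could explain the blocking and deadlocking described above.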

  • Resender (10/30/2015)



    Before proceeding any further,

    how many rows are in the table?

    What size is it?

    What gains do you expect from partitioning?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Before proceeding any further,

    ChrisM@Work (10/30/2015): how many rows are in the table?

    The largest table contains over 1.5 million records, the smallest about 16, and there are hundreds upon hundreds that are currently empty.

    ChrisM@Work (10/30/2015): What size is it?

    I currently don't have the figures or the time, but based on our estimates we would need a minimum of 54 GB of disk for all the data from the past 45 days, and it's been over 1600 days since this database first came into use with this system.

    ChrisM@Work (10/30/2015): What gains do you expect from partitioning?

    Improved query response times; for 1 machine we will need all the historical data, but the reports only need the data from the past 6 weeks.
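    That hoped-for gain relies on partition elimination, i.e. the report queries carrying a predicate on the partitioning column so that only the Current partition is read. A sketch, with a hypothetical table name standing in for the warehouse + sequence naming pattern:

```sql
-- Illustrative only: the [UtcTime] predicate is what lets the optimizer
-- read just the Current partition (partition elimination).
SELECT [MachineRegisterId], [UtcTime], [Value]
FROM [dbo].[WAREHOUSE1_000123]          -- hypothetical machine table
WHERE [UtcTime] >= DATEADD(DAY, -45, GETUTCDATE());
```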

  • Resender (11/3/2015)



    Partitioning is unlikely to improve your query response times, in fact it's quite likely to make them worse. There's a good proof of this here. If performance is your sole reason for considering partitioning, then you'd be well advised not to pursue it further, and focus on other avenues for query tuning.

    1.5 million rows is very modest.


  • OK, not so good news then: the total size of the database currently is 1.9 TB.

  • Each machine gets its own table, which is the warehouse name followed by a sequence number that's unique for that machine.

    How many of these tables are there in the db?

    Are they structurally identical?


  • Total number of machine tables (3/11/2015 13:44 CET): 123,851

    The tables all share the same structure:

    [MachineRegisterId] [int] NOT NULL,

    [UtcTime] [datetime] NOT NULL,

    [Value] [float] NOT NULL

    with a primary key on the combination of MachineRegisterId & UtcTime.
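    For what it's worth, putting a table of that shape onto the partition scheme would look roughly like this; the table name is a made-up instance of the warehouse + sequence pattern, and note that the partitioning column [UtcTime] has to be part of the clustered key:

```sql
-- Hypothetical machine table on the partition scheme; real names follow
-- the warehouse-name + sequence-number convention described earlier.
CREATE TABLE [dbo].[WAREHOUSE1_000123]
(
    [MachineRegisterId] INT      NOT NULL,
    [UtcTime]           DATETIME NOT NULL,
    [Value]             FLOAT    NOT NULL,
    CONSTRAINT [PK_WAREHOUSE1_000123]
        PRIMARY KEY CLUSTERED ([MachineRegisterId], [UtcTime])
) ON [WAREHOUSE_S_F]([UtcTime]);
```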

  • Resender (11/3/2015)



    Your plan is to merge these 123,851 tables into one partitioned table (with two additional columns derived from the table names), with a partition on [UtcTime]?

    What is the total rowcount?


  • ChrisM@Work (11/3/2015)



    Oh, not at all; the idea is to partition all 123,851 tables over the 2 partitions so that each table has an archive/active section. My apologies if my explanation was bad.

  • Resender (11/3/2015)



    what's the total rowcount?


  • ChrisM@Work (11/3/2015)



    what's the total rowcount?

    and how is this lot typically interrogated?


  • In the meantime there are 123,883 tables (yes, 30 new tables got added); the total recordcount is 89,372,968,620.

    ChrisM@Work (11/3/2015): and how is this lot typically interrogated?

    I wish I knew, but I'm frequently told that the reports are beyond my pay grade.

  • Resender (11/3/2015)



    Until you posted up that number, I was thinking that the best way to handle this lot would be a single table, but the largest single table I've worked with was a little over the INT limit in rows, about 2,800,000,000, and it took some effort to get it to play nicely. Here, our largest table is a little under 1,000,000,000 rows and we get results back in milliseconds. 89,372,968,620 is out of my experience / comfort zone, but I'd still lean towards one or two tables: one to hold individual devices, the other to hold activity. Let's see what alternatives folks come up with.


  • ChrisM@Work (11/3/2015)



    Let's up that discomfort a bit: from the figures I have, a minute of unexpected downtime with the input buffers for the data not holding would cost us 4,160 euro.

    Or to put it better: this is an application that resulted from the merger of several dozen others instead of being developed from the ground up, and we've reached a point where we can neither maintain it nor get proper feedback from it.

  • Resender (11/3/2015)


    OK, not so good news then: the total size of the database currently is 1.9 TB.

    Hold the phone a minute... While it's true that partitioning is NOT a panacea of performance for queries in general, it's fantastic at being able to almost instantly drop (using SWITCH) large amounts of legacy data.

    With that in mind, let's peel just one potato at a time to get to the bottom of all this. For starters, what are the 3 largest tables in the database that you'd like to apply the 45 day drop-rows rule to and how are they used? Are they OLTP or more like an audit/invoice detail table?
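    For context on the SWITCH approach: SWITCH moves a whole partition into an empty, structurally identical staging table as a metadata-only operation, so dropping the aged-out rows is nearly instant. A sketch with illustrative names (this does use a staging table, which is what the posts above were trying to avoid, but it is per-table and momentary):

```sql
-- Staging table: same columns, same clustered key, on the archive filegroup.
CREATE TABLE [dbo].[WAREHOUSE1_000123_SWITCH]
(
    [MachineRegisterId] INT      NOT NULL,
    [UtcTime]           DATETIME NOT NULL,
    [Value]             FLOAT    NOT NULL,
    CONSTRAINT [PK_WAREHOUSE1_000123_SWITCH]
        PRIMARY KEY CLUSTERED ([MachineRegisterId], [UtcTime])
) ON [FG_DATA_001];

-- Metadata-only: partition 1 (Archive) swaps into the staging table.
ALTER TABLE [dbo].[WAREHOUSE1_000123]
    SWITCH PARTITION 1 TO [dbo].[WAREHOUSE1_000123_SWITCH];

-- The staging table can now be dropped or archived without touching live data.
DROP TABLE [dbo].[WAREHOUSE1_000123_SWITCH];
```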

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 1 through 15 (of 24 total)
