ALTER a HUGE TABLE

  • **Scenario 1**: I have just been given a table of train bookings for a month, with about 5 million reservations; it will reach roughly 60 million rows in 12 months' time. I need to change the schema of the table.

    If I run the DDL operation against the schema directly, it takes ages.

    So instead I BCP the data out, drop the table, re-create it with the new schema, and BCP the data back in.

    But index re-creation still takes time, although not as much as the straight ALTER TABLE did.
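    For reference, a minimal sketch of that export/import cycle from a command prompt (server, database, and file names here are invented): native mode avoids character conversion, and a batch size plus the TABLOCK hint helps keep logging down on the way back in.

    ```
    rem Export the existing rows in native format (hypothetical names throughout)
    bcp BookingsDB.dbo.Reservations out C:\temp\Reservations.dat -n -T -S MyServer

    rem ...drop and re-create dbo.Reservations with the new schema...

    rem Reload in batches, with a table lock for minimal logging
    bcp BookingsDB.dbo.Reservations in C:\temp\Reservations.dat -n -T -S MyServer -b 50000 -h "TABLOCK"
    ```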

    **Scenario 2**: Suppose I partition the table into twelve monthly partitions. Is it possible to apply the DDL per partition, and would it be any faster, given that the work can run in parallel? A test against a 1 million row table of three INT columns looked promising.

    Then I thought I could use a partition switch to move the data out into a different table, taking the lock escalation process into account and switching escalation off (see the sketch after this scenario).

    Then issue the DDL against the (now empty) table.

    1. Is there an easy way to copy data across partitions in parallel?

    2. I am not considering OPENROWSET: it has to go through the OLE DB layer, so it won't be as fast as native SQL Server. Or are there any benchmarks available, if anyone has measured it?

    Scenario 2 is more appealing to me, as I can rebuild indexes in parallel; also, if the data for December is not required immediately, I can defer indexing it until I need it.
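    As a rough illustration of the switch-out idea (table names and the partition number are hypothetical; the LOCK_ESCALATION option needs SQL Server 2008 or later):

    ```sql
    -- Turn lock escalation off for the duration of the operation (SQL Server 2008+)
    ALTER TABLE dbo.Reservations SET (LOCK_ESCALATION = DISABLE);

    -- Switch one partition out to an empty table of identical structure on the
    -- same filegroup; this is a metadata-only operation
    ALTER TABLE dbo.Reservations SWITCH PARTITION 1 TO dbo.ReservationsStage;

    -- ...issue the DDL here...

    -- Restore the default escalation behaviour
    ALTER TABLE dbo.Reservations SET (LOCK_ESCALATION = TABLE);
    ```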

    **Scenario 3**: Is there anything else in the industry that the community uses for this?

    Regards
    Vinay

  • What is the change you are trying to implement? Are you scripting it or running it through Management Studio?

    Andrew

  • First, table partitioning is an "Enterprise Edition" option. If you don't have the "Enterprise Edition", it's not an option.

    If you don't have the "Enterpise Edition", you could make a "Partitioned View" with the underlying correctly date-named tables, a bit of code to automatically create and move data from the "current month" table to a date-named table, and the bit of code to automatically update the partitioned view to include the new table.

    There are some uncomplicated but strict rules for creating good partitioned views in Books Online.
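    For illustration only, a minimal sketch of the pattern (all names and date boundaries invented; the real rules are in Books Online):

    ```sql
    -- Date-named member tables; the CHECK constraint on the partitioning
    -- column lets the optimizer skip tables that cannot hold matching rows
    CREATE TABLE dbo.Bookings_201001
    (
        booking_id   INTEGER  NOT NULL,
        booking_date DATETIME NOT NULL
            CONSTRAINT ck_Bookings_201001
            CHECK (booking_date >= '20100101' AND booking_date < '20100201'),
        CONSTRAINT pk_Bookings_201001 PRIMARY KEY (booking_id, booking_date)
    );

    CREATE TABLE dbo.Bookings_201002
    (
        booking_id   INTEGER  NOT NULL,
        booking_date DATETIME NOT NULL
            CONSTRAINT ck_Bookings_201002
            CHECK (booking_date >= '20100201' AND booking_date < '20100301'),
        CONSTRAINT pk_Bookings_201002 PRIMARY KEY (booking_id, booking_date)
    );
    GO

    -- The partitioned view is a UNION ALL over the member tables;
    -- re-create it each month to include the newest table
    CREATE VIEW dbo.Bookings
    AS
    SELECT booking_id, booking_date FROM dbo.Bookings_201001
    UNION ALL
    SELECT booking_id, booking_date FROM dbo.Bookings_201002;
    GO
    ```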

    The advantage here is that once a date-named table has been created and provided that no data is added or modified, the indexes on those tables shouldn't need any maintenance at all. Only the "current month" table will need that.

    The disadvantage is, of course, that you'll need to write some code to create the tables and populate them instead of just "sliding" a window. If you write the code correctly, though, it'll become a monthly "no brainer".

    Also, be aware that even correctly set up partitioned views may not operate as fast as advertised. Like everything else, "It Depends".

    Of course, if you have the "Enterprise Edition", partitioned tables would definitely be the way to go.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • The issue is not quite what you are thinking!

    Well, to explain it:

    | Partition 1 | Partition 2 | Partition 3 | Partition 4 | Partition 5 |
    | --- | --- | --- | --- | --- |
    | 42 million records | 42 million records | 42 million records | 42 million records | 42 million records |

    Now I want to alter this table to add a new NOT NULL column with a default value.

    The estimated time to run that ALTER across 7 years' worth of history is about 7 hours.

    So, what is the fastest way available to make this alter happen quickly?

    That is why I wanted to switch the data out to a mirror table, alter the now-empty table, and copy the data back into the primary table; I cannot simply switch the partition back into the main table, because the structures no longer match.

    What is the quick way to alter a table to add NOT NULL columns?

    Currently, what we do is:

    1. Remove all the foreign keys.

    2. Rename the table to an "old" table.

    3. Create a new table with the new schema.

    4. Insert all the data back into the original Reservations table, in a loop, to limit transaction log growth (see the sketch below).

    5. Re-create all the foreign keys.

    6. Truncate the old table.

    7. Drop the old table.
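    A sketch of that kind of batched reload (names and batch size are made up); each INSERT runs in its own transaction, so the log space can be reused between batches under the SIMPLE or BULK_LOGGED recovery model:

    ```sql
    DECLARE @BatchSize INT,
            @LastID    INT,
            @MaxID     INT;

    SELECT @BatchSize = 500000,
           @LastID    = 0,
           @MaxID     = MAX(reservation_id)
    FROM dbo.Reservations_Old;

    WHILE @LastID < @MaxID
    BEGIN
        -- Copy one key range per iteration (column list abbreviated)
        INSERT dbo.Reservations
            (reservation_id, booking_date)
        SELECT reservation_id, booking_date
        FROM dbo.Reservations_Old
        WHERE reservation_id >  @LastID
          AND reservation_id <= @LastID + @BatchSize;

        SET @LastID = @LastID + @BatchSize;
    END;
    ```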

    I need something that behaves like BCP but without the overhead of casting and converting.

    Any ideas on this?

    Regards
    Vinay

  • Why does the schema of such a huge table have to change? Please explain the logic behind this as it seems highly unlikely to be sensible. What are you doing that requires the addition of an extra column, and why does it have to be NOT NULL with a default value?

    It is generally a bad idea to add or remove columns from a table at the best of times. Doing so on a huge partitioned table, while preventing SQL Server from performing it as a metadata-only operation, seems close to masochism.

    If you can describe a sensible and reasonable business or technical justification for the requirement, I'll give it some thought.

    Paul

  • It is not uncommon, in current business conditions, for the business to change: we get new upstream and downstream systems, and our OLTP system has to cope with changes to the data we process, the types of customers we take on, prioritisation mechanisms, and many other things, all driven by the business.

    On top of that, the acquisition of a different company makes it difficult to incorporate their schema and other structures.

    So ultimately the schema has to change to accommodate the data coming in and going out.

    And we need 7 years' worth of history to stay online and available.

    I hope I have put that the right way. Trust me, there are scenarios where this is really necessary.

    And NOT NULL is the only way, as the column must have data. What do you do in such a situation? I am ready to give up, as I have nothing left in my hand.

    But if somebody has a clean idea, that would be interesting.

    Regards
    Vinay

  • Vinay,

    I've not read all of the posts on this thread so let me ask... can the system be down while you do this or not? If so, for how long?

    --Jeff Moden



  • The reason I asked for details was because the best solution often depends on the details.

    All you've really done is ask me to trust you that there really is no alternative.

    I have no reason to disbelieve you, of course, it's just that without knowing the details, I cannot be sure I would agree with your assessment.

    I have a feeling Jeff is about to suggest bulk copying the data out and using a BCP format file on the way in.

    If so, this is indeed an option.
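    As a rough sketch of that route (hypothetical names again): generate a format file from the old structure, and supply it on the way in so the fields in the data file map to the right columns of the new table. Any column missing from the format file must be NULLable or have a default.

    ```
    rem Generate a native-mode format file describing the OLD structure
    bcp BookingsDB.dbo.Reservations format nul -n -f Reservations.fmt -T -S MyServer

    rem ...export the data, re-create the table with the extra column...

    rem Reload using the format file to map the old fields onto the new table
    bcp BookingsDB.dbo.Reservations in C:\temp\Reservations.dat -f Reservations.fmt -T -S MyServer
    ```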

    Another option is to add the column as a metadata change (adding it as NULLable). You could then go through and update small batches from NULL to the default value over time.
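    For example, a sketch of that trickle update (names and batch size invented), which can simply be re-run until there is nothing left to do:

    ```sql
    DECLARE @Rows INT;
    SET @Rows = 1;

    WHILE @Rows > 0
    BEGIN
        -- Update a small batch of the remaining NULLs
        UPDATE TOP (10000) dbo.Reservations
        SET new_column = 12345      -- the business default value
        WHERE new_column IS NULL;

        SET @Rows = @@ROWCOUNT;
    END;
    ```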

    There are many, many methods. The correct one depends on the details, which we don't have.

    Paul

  • I would like to offer some ideas based on my experience (though the approach may need adjusting to your business requirements); a sketch of the key steps follows the list.

    1. Rename the source (history) table as OLD and take it out of use (make sure no process or application can touch it).

    2. Create the NEW table with the additional column, partitioned by date range or month.

    3. Add indexes to the NEW table.

    4. Create a staging table: unpartitioned, but with the same structure as the NEW table. No indexes at this point.

    5. Bulk load (using TABLOCK, etc.) the data required for a single partition (i.e., one month) from the OLD history table into the staging table.

    6. Create indexes on the staging table (matching those required on the NEW table), and complete the other prerequisites for a partition switch.

    7. Switch the staging table into the corresponding partition of the NEW table.

    8. Validate the row counts on the target.

    9. Drop the indexes on the (now empty) staging table.

    Repeat steps 5 through 9 for all the months. That's all.
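    A sketch of steps 5 to 7 for a single month (all names, boundaries, and the partition number are invented for illustration):

    ```sql
    -- Step 5: bulk load one month from the OLD table into the staging table,
    -- populating the new column on the way through
    INSERT dbo.Reservations_Stage WITH (TABLOCK)
        (reservation_id, booking_date, new_column)
    SELECT reservation_id,
           booking_date,
           12345
    FROM dbo.Reservations_Old
    WHERE booking_date >= '20100101'
      AND booking_date <  '20100201';

    -- Step 6: index the staging table to match the NEW table, and add the
    -- check constraint the switch requires
    CREATE CLUSTERED INDEX cx_Stage ON dbo.Reservations_Stage (booking_date);
    ALTER TABLE dbo.Reservations_Stage
        ADD CONSTRAINT ck_Stage
        CHECK (booking_date >= '20100101' AND booking_date < '20100201');

    -- Step 7: switch the staging table into the matching partition of the
    -- NEW table (a metadata-only operation)
    ALTER TABLE dbo.Reservations_Stage
        SWITCH TO dbo.Reservations_New PARTITION 1;
    ```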

  • Jeff Moden (2/21/2010)


    Vinay,

    I've not read all of the posts on this thread so let me ask... can the system be down while you do this or not? If so, for how long?

    The system can be down for an hour or two, as we make these tweaks during the weekend jobs.

    The window could be increased to 3 hours at most, which includes time to verify that the patch applied correctly and to run a small set of tests on top of it.

    Thank you in advance.

    Regards
    Vinay

  • Paul White (2/21/2010)


    The reason I asked for details was because the best solution often depends on the details.

    All you've really done is ask me to trust you that there really is no alternative.

    I have no reason to disbelieve you, of course, it's just that without knowing the details, I cannot be sure I would agree with your assessment.

    I have a feeling Jeff is about to suggest bulk copying the data out and using a BCP format file on the way in.

    If so, this is indeed an option.

    Trust me, it took a long time to mock up 7 years of data, and it brought the database to a grinding halt. Next, I made sure I had one partition per year.

    I did a BCP in parallel, with 7 clients each hitting a different partition to load data into the table, and that was slower than a single-threaded BCP into the partitioned table. Maybe I was maxing out on memory.

    Another option is to add the column as a metadata change (adding it as NULLable). You could then go through and update small batches from NULL to the default value over time.

    There are many, many methods. The correct one depends on the details, which we don't have.

    Paul

    This is another thing I thought should be straightforward: add the column as NULLable and then update it to the default value the business requires. But again I run over the 3-hour window allowed, and it makes me feel like a loser.

    I thought I would finally give up on this.

    Regards
    Vinay

  • bhushanvinay (2/22/2010)


    This is another thing I thought should be straightforward: add the column as NULLable and then update it to the default value the business requires. But again I run over the 3-hour window allowed, and it makes me feel like a loser. I thought I would finally give up on this.

    Before you give up, have you considered using a view?

    You would need to test carefully, but I have used this approach successfully before.

    Rather than get too much into the detail, here is a full demonstration script...

    ```sql
    USE tempdb;
    GO

    IF OBJECT_ID(N'dbo.HugeTable', N'V') IS NOT NULL DROP VIEW dbo.HugeTable;
    IF OBJECT_ID(N'dbo.HugeTable', N'U') IS NOT NULL DROP TABLE dbo.HugeTable;
    IF OBJECT_ID(N'dbo.HiddenHugeTable', N'U') IS NOT NULL DROP TABLE dbo.HiddenHugeTable;
    GO

    CREATE TABLE dbo.HugeTable
    (
        row_id  INTEGER IDENTITY PRIMARY KEY,
        data    BIGINT NOT NULL
    );
    GO

    -- Row number generator
    WITH L0  AS (SELECT 1 AS c UNION ALL SELECT 1),
         L1  AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
         L2  AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
         L3  AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
         L4  AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
         L5  AS (SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
         Num AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM L5)
    -- Add 250K rows to the test table (takes around one second)
    INSERT dbo.HugeTable WITH (TABLOCK)
        (data)
    SELECT TOP (250000)
        N
    FROM Num
    ORDER BY
        N ASC;
    GO

    -- Add a new column, as a purely metadata operation (instant)
    ALTER TABLE dbo.HugeTable ADD new_column INTEGER NULL;
    GO

    -- Rename the table (instant if it succeeds)
    EXECUTE sp_rename N'dbo.HugeTable', N'HiddenHugeTable', 'OBJECT';
    GO

    -- Optional: add an index to allow us to find NULL values quickly later on
    -- Takes around 1 second to execute
    -- Index could be created later, possibly ONLINE
    CREATE NONCLUSTERED INDEX nc1 ON dbo.HiddenHugeTable (new_column)
    WITH (FILLFACTOR = 50, PAD_INDEX = ON, SORT_IN_TEMPDB = ON, ONLINE = OFF, MAXDOP = 4)
    ON [PRIMARY];
    GO

    -- Create a view over the renamed table
    -- The view returns a default value if it finds a NULL in the new column
    -- To clients, it seems as if the new column exists, and has the required value
    CREATE VIEW dbo.HugeTable
    WITH SCHEMABINDING
    AS
    SELECT HHT.row_id,
           HHT.data,
           new_column = ISNULL(HHT.new_column, 12345) -- 12345 is the DEFAULT value
    FROM dbo.HiddenHugeTable HHT;
    GO

    -- Create INSTEAD OF triggers on the view to allow changes to the data
    -- INSERTs, UPDATEs, and DELETEs are simply performed against the
    -- renamed real table. The triggers are written separately for clarity.

    -- INSERT
    CREATE TRIGGER dbo.[trg dbo.HugeTable IOI]
    ON dbo.HugeTable
    WITH EXECUTE AS CALLER
    INSTEAD OF INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        SET ROWCOUNT 0;

        IF NOT EXISTS (SELECT * FROM inserted) RETURN;

        INSERT dbo.HiddenHugeTable
            (data, new_column)
        SELECT data, new_column
        FROM inserted;
    END;
    GO

    -- DELETE
    CREATE TRIGGER dbo.[trg dbo.HugeTable IOD]
    ON dbo.HugeTable
    WITH EXECUTE AS CALLER
    INSTEAD OF DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        SET ROWCOUNT 0;

        IF NOT EXISTS (SELECT * FROM deleted) RETURN;

        DELETE HHT
        FROM deleted DEL
        JOIN dbo.HiddenHugeTable HHT
            ON HHT.row_id = DEL.row_id;
    END;
    GO

    -- UPDATE
    CREATE TRIGGER dbo.[trg dbo.HugeTable IOU]
    ON dbo.HugeTable
    WITH EXECUTE AS CALLER
    INSTEAD OF UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        SET ROWCOUNT 0;

        IF NOT EXISTS (SELECT * FROM inserted) RETURN;

        UPDATE HHT
        SET data = INS.data,
            new_column = INS.new_column
        FROM inserted INS
        JOIN dbo.HiddenHugeTable HHT
            ON HHT.row_id = INS.row_id;
    END;
    GO

    -- Tests
    -- 1. Show the first ten rows from the view - the new column is there, with the default value
    SELECT TOP (10) * FROM dbo.HugeTable ORDER BY row_id ASC;

    -- 2. Show the first ten rows from the hidden table - the new column is there, but full of NULLs
    SELECT TOP (10) * FROM dbo.HiddenHugeTable ORDER BY row_id ASC;

    -- 3. Add two new rows, through the view
    INSERT dbo.HugeTable (data, new_column)
    SELECT -1, 10 UNION ALL
    SELECT -2, 20;

    -- 4. Show that the new rows exist
    SELECT * FROM dbo.HugeTable WHERE row_id > 250000;

    -- 5. Update the new rows
    UPDATE dbo.HugeTable
    SET data = data - 100,
        new_column = new_column + 50
    WHERE row_id > 250000;

    -- 6. Show the updated rows
    SELECT * FROM dbo.HugeTable WHERE row_id > 250000;

    -- 7. Delete the new rows
    DELETE dbo.HugeTable WHERE row_id > 250000;

    -- 8. Show that the rows have gone
    SELECT * FROM dbo.HugeTable WHERE row_id > 250000;

    -- 9. An example query to update a section of the underlying NULL values
    UPDATE TOP (500)
        dbo.HiddenHugeTable
    SET new_column = 12345
    WHERE new_column IS NULL;

    -- 10. Once all the NULL values have been updated, we don't need the view any more
    BEGIN TRANSACTION;

        -- Drop the view
        IF OBJECT_ID(N'dbo.HugeTable', N'V') IS NOT NULL DROP VIEW dbo.HugeTable;

        -- Rename the table
        EXECUTE sp_rename N'dbo.HiddenHugeTable', N'HugeTable', 'OBJECT';

        -- Drop the index we used to help the update (note the table name has changed)
        DROP INDEX nc1 ON dbo.HugeTable;

    COMMIT TRANSACTION;

    -- 11. Tidy up
    IF OBJECT_ID(N'dbo.HugeTable', N'V') IS NOT NULL DROP VIEW dbo.HugeTable;
    IF OBJECT_ID(N'dbo.HugeTable', N'U') IS NOT NULL DROP TABLE dbo.HugeTable;
    IF OBJECT_ID(N'dbo.HiddenHugeTable', N'U') IS NOT NULL DROP TABLE dbo.HiddenHugeTable;

    -- End of demo script
    ```

    Paul

  • Hi Paul,

    Seems indeed like a very clever idea.

    I'm going to save this post for later use; it could come in handy.

    PS: Didn't you also miss dropping the triggers once the view isn't needed any more? They will reference a non-existent object (HiddenHugeTable).

    Wkr,

    Eddy

  • eddy-644184 (2/22/2010)


    Seems indeed like a very clever idea. I'm going to save this post for later use; it could come in handy.

    PS: Didn't you also miss dropping the triggers once the view isn't needed any more? They will reference a non-existent object (HiddenHugeTable).

    Hey Eddy,

    Thanks. It can be made to work well, but thorough testing for compatibility with the old structure is absolutely essential.

    When the view is deleted, the triggers are automatically deleted by SQL Server. There's no need to drop them explicitly.

    Paul
