Inserting millions of rows into a partitioned view

  • I've a table with +100 million rows that I want to partition, and as I'm restricted to standard edition 2008 R2 I'm going to use a partitioned view. The four tables (currently) and the view have been setup and I'm now thinking about migrating the data. Bulk insert and bcp can't be used with partitioned views so I'm left with a standard insert into, or is there an alternative?

    If I have to use a single insert into statement, can I batch it in any way using a transaction to do, say 100,000 rows, then wait before doing the next batch?

    The tables are partitioned on a smalldatetime column that is part of the key (one table per year).

  • I might be missing something - but why would you want to migrate data?

    I'm assuming you are talking about an approach similar to this -> http://www.b4pjs.co.uk/archive/2011/04/using-dynamically-created-partitioned-views/

    Edit: Looks like my brain finally woke up - you want to migrate data from your existing table to the newly setup tables that will be used in the view.

    I remember reading a forum post a few days back here about partition switching in standard edition - let me see if I can find that.

  • Yes, that's what I'm after. Tried batching:

    declare @BatchSize int

    select @BatchSize = 200000

    while 1 > 0

    begin

    begin tran

    insert into Destination

    (

    ...

    )

    select top(@BatchSize)

    ...

    from

    Source S

    where

    not exists --for batching

    (

    select *

    from

    Destination D

    where

    D.Key = S.Key

    )

    if @@rowcount > 0

    begin

    commit tran

    waitfor delay '00:00:01'

    end

    else

    begin

    commit tran

    break

    end

    end

    But...it doesn't like the sub-query as it references the partitioned destination view. BOL says:

    "INSERT, UPDATE, and DELETE actions against a partitioned view are not allowed if there is a self-join with the same view or with any of the member tables in the statement."

    This isn't a self-join but seems to apply to a sub-query as well.

    I hope that inner/left joins between the view and another table are ok for insert/update/delete, otherwise it will be pretty useless. Something like:

    update Destination

    set ...

    from Destination D

    inner join TableA A on D.Key = A.Key

    Is this going to work where Destination is a partitioned view - bit worried now?

  • The script below works if you have an identity column in the table - does the approach below help in your case?

    /*

    IF OBJECT_ID('dbo.Testvw') IS NOT NULL

    DROP VIEW dbo.Testvw;

    IF OBJECT_ID('dbo.TestTab_Main') IS NOT NULL

    DROP TABLE dbo.TestTab_Main;

    IF OBJECT_ID('dbo.TestTab_2008') IS NOT NULL

    DROP TABLE dbo.TestTab_2008;

    IF OBJECT_ID('dbo.TestTab_2009') IS NOT NULL

    DROP TABLE dbo.TestTab_2009;

    IF OBJECT_ID('dbo.TestTab_2010') IS NOT NULL

    DROP TABLE dbo.TestTab_2010;

    */

    GO

    CREATE TABLE dbo.TestTab_Main(RowId int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,

    DateColumn smalldatetime NOT NULL);

    CREATE TABLE dbo.TestTab_2008(RowId int NOT NULL PRIMARY KEY CLUSTERED,

    DateColumn smalldatetime NOT NULL);

    CREATE TABLE dbo.TestTab_2009(RowId int NOT NULL PRIMARY KEY CLUSTERED,

    DateColumn smalldatetime NOT NULL);

    CREATE TABLE dbo.TestTab_2010(RowId int NOT NULL PRIMARY KEY CLUSTERED,

    DateColumn smalldatetime NOT NULL);

    GO

    CREATE VIEW dbo.Testvw

    WITH SCHEMABINDING

    AS

    SELECT RowId,DateColumn FROM dbo.TestTab_2008

    UNION ALL

    SELECT RowId,DateColumn FROM dbo.TestTab_2009

    UNION ALL

    SELECT RowId,DateColumn FROM dbo.TestTab_2010

    GO

    INSERT dbo.TestTab_Main(DateColumn)

    SELECT DATEADD(hour,T1.number,'20080101') FROM

    (SELECT number FROM master.dbo.spt_values WHERE type = 'P') T1

    CROSS JOIN

    (SELECT TOP 10 number FROM master.dbo.spt_values) T2

    UNION ALL

    SELECT DATEADD(hour,T1.number,'20090101') FROM

    (SELECT number FROM master.dbo.spt_values WHERE type = 'P') T1

    CROSS JOIN

    (SELECT TOP 10 number FROM master.dbo.spt_values) T2

    UNION ALL

    SELECT DATEADD(hour,T1.number,'20100101') FROM

    (SELECT number FROM master.dbo.spt_values WHERE type = 'P') T1

    CROSS JOIN

    (SELECT TOP 10 number FROM master.dbo.spt_values) T2;

    GO

    DECLARE @MaxCtr int, @Batch int=100, @RowCtr int;

    WHILE(1=1)

    BEGIN

    SELECT @MaxCtr = ISNULL(MAX(RowId),0) FROM dbo.TestTab_2008;

    INSERT dbo.TestTab_2008(RowId,DateColumn)

    SELECT TOP (@Batch) RowId,DateColumn FROM dbo.TestTab_Main

    WHERE DATEPART(YEAR,DateColumn)=2008

    AND RowId > @MaxCtr

    ORDER BY RowId ASC;

    SET @RowCtr = @@ROWCOUNT;

    IF @RowCtr = 0

    BREAK;

    END

    WHILE(1=1)

    BEGIN

    SELECT @MaxCtr = ISNULL(MAX(RowId),0) FROM dbo.TestTab_2009;

    INSERT dbo.TestTab_2009(RowId,DateColumn)

    SELECT TOP (@Batch) RowId,DateColumn FROM dbo.TestTab_Main

    WHERE DATEPART(YEAR,DateColumn)=2009

    AND RowId > @MaxCtr

    ORDER BY RowId ASC;

    SET @RowCtr = @@ROWCOUNT;

    IF @RowCtr = 0

    BREAK;

    END

    WHILE(1=1)

    BEGIN

    SELECT @MaxCtr = ISNULL(MAX(RowId),0) FROM dbo.TestTab_2010;

    INSERT dbo.TestTab_2010(RowId,DateColumn)

    SELECT TOP (@Batch) RowId,DateColumn FROM dbo.TestTab_Main

    WHERE DATEPART(YEAR,DateColumn)=2010

    AND RowId > @MaxCtr

    ORDER BY RowId ASC;

    SET @RowCtr = @@ROWCOUNT;

    IF @RowCtr = 0

    BREAK;

    END

    SELECT * FROM dbo.TestTab_Main;

    SELECT * FROM dbo.TestTab_2008;

    SELECT * FROM dbo.TestTab_2009;

    SELECT * FROM dbo.TestTab_2010;

    SELECT * FROM dbo.Testvw;

  • Yes, that's very useful for the initial populating of the view base tables. But thinking there is a more fundamental problem that would affect existing stored procedures that are used for insert/update/delete on the partitioned view- see my other recent post.

  • Just as a side note: instead of

    WHERE DATEPART(YEAR,DateColumn)=2009

    I'd rather recommend use

    WHERE DateColumn>='20090101' AND DateColumny'20100101'

    The original version won't benefit from a (hopefully) existing index on DateColumn.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Good point. Plus I just realized that I really did not have a partitioned view in my original code.

    /*

    IF OBJECT_ID('dbo.Testvw') IS NOT NULL

    DROP VIEW dbo.Testvw;

    IF OBJECT_ID('dbo.TestTab_Main') IS NOT NULL

    DROP TABLE dbo.TestTab_Main;

    IF OBJECT_ID('dbo.TestTab_2008') IS NOT NULL

    DROP TABLE dbo.TestTab_2008;

    IF OBJECT_ID('dbo.TestTab_2009') IS NOT NULL

    DROP TABLE dbo.TestTab_2009;

    IF OBJECT_ID('dbo.TestTab_2010') IS NOT NULL

    DROP TABLE dbo.TestTab_2010;

    */

    CREATE TABLE dbo.TestTab_Main(RowId int IDENTITY(1,1) NOT NULL, DateColumn smalldatetime NOT NULL);

    CREATE TABLE dbo.TestTab_2008(RowId int NOT NULL, DateColumn smalldatetime NOT NULL

    CHECK(DateColumn >= '20080101' AND DateColumn < '20090101') PRIMARY KEY CLUSTERED(RowId,DateColumn));

    CREATE TABLE dbo.TestTab_2009(RowId int NOT NULL, DateColumn smalldatetime NOT NULL

    CHECK(DateColumn >= '20090101' AND DateColumn < '20100101') PRIMARY KEY CLUSTERED(RowId,DateColumn));

    CREATE TABLE dbo.TestTab_2010(RowId int NOT NULL, DateColumn smalldatetime NOT NULL

    CHECK(DateColumn >= '20100101' AND DateColumn < '20110101') PRIMARY KEY CLUSTERED(RowId,DateColumn));

    GO

    CREATE VIEW dbo.Testvw

    WITH SCHEMABINDING

    AS

    SELECT RowId,DateColumn FROM dbo.TestTab_2008

    UNION ALL

    SELECT RowId,DateColumn FROM dbo.TestTab_2009

    UNION ALL

    SELECT RowId,DateColumn FROM dbo.TestTab_2010

    GO

    INSERT dbo.TestTab_Main(DateColumn)

    SELECT DATEADD(hour,T1.number,'20080101') FROM

    (SELECT number FROM master.dbo.spt_values WHERE type = 'P') T1

    CROSS JOIN

    (SELECT TOP 10 number FROM master.dbo.spt_values) T2

    UNION ALL

    SELECT DATEADD(hour,T1.number,'20090101') FROM

    (SELECT number FROM master.dbo.spt_values WHERE type = 'P') T1

    CROSS JOIN

    (SELECT TOP 10 number FROM master.dbo.spt_values) T2

    UNION ALL

    SELECT DATEADD(hour,T1.number,'20100101') FROM

    (SELECT number FROM master.dbo.spt_values WHERE type = 'P') T1

    CROSS JOIN

    (SELECT TOP 10 number FROM master.dbo.spt_values) T2;

    GO

    DECLARE @MaxCtr int, @Batch int=100, @RowCtr int;

    WHILE(1=1)

    BEGIN

    SELECT @MaxCtr = ISNULL(MAX(RowId),0) FROM dbo.TestTab_2008;

    INSERT dbo.TestTab_2008(RowId,DateColumn)

    SELECT TOP (@Batch) RowId,DateColumn FROM dbo.TestTab_Main

    WHERE DateColumn >= '20080101' AND DateColumn < '20090101'--DATEPART(YEAR,DateColumn)=2008

    AND RowId > @MaxCtr

    ORDER BY RowId ASC;

    SET @RowCtr = @@ROWCOUNT;

    IF @RowCtr = 0

    BREAK;

    END

    WHILE(1=1)

    BEGIN

    SELECT @MaxCtr = ISNULL(MAX(RowId),0) FROM dbo.TestTab_2009;

    INSERT dbo.TestTab_2009(RowId,DateColumn)

    SELECT TOP (@Batch) RowId,DateColumn FROM dbo.TestTab_Main

    WHERE DateColumn >= '20090101' AND DateColumn < '20100101'--DATEPART(YEAR,DateColumn)=2009

    AND RowId > @MaxCtr

    ORDER BY RowId ASC;

    SET @RowCtr = @@ROWCOUNT;

    IF @RowCtr = 0

    BREAK;

    END

    WHILE(1=1)

    BEGIN

    SELECT @MaxCtr = ISNULL(MAX(RowId),0) FROM dbo.TestTab_2010;

    INSERT dbo.TestTab_2010(RowId,DateColumn)

    SELECT TOP (@Batch) RowId,DateColumn FROM dbo.TestTab_Main

    WHERE DateColumn >= '20100101' AND DateColumn < '20110101'--DATEPART(YEAR,DateColumn)=2010

    AND RowId > @MaxCtr

    ORDER BY RowId ASC;

    SET @RowCtr = @@ROWCOUNT;

    IF @RowCtr = 0

    BREAK;

    END

    SELECT * FROM dbo.TestTab_Main;

    SELECT * FROM dbo.TestTab_2008;

    SELECT * FROM dbo.TestTab_2009;

    SELECT * FROM dbo.TestTab_2010;

    SELECT * FROM dbo.Testvw;

    -- partitioned view checks - check exec plan to check that

    -- only the appropriate base table is accessed

    SELECT * FROM dbo.Testvw WHERE RowId = 250

    SELECT * FROM dbo.Testvw WHERE RowId = 42250

  • What table would a value for DateColumn ='20100101' get inserted? 😉



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • LutzM (7/14/2011)


    What table would a value for DateColumn ='20100101' get inserted? 😉

    That's what happens when I update the script without looking at what I did or testing what I did 😛

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply