Partitioned views based on DATETIME field

  • Anyone have any ideas/best practices on partitioning large volumes of timeline based data using a DATETIME field.

    I am estimating more than 40 million rows in a single table over a period of 8 months. I have read elsewhere that 40 million rows is considered the theoretical maximum number of rows that should be present in a single table. Hence my thinking of splitting the data up based on DATETIME. My idea was to split it on the month number. ie. into quarters. So in effect I would have data_q1, data_q2, data_q3 and data_q4. Then bringing it all together in a partitioned view called data which would mask the base tables.

    Now my problem so far involves defining the CHECK constraint for the base tables. I don't see why DATEPART(MONTH, date) IN (1, 2, 3) is not valid. BOL says only certain operators are permitted for an updatable partitioned view and they don't mention anything about deterministic functions.

    Anyone have any ideas on this that could help me?

  • Can you post your script?

  • Here is an example I quickly threw together to illustrate the problem:

    
    
    CREATE TABLE dbo.data_q1(
    [date] DATETIMENOT NULL CHECK (MONTH([date]) IN (1, 2, 3)),
    refid INTEGER NOT NULL,
    data INTEGERNOT NULL
    CONSTRAINT PK_data_q1 PRIMARY KEY CLUSTERED ([date], refid)
    )
    GO
    CREATE TABLE dbo.data_q2(
    [date] DATETIMENOT NULL CHECK (MONTH([date]) IN (4, 5, 6)),
    refid INTEGER NOT NULL,
    data INTEGERNOT NULL
    CONSTRAINT PK_data_q2 PRIMARY KEY CLUSTERED ([date], refid)
    )
    GO
    CREATE TABLE dbo.data_q3(
    [date] DATETIMENOT NULL CHECK (MONTH([date]) IN (7, 8, 9)),
    refid INTEGER NOT NULL,
    data INTEGERNOT NULL
    CONSTRAINT PK_data_q3 PRIMARY KEY CLUSTERED ([date], refid)
    )
    GO
    CREATE TABLE dbo.data_q4(
    [date] DATETIMENOT NULL CHECK (MONTH([date]) IN (10, 11, 12)),
    refid INTEGER NOT NULL,
    data INTEGERNOT NULL
    CONSTRAINT PK_data_q4 PRIMARY KEY CLUSTERED ([date], refid)
    )
    GO
    CREATE VIEW dbo.data WITH SCHEMABINDING
    AS
    SELECT[date],
    refid,
    data
    FROMdbo.data_q1
    UNION ALL
    SELECT[date],
    refid,
    data
    FROMdbo.data_q2
    UNION ALL
    SELECT[date],
    refid,
    data
    FROMdbo.data_q3
    UNION ALL
    SELECT[date],
    refid,
    data
    FROMdbo.data_q4
    GO

    INSERT INTO dbo.data VALUES ('01 Jan 2003 08:30:00', 0, 0)
    INSERT INTO dbo.data VALUES ('01 Feb 2003 08:30:00', 0, 0)
    INSERT INTO dbo.data VALUES ('01 Mar 2003 08:30:00', 0, 0)
    INSERT INTO dbo.data VALUES ('01 Apr 2003 08:30:00', 0, 0)
    INSERT INTO dbo.data VALUES ('01 May 2003 08:30:00', 0, 0)
    INSERT INTO dbo.data VALUES ('01 Jun 2003 08:30:00', 0, 0)
    INSERT INTO dbo.data VALUES ('01 Jul 2003 08:30:00', 0, 0)
    INSERT INTO dbo.data VALUES ('01 Aug 2003 08:30:00', 0, 0)
    INSERT INTO dbo.data VALUES ('01 Sep 2003 08:30:00', 0, 0)
    INSERT INTO dbo.data VALUES ('01 Oct 2003 08:30:00', 0, 0)
    INSERT INTO dbo.data VALUES ('01 Nov 2003 08:30:00', 0, 0)
    INSERT INTO dbo.data VALUES ('01 Dec 2003 08:30:00', 0, 0)
  • you're using the IN operator in your check constrainst which apparently is not allowed for partitioned views. from BOL:

    "The CHECK constraints can only use these operators: BETWEEN, AND, OR, <, <=, >, >=, ="

    if you want to partition by quarter, how bout writing your constraints something like:

    CHECK([DATE] >= '01-JAN-2003' and [DATE] < '01-APR-2003')

    etc.

  • The problem in my mind is not that I am using the IN operator, but rather the MONTH() or DATEPART() functions. It seems that functions are not eligible in this case.

    Sure, I could make it compare to actual datetime values, but that implies that I need to define a good few years worth to prevent the system falling over.

    What I suppose I need is a more mathematical approach that satisfies SQL Server's internal desires. 🙂 You got any more ideas?

  • I modified your script and add a new partitioning column for partition data into different tables. It seems not the operator issue because I used "IN" in check constraint. I would suspect problem comes from datetime datatype.

    create TABLE dbo.data_q1(

    [date] DATETIMENOT NULL ,

    logmonth int CHECK (logmonth IN (1, 2, 3)),

    refid INTEGER NOT NULL,

    data INTEGERNOT NULL

    CONSTRAINT PK_data_q1 PRIMARY KEY CLUSTERED (logmonth, refid ))

    GO

    create TABLE dbo.data_q2(

    [date] DATETIMENOT NULL ,

    logmonth int CHECK (logmonth IN (4, 5, 6)),

    refid INTEGER NOT NULL,

    data INTEGERNOT NULL

    CONSTRAINT PK_data_q2 PRIMARY KEY CLUSTERED (logmonth, refid ))

    GO

    create TABLE dbo.data_q3(

    [date] DATETIMENOT NULL ,

    logmonth int CHECK (logmonth IN (7,8,9)),

    refid INTEGER NOT NULL,

    data INTEGERNOT NULL

    CONSTRAINT PK_data_q3 PRIMARY KEY CLUSTERED (logmonth, refid ))

    GO

    create TABLE dbo.data_q4(

    [date] DATETIMENOT NULL ,

    logmonth int CHECK (logmonth IN (10, 11, 12)),

    refid INTEGER NOT NULL,

    data INTEGERNOT NULL

    CONSTRAINT PK_data_q4 PRIMARY KEY CLUSTERED (logmonth, refid ))

    GO

    create VIEW dbo.data

    AS

    SELECT*

    FROMdbo.data_q1

    UNION ALL

    SELECT*

    FROMdbo.data_q2

    UNION ALL

    SELECT*

    FROMdbo.data_q3

    UNION ALL

    SELECT*

    FROMdbo.data_q4

    GO

    INSERT INTO dbo.data VALUES ('01 Jan 2003 08:30:00', DATEPART(mm, '01 Jan 2003 08:30:00'),0, 0)

    INSERT INTO dbo.data VALUES ('01 Feb 2003 08:30:00', DATEPART(mm, '01 Feb 2003 08:30:00'),0, 0)

    INSERT INTO dbo.data VALUES ('01 Mar 2003 08:30:00', DATEPART(mm, '01 Mar 2003 08:30:00'),0, 0)

    INSERT INTO dbo.data VALUES ('01 Apr 2003 08:30:00', DATEPART(mm, '01 Apr 2003 08:30:00'),0, 0)

    INSERT INTO dbo.data VALUES ('01 May 2003 08:30:00', DATEPART(mm, '01 May 2003 08:30:00'),0, 0)

    INSERT INTO dbo.data VALUES ('01 Jun 2003 08:30:00', DATEPART(mm, '01 Jun 2003 08:30:00'),0, 0)

    INSERT INTO dbo.data VALUES ('01 Jul 2003 08:30:00', DATEPART(mm, '01 Jul 2003 08:30:00'),0, 0)

    INSERT INTO dbo.data VALUES ('01 Aug 2003 08:30:00', DATEPART(mm, '01 Aug 2003 08:30:00'),0, 0)

    INSERT INTO dbo.data VALUES ('01 Sep 2003 08:30:00', DATEPART(mm, '01 Sep 2003 08:30:00'),0, 0)

    INSERT INTO dbo.data VALUES ('01 Oct 2003 08:30:00', DATEPART(mm, '01 Oct 2003 08:30:00'),0, 0)

    INSERT INTO dbo.data VALUES ('01 Nov 2003 08:30:00', DATEPART(mm, '01 Nov 2003 08:30:00'),0, 0)

    INSERT INTO dbo.data VALUES ('01 Dec 2003 08:30:00', DATEPART(mm, '01 Dec 2003 08:30:00'),0, 0)

  • i guess i misunderstood what you were after. i assumed that your date was a date entered, date created, or some such, where the range of allowable values is bounded. in such cases you can create the required partitions before they're needed (either manually or preferably through a recurring scheduled job that creates and/or drops tables and then recompiles the view(s)).

    sounds like you just wanna hash dates into one of four partitions; though it seems to me this only delays the problem you set out to avoid (hitting 40M rows/table).

    and, yeah, you're right 🙂 it appears that IN is perfectly acceptable in updatable partitioned views. this worked fine:

    create table t1(

    col1 int not null primary key check([col1] in (1)))

    create table t2(

    col1 int not null primary key check([col1] in (2)))

    create view v

    as

    select col1

    from t1

    union all

    select col1

    from t2

    insert v values(1)

    insert v values(2)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply