non clustered primary key on a partitioned taable

Question

non clustered primary key on a partitioned taable

jvkondapalli

SSCommitted

Points: 1587
More actions
January 24, 2012 at 6:59 pm

#267202

....
Hello gurus,
I need your comments on the table partitioning we implemented in our environment.
we have a pk column (INT), Partitioning column(Datetime), fkcolumn (INT), few other varchar columns
We created a clustered index on a Pk and partitioning column. Reason: our partitioning column is updated freequently. so inorder to avoid page splits, we created a clustered index on pk and partitioning column. please comment on this approach. Also, let me know if i have any misconception here.
When I try to alteer table to create a non clustered primary key on a fkcolumn it is complaining that partitioning column should be part of the non clustered pk. As per my understanding, every non-clustered index will and should have the clustered index column/s as part of the non clustered index. If that understanding is correct, then, it should not complain for the partitioning column not being part of the non clustered pk.
Please clarify my misconception.
Thanks
Jagan K

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Revenant SSC-Forever Points: 42467 More actions · Answer 1

jvkondapalli (1/24/2012)
....
Hello gurus,
I need your comments on the table partitioning we implemented in our environment.
we have a pk column (INT), Partitioning column(Datetime), fkcolumn (INT), few other varchar columns
We created a clustered index on a Pk and partitioning column. Reason: our partitioning column is updated freequently. so inorder to avoid page splits, we created a clustered index on pk and partitioning column. please comment on this approach. Also, let me know if i have any misconception here.
When I try to alteer table to create a non clustered primary key on a fkcolumn it is complaining that partitioning column should be part of the non clustered pk. As per my understanding, every non-clustered index will and should have the clustered index column/s as part of the non clustered index. If that understanding is correct, then, it should not complain for the partitioning column not being part of the non clustered pk.
Please clarify my misconception.

jvkondapalli, what is the relationship between your PK column (presumably an ID) and your partitioning column?

Do you expect data from the past to show up later? (Meaning, a date say one month back being assigned a new ID as the current top ID +1?)

jvkondapalli SSCommitted Points: 1587 More actions · Answer 2

Revenant,

....
Hello gurus,
I need your comments on the table partitioning we implemented in our environment.
we have a pk column (INT), Partitioning column(Datetime), fkcolumn (INT), few other varchar columns
We created a clustered index on a Pk and partitioning column. Reason: our partitioning column is updated freequently. so inorder to avoid page splits, we created a clustered index on pk and partitioning column. please comment on this approach. Also, let me know if i have any misconception here.
When I try to alteer table to create a non clustered primary key on a fkcolumn it is complaining that partitioning column should be part of the non clustered pk. As per my understanding, every non-clustered index will and should have the clustered index column/s as part of the non clustered index. If that understanding is correct, then, it should not complain for the partitioning column not being part of the non clustered pk.
Please clarify my misconception.
jvkondapalli, what is the relationship between your PK column (presumably an ID) and your partitioning column?
Do you expect data from the past to show up later? (Meaning, a date say one month back being assigned a new ID as the current top ID +1?)

1. PK is a running number and partitioning column will get updated when that record is processed. Practically, in our system, that same record could

be processed a month later. This is the reason why we are having 3 months of data. on the 4th month, we will switch out the oldest parttion and

merge it.

2. I dont want that data to show up with the new ID. the running number should be same.

Thanks
Jagan K

jvkondapalli SSCommitted Points: 1587 More actions · Answer 3

jvkondapalli (1/24/2012)
Revenant,
....
Hello gurus,
I need your comments on the table partitioning we implemented in our environment.
we have a pk column (INT), Partitioning column(Datetime), fkcolumn (INT), few other varchar columns
We created a clustered index on a Pk and partitioning column. Reason: our partitioning column is updated freequently. so inorder to avoid page splits, we created a clustered index on pk and partitioning column. please comment on this approach. Also, let me know if i have any misconception here.
When I try to alteer table to create a non clustered primary key on a fkcolumn it is complaining that partitioning column should be part of the non clustered pk. As per my understanding, every non-clustered index will and should have the clustered index column/s as part of the non clustered index. If that understanding is correct, then, it should not complain for the partitioning column not being part of the non clustered pk.
Please clarify my misconception.
jvkondapalli, what is the relationship between your PK column (presumably an ID) and your partitioning column?
Do you expect data from the past to show up later? (Meaning, a date say one month back being assigned a new ID as the current top ID +1?)
1. PK is a running number and partitioning column will get updated when that record is processed. Practically, in our system, that same record could
be processed a month later. This is the reason why we are having 3 months of data. on the 4th month, we will switch out the oldest parttion and
merge it.
2. I dont want that data to show up with the new ID. the running number should be same.

Any thoughts from the experts? your valuable suggestions/comments are deeply appreciated.

Thanks

Jagan K

Thanks
Jagan K

Jared SSC Guru Points: 61793 More actions · Answer 4

I think the first question is "does having the primary key and clustered index on an identity column and the partitioned column of datetime eliminate page splits when new data is entered?"

From my understanding, I don't see how this prevents page splits. (Feel free to jump in gurus) In fact, I think with frequent updates on the partitioned column it will create MORE page splits. I see no need to have this column as part of the key as it does not describe any uniqueness at this point of my understanding.

Jared
CE - Microsoft

jvkondapalli SSCommitted Points: 1587 More actions · Answer 5

Per my understanding, partitioning column should either be part of the clustered index or the Primary key and it should not be null. Else, it will reside left most partition.

Again, I might be wrong, please comment.

Thanks
Jagan K

Jared SSC Guru Points: 61793 More actions · Answer 6

The problem is you are saying that that column gets updated frequently. So every time you update that value, it has to move the data. That may be to another page and even another file. I would hesitate partitioning on a column that gets frequent updates.

Jared
CE - Microsoft

Paul White SSC Guru Points: 150457 More actions · Answer 7

This script demonstrates some concepts to answer some of your questions:

USE tempdb

GO

CREATE PARTITION FUNCTION PF(date) AS

RANGE RIGHT FOR VALUES ('2012-02-01', '2012-03-01', '2012-04-01');

GO

CREATE PARTITION SCHEME PS AS PARTITION PF ALL TO ([PRIMARY]);

GO

-- Partitioned heap

CREATE TABLE dbo.Heap

(

pk integer IDENTITY NOT NULL,

the_date date NOT NULL,

fk integer NULL,

data varchar(50) NULL

) ON PS (the_date);

GO

-- Sample data

INSERT dbo.Heap

(the_date, fk, data)

VALUES

('2012-01-04', 1001, 'a'),

('2012-01-08', 1002, 'b'),

('2012-02-21', 1003, 'c'),

('2012-03-07', 1004, 'd'),

('2012-03-15', 1005, 'e'),

('2012-03-24', 1006, 'f');

GO

-- Show partitions and row counts

SELECT

partition_id = $PARTITION.PF(the_date),

row_count = COUNT_BIG(*)

FROM dbo.Heap AS h

GROUP BY

$PARTITION.PF(the_date)

GO

-- Error: Partition columns for a *unique* index must be

-- a subset of the index *key* (included column does not count)

CREATE TABLE dbo.Clustered1

(

pk integer IDENTITY NOT NULL,

the_date date NOT NULL,

fk integer NULL,

data varchar(50) NULL,

CONSTRAINT [PKC dbo.Clustered1 pk P:the_date]

PRIMARY KEY (pk) ON PS (the_date)

);

GO

-- Same error, for non-clustered *unique* index (PK)

CREATE TABLE dbo.Clustered2

(

pk integer IDENTITY NOT NULL,

the_date date NOT NULL,

fk integer NULL,

data varchar(50) NULL,

CONSTRAINT [PKN dbo.Clustered2 pk P:the_date]

PRIMARY KEY NONCLUSTERED (pk) ON PS (the_date)

);

GO

-- Start as an ordinary heap with a nonclustered PK

CREATE TABLE dbo.Clustered3

(

pk integer IDENTITY NOT NULL,

the_date date NOT NULL,

fk integer NULL,

data varchar(50) NULL,

CONSTRAINT [PKN dbo.Clustered3 pk]

PRIMARY KEY NONCLUSTERED (pk)

);

GO

-- Partitioned non-unique clustered index is allowed

-- Note: the existing nonclustered PK stays NOT partitioned

CREATE CLUSTERED INDEX [CX dbo.Clustered3 fk P:the_date]

ON dbo.Clustered3 (fk) ON PS(the_date);

GO

DROP TABLE dbo.Heap;

DROP TABLE dbo.Clustered2;

DROP TABLE dbo.Clustered3

DROP PARTITION SCHEME PS;

DROP PARTITION FUNCTION PF;

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi