GUID or Int is the best for primary key columns?

Question

GUID or Int is the best for primary key columns?

Viewing 15 posts - 31 through 45 (of 51 total)

You must be logged in to reply to this topic. Login to reply

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 1

David.Poole (7/16/2009)
The problem with a compound key where one of the fields is a ServerId is when you are running a disaster recovery site.
You have to remember NOT to synchronise the tables that tell you what the ServerId means.
It is not insurmountable but it is an easy mistake to make.

I would simply consider that one of the (often MANY) things that need to be done (possibly as an automatic evolution) if/when the holy sh-t scenario happens and you shift operations to the DR site. Your test verification of your DR plan would ensure that it happened as expected. :w00t:

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

PiMané SSChampion Points: 13965 More actions · Answer 2

That's if the primary key is the clustered index, which it doesn't have to be. There's no real problems with a random guid as the primary key. The problem (rapid fragmentation) comes in when the guid is the clustered index.
Generally I'd say if the primary key is a guid, make the primary key nonclustered and put the clustered index on a more suitable column, if one exists.

Is there any need of having a clustered index for other indexes seeks'?

Or since the PK are 2 GUID's and no other column seems to be suitable there's no need to have a clustered index?

Or, if we have a status column, with N possible values, or an active column, we can use it as a clustered index to "impersonate" partitions (we dont have enough disks to have N RAID to have "real" physical partitions :S )

Thanks,

Pedro

If you need to work better, try working less...

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 3

PiMané (9/10/2009)
That's if the primary key is the clustered index, which it doesn't have to be. There's no real problems with a random guid as the primary key. The problem (rapid fragmentation) comes in when the guid is the clustered index.
Generally I'd say if the primary key is a guid, make the primary key nonclustered and put the clustered index on a more suitable column, if one exists.
Is there any need of having a clustered index for other indexes seeks'?
Or since the PK are 2 GUID's and no other column seems to be suitable there's no need to have a clustered index?
Or, if we have a status column, with N possible values, or an active column, we can use it as a clustered index to "impersonate" partitions (we dont have enough disks to have N RAID to have "real" physical partitions :S )
Thanks,
Pedro

1) There are several good reasons to have clustered index on every table. Search the web for justifications.

2) as long as you query the table in the same order you have your PK index then at least you won't need 2 indexes on these columns. if you do search by the second column in that index only, then you would only be able to scan and not seek, which will affect performance.

3) BAD idex to do clustered index on a status, unless that status never changes (unlikely). Otherwise you get guaranteed moves of the data every time the status value changes, which would both fragment and have performance hit.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

EdVassie SSC Guru Points: 60312 More actions · Answer 4

IMHO do not worry about separating partitions to different disks. Using partitions has a very significant advantage over using a status column in the clustered index...

The partitioning key is a Stage 0 predicate. This means that the data is filtered by the optimiser. If the optimiser works out you only need to access partition 3 of your table, then what is happening in the other partitions (e.g. locks) is of no interest to your query.

If you are relying on a column in your cluster index to localise access to your data, then your query can be held up by locks taken on parts of the index or on the underlying table. This can be avoided by using partitions.

Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

PiMané SSChampion Points: 13965 More actions · Answer 5

2) as long as you query the table in the same order you have your PK index then at least you won't need 2 indexes on these columns. if you do search by the second column in that index only, then you would only be able to scan and not seek, which will affect performance.

Nowadays, in web applications, a user can sort the records by any column and order just by clicking on it... so order isn't quite an issue.

And I'm using a vertical database with lots of sub queries:

SELECT r.Id, (SELECT FieldValue FROM table1 t WHERE t.Id = r.Id AND t.ClientId = r.ClientId AND t.FieldId = '[GUID]') AS name, (SELECT FieldValue FROM table1 t WHERE t.Id = r.Id AND t.ClientId = r.ClientId AND t.FieldId = '[GUID]') AS phone FROM roottable r WHERE r.ClientId = [GUID]

For sorting I use a sub query that uses only the ClientId, Id and necessary fields to sort with ROW_NUMBER() and INNER JOIN with the query above ordering by the ROW_NUMBER() result.

Unfortunately our application has lots of GUIDs due to server sincronization.

Pedro

If you need to work better, try working less...

PiMané SSChampion Points: 13965 More actions · Answer 6

IMHO do not worry about separating partitions to different disks. Using partitions has a very significant advantage over using a status column in the clustered index...

Does partitioning with only one set of RAID have any advantage?

Our current budget only "allows" 2 RAID, one for OS and SQL Server installation, other for SQL Server Data and Log files... Yep, I know the log files should be on another RAID but not enough budget....

If we create 2 partitions (Drive D and E) on the second RAID, one for Data and one for Log, does it have any advantage, knowing it's the same disks?!

And every partition should have one FILEGROUP for optimal performance, and each FILEGROUP should be on a set of RAID disks... (once again... budget problem). If I create different FILEGROUPs to store the data partitions and, eventually, INDEX partitions (for example, create one partition to store all indexes, separate them from the data) on the same RAID, does it have any advantage or disadvantage (slower) over having all on the PRIMARY PARTITION?

I know one advantage: the FILEGROUPs are already created and only need to move them over to the new disk...

Thanks,

Pedro

If you need to work better, try working less...

EdVassie SSC Guru Points: 60312 More actions · Answer 7

I was thinking of SQL Server table partitioning, not Windows drive partitions...

There is nothing to stop you putting all your file groups on a single drive. However, you should experiment to see if the disk controler can handle the number of I-O requests that could be issued by SQL Server. If it is not up to the job, your performance will be worse than having a single filegroup.

Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

PiMané SSChampion Points: 13965 More actions · Answer 8

Sorry for the confusion, I was also talking about table partition but I read somewhere that table partition to be 100% efficient should have one filegroup for each table partition and each filegroup should have one RAID set... or does one single RAID set already helps performance?!

By the way, do you think that a filegroup only for indexes is helpful for performance? (probably the answer is the same as always... it depends 😀 )

Thanks,

Pedro

If you need to work better, try working less...

EdVassie SSC Guru Points: 60312 More actions · Answer 9

The advice about puting each filegroup on its own RAID set has a good intention at its heart, but is not really correct.

SQL Server can initiate multiple I-O streams against a single database file, but will not always do this. However, if you split tables into multiple filegroups you force SQL to use as a minimum one I-O stream per filegroup.

If each filegroup is on its own RAID set, you have a high probability that the RAID controller can handle the I-O requests withuot saturating the disk queue. If multiple filegroups are on a single RAID set (or routed through a single HBA for a SAN), then the number of I-O requests may saturate the disk queue and performance can become worse than if a single filegroup was used. (Depending on your disk subsystem, it may not always be the Disk Queue counter you have to check for saturation.)

If you go back 4 or 5 years, very few disk controllers attached to Windows servers could cope with the I-O generated by having as many as 2 busy filegroups. This has changed over time and performance is much better now, but typically a Windows server still has a small fraction of the I-O capability of a traditional mainframe.

Therefore if you want to consider partitioning and you have to put all I-O through 1 or 2 controlers you should do some simulation before implementing your design in Production. Google can find you a lot of information about how you can simulate a production I-O load.

Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 10

1) >>Does partitioning with only one set of RAID have any advantage?

It can actually have a DISadvantage in causing additional head thrashing and seek/access times, resulting in WORSE performance, especially if few physical spindles are involved.

2) With only 2 sets of RAID devices (one for OS/binaries and all sql data/logs on another) I would trend agv disk sec/read and avg disk sec/write on both sets and if the sql one became saturated with the OS one relatively idle (a common scenario from my experience) I would shift something (tempdb or tlogs) over to the OS partition and then remonitor.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

David.Poole SSC Guru Points: 75656 More actions · Answer 11

The advantages of partitioning where only one filegroup is used is in the simplicity of splitting, merging and switching partitions to achieve a rapid delete of a large volume of data.

In SQL2008 you can use the partition switch function with replicated tables!

LinkedIn Profile

Paul White SSC Guru Points: 150457 More actions · Answer 12

EdVassie (9/10/2009)
The advice about puting each filegroup on its own RAID set has a good intention at its heart, but is not really correct.
SQL Server can initiate multiple I-O streams against a single database file, but will not always do this. However, if you split tables into multiple filegroups you force SQL to use as a minimum one I-O stream per filegroup.

:blink: CSS SQL Server Engineers: Urban Legends

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

PiMané SSChampion Points: 13965 More actions · Answer 13

I read this article: http://technet.microsoft.com/en-us/library/cc966534.aspx.

According to it there should be at least 3 RAID sets: one for TEMPDB, another for LOG files and another for DATA files.

And there should be, in our case we have a XEON Quad Code, 4 TEMPDB data files, and eventually 1 to 4 DATA files per FILEGROUP (but it doesn't specify how many FILEGROUPs are recommended), evenly distributed.

Even if with just one RAID set is this "true"? Does the CPU, just by itself, determines the number of data files?

Are we in a stand where we should let the hardware do the hard work or should we try to "play" with FILEGROUPs (for different data: intensive read, intensive write; indexes), table PARTITION?!?

What's your opinion?

Pedro

If you need to work better, try working less...

EdVassie SSC Guru Points: 60312 More actions · Answer 14

You really need to do some workload simulation, so you have ral figures for your installation.

Your tests will show how many I-O streams can be supported by your disk subsystem. This will give you the information you need to decide if multiple filegroups will help improve your performance.

The main target to aim for is to comply with your performance SLAs. If you comply with the way things are now, then you do not need to change anything.

If you are out of compliance and your tests show that multiple filegroups will help, then you may be able to get back into compliance. If your tests show that you cannot improve I-O performance then you can go to your management with the figures you need for a business case to get extra hardware.

Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

Paul White SSC Guru Points: 150457 More actions · Answer 15

Ed, please read the link I posted for you.

As far as hardware is concerned, the basic idea is to separate random I/O from sequential I/O. This is the reasoning behind separating log (sequential) from data (usually more random than sequential). Having tempdb on a separate sub-system is also A Good Thing for many systems. Some systems benefit from having tempdb on a RAM drive, or solid-state storage like FusionIO or RAMSan.

After that, it's incremental. Sure, for high-end systems, it can be worth separating indexes from data, or partitioning. Usually, by that stage the system requires a SAN, not locally-attached storage - and there begins a whole new story.

My advice, unless you are putting together the next Google hardware platform and have to get it right on day one, is to start sensibly (with a basic separation of log, data, and tempdb), and allow for future 'tweaking'.

Log on RAID 1. Data on RAID 5 or 10. Tempdb on the fastest thing you can afford. Generally speaking, anyway. There is a lot to be said for setting up the hardware simply but well, and letting SQL Server and the hardware work it out.

You can drive yourself mad otherwise 🙂

BTW, don't forget how much SQL Server loves RAM and cores...

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi