Integer or String data type for primary key

Question

Integer or String data type for primary key

garethmann101

SSCommitted

Points: 1980
More actions
September 15, 2008 at 7:04 am

#382638

I recieved some datasets in excel, they have a unique ID field with a numeric value but with a leading zero (i.e. '057891') If I conevrt this field to an integer data type I will lose the leading zero right?
I need to use this field as the primary key, so my question is:
"What are the issues with having a primary key as one of the string data types?"
Would I be better off dropping the leading zero and opting for one of the integer data types?
I suppose if I opt to keep the string data type then any foriegn key relationships in other tables will also need to be string data types....

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 1

That looks like an order system.

What I did in mine that really improved performance was to create the PK as an int, but I also kept the order id as string with the leading 0s (just in case they decide to change the display rule someday, but keep the historic data as is).

That way the joins were considerably faster and the reports came up much faster.

Steve Jones - SSC Editor SSC Guru Points: 728164 More actions · Answer 2

An int will join quicker just because it's smaller, but in most systems, it is likely a negligible delay. Plenty of people use SSN, email, etc. as PKs and they work fine.

[Edited]

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 3

SSC or SSN Steve?

You've been working here too long me thinks ;).

Just a final note, as always test, test, test. I chose to do it this way because it made a noticable difference in my system for the reports we needed to run... maybe our server back then was just too slow, or too busy and that small change made a huge difference.

Anyhow int = 4 bytes to process, your char(6) column would be 50% slower to process. The only question is 50% slower than what... if that's already under 1 MS, then you can go with design simplicity. But always keep in mind the amount of transactions done on that table... if you run 1000s of requests per minute or seconds, that 50% will compound pretty fast.

garethmann101 SSCommitted Points: 1980 More actions · Answer 4

I would be inclined to keep it in string format so as to not lose what may turn out to be important info in the future.

But one final concern:

If I use a string, how does that affect the indexing? I mean with a numeric value SQL Server can index the records using the "b-tree" structure right? (I think it is called "clusted indexing" or something in SQL Server jargon)

i.e.

-search records 1-1000 here

-search records 1-500 here

-search records 501-1000 here

-search records 1001-2000 here

-search records 1001-1500 here

-search records 1501-2000 here

etc. etc.

Can SQL Server still use the same indexing system for string data types? I.e. will it simply recognize the values as numeric anyway and perform the "B-Tree" index on them as if they were numbers?

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 5

The big question to ask yourself before doing this is - is the zero in the original system simply a display item, or is it actually significant? Meaning - if the PK needed to be exanded to 7 digits from 6, would the old 6-digit be padded out, or would the new numbers not "overlap" the old ones due to the extra digit?

In other words, in the above case, would 000001 be equal or not equal to 0000001?

If there to be considered equal, then convert to int, possibly keeping the formatted version like NinjaRGR mentioned. If they're not - then you have no choice but to keep it as a string.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 6

garethmann101 (9/15/2008)
I would be inclined to keep it in string format so as to not lose what may turn out to be important info in the future.
But one final concern:
If I use a string, how does that affect the indexing? I mean with a numeric value SQL Server can index the records using the "b-tree" structure right? (I think it is called "clusted indexing" or something in SQL Server jargon)
i.e.
-search records 1-1000 here
-search records 1-500 here
-search records 501-1000 here
-search records 1001-2000 here
-search records 1001-1500 here
-search records 1501-2000 here
etc. etc.
Can SQL Server still use the same indexing system for string data types? I.e. will it simply recognize the values as numeric anyway and perform the "B-Tree" index on them as if they were numbers?

SQL Server does use a B-Tree structure for all of its indexes, but it build the B-Tree based on the data type. So the internal structure of the B-Tree will be different if the column being indexed is the string version than if you were to type it as an integer.

As to clustered vs not - that's an entirely different question. You probably want to read up a little on that difference.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

garethmann101 SSCommitted Points: 1980 More actions · Answer 7

Ok but will the B-tree structure be less efficient with string rather than numeric data types?

Steve Jones - SSC Editor SSC Guru Points: 728164 More actions · Answer 8

Doh!, SSN, edited the post.

good advice above. Be sure that you don't need the 0 before you remove it. Some systems need it, or might be expecting xx digits, so test well.

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 9

garethmann101 (9/15/2008)
Ok but will the B-tree structure be less efficient with string rather than numeric data types?

A B-tree is a B-tree. It's not an easy question to answer, since one might be more balanced than the other.

The integers would still tend to be faster I'd think, since the B-tree will be shorter (based on the binary representation of the number), whereas the string version would be based on the succession of characters making up the PK.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

garethmann101 SSCommitted Points: 1980 More actions · Answer 10

I think what I will actually do is keep that field as a string field and maybe just create a brand new primary key integer field, I have just realised that the way i am designing the databases, that will not be difficult at all.

Shubh Chandra Jha-248340 SSC Veteran Points: 249 More actions · Answer 11

Numeric value is always a good choice for Key. But important thing is that, no matter whatever you are using (number/string/stamp) but it should meet your business requirement 100%.

For your problem, if you are only worried about your leading zero. I will suggest to truncate it. Because it has no significance while every value has same leading zero, and leading zero is arthimatically none.

While displaying value, as per your requirement, you can add required leading zero.

Hope it will help you.

😎

garethmann101 SSCommitted Points: 1980 More actions · Answer 12

It seems that noone mentioned the most obvious problem with using a string field as a primary key:

There appears to be no way to have the string field work as an automatic identity increment. This means that it will be very difficult to add new records (you cannot enter a blank value as a primary key, so you would need to design some elaborate system to create a new string as a new primary key value that does not already exist in the database, this means checking every record to insure the new value is definately unique.)

I have learnt the hard way, so now I think i will redesign this table so that the primary key is numeric, and I will have to also update every stored procedure and inline SQL statement in my database. ARRHHHH!

Unless I am missing something?

Shubh Chandra Jha-248340 SSC Veteran Points: 249 More actions · Answer 13

Its good choice

You know what I feel, at db level value should be at the purest level. And change query as per report requirement. If you change db as per report/display, db would be always in problem

its my view

Imran Saeed Newbie Points: 9 More actions · Answer 14

Have to admit that making a unique integer key is much easier compared to making a unique string key. You can probably make unique string keys but some arbitrary logic but then it's probably pointless. The only place where I can think of using string keys is where there are unique global definitions like currency codes or ISO country codes. There are however integer variations for these entities.

Most people argue that string keys are more readable than integer key but in my personal experience string keys become as meaningless as integers when you are trying to hash the keys based on descriptive names. For example, if a table was to hold foreign keys of geographical locations like India or Indiana (state) then the primary key abbreviations may be IND for India or INDA for Indiana which in first glance can be as confusing as numeric values.

My rule has always been to stick to integer unless string keys are required. Also integer is more space efficient for larger tables. 7 digit numbers can be stored in 4 bytes whereas 7 characters will end up in (at least) 7 bytes. This really makes a difference in larger tables.