Storing encypted SSN's

Question

Storing encypted SSN's

Viewing 15 posts - 31 through 45 (of 48 total)

You must be logged in to reply to this topic. Login to reply

ram302 SSCommitted Points: 1618 More actions · Answer 1

Eric M Russell (4/27/2016)
GilaMonster (4/27/2016)
Eric M Russell (4/27/2016)
at this point you might as well use symmetric key encryption instead of hashing.
I disagree. If you're using symmetric encryption and the key is compromised, all data can be decrypted. With a salted hash it can't be reversed. It can be brute forced, but that takes time with the stronger hash algorithms. Salting the hash means you can't use a simple rainbow table. It's about making things harder, not making things impossible.
Please show an example of what you and Steve mean by "salting the hash".

I found this: https://www.mssqltips.com/sqlservertip/3293/add-a-salt-with-the-sql-server-hashbytes-function/[/url]

Steve Jones - SSC Editor SSC Guru Points: 728164 More actions · Answer 2

splitting hairs.

Losing the salt would be the equivalent of losing the symmetric key, if the domain is known. In the case of a social security number/taxid, this is the case. The domain is fixed, so while the user can't "decrypt" the hash, they can break the security with a rainbow table.

In either case, it's a bad situation. However, assuming you can somewhat protect the salt (or key) in the application, either works.

The hash is much, much faster, with fewer resources than the symmetric key encryption, but since either will likely be done on the client, not server, I'm not sure it matters.

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

Sergiy SSC Guru Points: 110192 More actions · Answer 3

ram302 (4/26/2016)
I will never need to decrypt the SSN, but I will need to add/remove ones as personnel are hired/terminated. Also, the table itself may contain multiple entries for one particular SSN.

If you never need to decrypt SSN's (which means - you do not need to display them anywhere, it's only for internal identification) then I cannot see how it's different from using an auto-generated number (say, IDENTITY integers) instead of SSN hash.

_____________
Code for TallyGenerator

ram302 SSCommitted Points: 1618 More actions · Answer 4

Sergiy (4/27/2016)
ram302 (4/26/2016)
I will never need to decrypt the SSN, but I will need to add/remove ones as personnel are hired/terminated. Also, the table itself may contain multiple entries for one particular SSN.
If you never need to decrypt SSN's (which means - you do not need to display them anywhere, it's only for internal identification) then I cannot see how it's different from using an auto-generated number (say, IDENTITY integers) instead of SSN hash.

This did cross my mind, but if I truncate and import latest data file daily, it complicates things more; I'd have to re-map it all, and try to keep the same keys since it's what will be used in my look-up. Whereas in the hashing (with salt), seems to like the plausible solution for me.

Eric M Russell SSC Guru Points: 125255 More actions · Answer 5

Steve Jones - SSC Editor (4/27/2016)
splitting hairs.
Losing the salt would be the equivalent of losing the symmetric key, if the domain is known. In the case of a social security number/taxid, this is the case. The domain is fixed, so while the user can't "decrypt" the hash, they can break the security with a rainbow table.
In either case, it's a bad situation. However, assuming you can somewhat protect the salt (or key) in the application, either works.
The hash is much, much faster, with fewer resources than the symmetric key encryption, but since either will likely be done on the client, not server, I'm not sure it matters.

Hashing is problematic for a number of reasons. For one thing, because hashes are intended to uniqueify anything from a single byte to a multi-megabyte file, the hashed values require more storage. For example SHA2_512 is (64) bytes fixed, which is cumbersome if you're attempting to hash something like FirstName, LastName, DOB, and SSN columns separately. Think about what it takes to index that, and keep in mind that these hashed values are essentially random, so they don't sort sequentially or compress.

Also, if you're thinking about using hashing as a means of encryption, keep in mind that you can't derive the original value using an UnHash function and key. You must implement a cross-reference table with the unhashed version of the data somehwere within your database, otherwise you'll have no way to access the unhashed version of the data when needed. Maybe you'll never need to know the unhashed version of a customer's SSN, but you'll need to know their actual name or phone number.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

ram302 SSCommitted Points: 1618 More actions · Answer 6

Eric M Russell (4/28/2016)
Steve Jones - SSC Editor (4/27/2016)
splitting hairs.
Losing the salt would be the equivalent of losing the symmetric key, if the domain is known. In the case of a social security number/taxid, this is the case. The domain is fixed, so while the user can't "decrypt" the hash, they can break the security with a rainbow table.
In either case, it's a bad situation. However, assuming you can somewhat protect the salt (or key) in the application, either works.
The hash is much, much faster, with fewer resources than the symmetric key encryption, but since either will likely be done on the client, not server, I'm not sure it matters.
Hashing is problematic for a number of reasons. For one thing, because hashes are intended to uniqueify anything from a single byte to a multi-megabyte file, the hashed values require more storage. For example SHA2_512 is (64) bytes fixed, which is cumbersome if you're attempting to hash something like FirstName, LastName, DOB, and SSN columns separately. Think about what it takes to index that, and keep in mind that these hashed values are essentially random, so they don't sort sequentially or compress.
Also, if you're thinking about using hashing as a means of encryption, keep in mind that you can't derive the original value using an UnHash function and key. You must implement a cross-reference table with the unhashed version of the data somehwere within your database, otherwise you'll have no way to access the unhashed version of the data when needed. Maybe you'll never need to know the unhashed version of a customer's SSN, but you'll need to know their actual name or phone number.

Their name is not required to be protected, so that will remain. Everything else seems to be ok in the public domain, but I'll request confirmation saying so. I will never need to un-hash the SSN, and this is what I'll be using for the lookup.

Steve Jones - SSC Editor SSC Guru Points: 728164 More actions · Answer 7

Eric M Russell (4/28/2016)
Steve Jones - SSC Editor (4/27/2016)
splitting hairs.
Losing the salt would be the equivalent of losing the symmetric key, if the domain is known. In the case of a social security number/taxid, this is the case. The domain is fixed, so while the user can't "decrypt" the hash, they can break the security with a rainbow table.
In either case, it's a bad situation. However, assuming you can somewhat protect the salt (or key) in the application, either works.
The hash is much, much faster, with fewer resources than the symmetric key encryption, but since either will likely be done on the client, not server, I'm not sure it matters.
Hashing is problematic for a number of reasons. For one thing, because hashes are intended to uniqueify anything from a single byte to a multi-megabyte file, the hashed values require more storage. For example SHA2_512 is (64) bytes fixed, which is cumbersome if you're attempting to hash something like FirstName, LastName, DOB, and SSN columns separately. Think about what it takes to index that, and keep in mind that these hashed values are essentially random, so they don't sort sequentially or compress.
Also, if you're thinking about using hashing as a means of encryption, keep in mind that you can't derive the original value using an UnHash function and key. You must implement a cross-reference table with the unhashed version of the data somehwere within your database, otherwise you'll have no way to access the unhashed version of the data when needed. Maybe you'll never need to know the unhashed version of a customer's SSN, but you'll need to know their actual name or phone number.

In terms of encrypting one field, you could argue the single size of hashing (64b) v the symmetric encrypted valaue + overhead is negligible. If I hash a multi-MB file, the hash requires less storage.

Indexing matters for determinism, not necessarily ordering. If you are searching for ranges, they both stink, but hashing gives you determinism. ENCRYPTBYPASSPHRASE or ENCRYPTBYKEY do not give you that.

Hashing != encryption. It's a transformation that may, or may not, reveal the original text. The OP mentioned no needing decryption, so hashing makes more sense.

I'm not arguing this in a general case, but in this case.

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

ram302 SSCommitted Points: 1618 More actions · Answer 8

Hashing is that I decided with, it made most sense. Thanks guys for all your input.

Sergiy SSC Guru Points: 110192 More actions · Answer 9

http://opendata.stackexchange.com/questions/5595/how-should-us-ssn-be-anonimized

Since SSN has only 9 digits, changing hash function will not suffice because attacker can simply apply the function to all 10^9 SSN's and match the result against the database.

Any hacker would have a mapping table between all possible SSN's and corresponding hashes. If they don't at the moment - it's not so long to generate one using the method of your choice.

So, you better use internally generated ID's, and for mapping use a table stored in another database with restricted access.

May be even on another dedicated instance with specially designed security model.

_____________
Code for TallyGenerator

ram302 SSCommitted Points: 1618 More actions · Answer 10

Sergiy (4/28/2016)
http://opendata.stackexchange.com/questions/5595/how-should-us-ssn-be-anonimized
Since SSN has only 9 digits, changing hash function will not suffice because attacker can simply apply the function to all 10^9 SSN's and match the result against the database.
Any hacker would have a mapping table between all possible SSN's and corresponding hashes. If they don't at the moment - it's not so long to generate one using the method of your choice.
So, you better use internally generated ID's, and for mapping use a table stored in another database with restricted access.
May be even on another dedicated instance with specially designed security model.

And just to be a jerk, hash that meaningless id.

It makes sense to this as well, a separate db/table maintaining a mapping of ssn => unique_id. And a trigger that will check before inserts into the main records table whether there is a SSN in the mapping table; if there is, use that key to insert into the main records table, otherwise insert SSN into mapping table and a new id (identity column) will be created for this ssn key. Sound reasonable?

Steve Jones - SSC Editor SSC Guru Points: 728164 More actions · Answer 11

You certainly could take the SSN from the app and do a hash/map in another table. Use a salt, protect the tables with limited access and that can work. I might use a proc to do this, with EXECUTE AS, and no permissions outside of dbo for access.

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

Eric M Russell SSC Guru Points: 125255 More actions · Answer 12

Sergiy (4/28/2016)
http://opendata.stackexchange.com/questions/5595/how-should-us-ssn-be-anonimized
Since SSN has only 9 digits, changing hash function will not suffice because attacker can simply apply the function to all 10^9 SSN's and match the result against the database.
Any hacker would have a mapping table between all possible SSN's and corresponding hashes. If they don't at the moment - it's not so long to generate one using the method of your choice.
So, you better use internally generated ID's, and for mapping use a table stored in another database with restricted access.
May be even on another dedicated instance with specially designed security model.

The following will build a dictionary table containing SSN and SHA2_512 hash for every possible social security number (~ 1 billion rows). I hacked this together in 1/2 hour so forgive any T-SQL coding practices that offend your sensibilities. However, it does works.

Allow 200 GB for table, index, and temp space, and let it run overnight.

Do NOT run this on production server!!!

create table HashSSN

(

SSN char(11) not null

constraint PK_HashSSN primary key

with (data_compression = page, ignore_dup_key = on),

SSN_SHA2_512 binary(64) not null

);

create unique index ix_SSN_SHA2_512

on HashSSN ( SSN_SHA2_512 )

with (data_compression = page);

GO

set nocount on;

declare @i int = isnull((

select cast(replace(max(SSN), '-', '') as int)

from HashSSN

), 0);

while @i < 999999999

begin;

-- print status message for each loop:

raiserror ('@i = %i', 0, 1, @i) with nowait;

insert into HashSSN (SSN, SSN_SHA2_512)

select stuff(stuff(cast(SSN as char(11)), 4, 0, '-'), 7, 0, '-') as SSN, hashbytes('SHA2_512', SSN) as SSN_SHA2_512

from (

select right('000000000' + cast(N as varchar(9)), 9) as SSN

from (

select top 250000 @i + row_number() over (

order by error, msglangid

) as n

from master.sys.sysmessages

) T1

where N <= 999999999

) as T2

order by SSN;

-- set database to simple recovery and manually checkpoint log here.

checkpoint;

select @i = @i + 250000;

end;

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Steve Jones - SSC Editor SSC Guru Points: 728164 More actions · Answer 13

Eric M Russell (4/29/2016)
The following will build a dictionary table containing SSN and SHA2_512 hash for every possible social security number (~ 1 billion rows). I hacked this together in 1/2 hour so forgive any T-SQL coding practices that offend your sensibilities. However, it does works.

And for more practice: http://adventofcode.com/day/4

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

ram302 SSCommitted Points: 1618 More actions · Answer 14

Ok, if anyone here has the time and/or patience to review my final thoughts on this issue, here it is. I think this will be the final approach (please disregard my 'FINAL' declaration before).

Although hashing (with salt) would have been a quicker and easier approach, I don't think I could deliver something that I know could have been more secure.

So let's suppose the user_login is what we want to hide from the world; our SSN equivalent. So I have two tables; employee_certifications with all the data and a new UID and another employee_id_mapping for storing the valued user logins and a unique id used for mapping:

use mycompany

go

create table employee_certifications

(

id bigint,

fname varchar(30),

lname varchar(30)

)

create table employee_id_mapping

(

employee_id bigint identity(1000000,1),

user_login varchar(40)

)

Now, I'll be creating a stored proc, passing it fname, lname and user_login, to do the following:

- Performing a lookup in the mapping table using the user_login supplied.

- If one exists, use this user_login. Otherwise, create a new one, and use this newly created user_login.

- Finally, insert into certifications table using the user_login, whether it was a newly created one or an existing one.

USE MYCOMPANY

GO

ALTER PROCEDURE insert_record

@fname varchar(30)

, @lname varchar(30)

, @user_login VARCHAR(40)

AS

SET NOCOUNT ON

-- Store the employee id here when ready.

DECLARE @insert_employee_id INT

-- This will be populated when the insert into EMPLOYEE_ID_MAPPING is executed

DECLARE @NEW_EMPLOYEE_ID TABLE(id BIGINT)

-- Check whether there is a user login that exists in the mapping table; if so, return the id and use for inserting into employee_certifications.

IF EXISTS( SELECT EMPLOYEE_ID FROM EMPLOYEE_ID_MAPPING E WHERE E.user_login = @user_login)

BEGIN

set @insert_employee_id = (SELECT EMPLOYEE_ID FROM EMPLOYEE_ID_MAPPING E WHERE E.user_login = @user_login)

END

-- Otherwise, no login found, insert into mapping table, returning the new id for inserting into employee_certifications.

ELSE

BEGIN

-- This insert should return an id (IDENTITY) into NEW_EMPLOYEE_ID table

INSERT INTO employee_id_mapping (user_login)

OUTPUT INSERTED.employee_id INTO @NEW_EMPLOYEE_ID

VALUES (@user_login)

-- Store new id value from table

SELECT @insert_employee_id = id FROM @NEW_EMPLOYEE_ID

END

-- Insert into employe_certifications, using the id; may be an existing one, or a new one if employee record isn't found by user login.

INSERT INTO employee_certifications

(id, fname, lname)

(SELECT @insert_employee_id, @FNAME, @LNAME)

GO

And below is some test data I used:

exec insert_record @fname = "ROY", @lname = "MCCOY", @user_login = "RMCCOY"

exec insert_record @fname = "ANA", @lname = "BANANA", @user_login = "ABANANA"

exec insert_record @fname = "TERRY", @lname = "CHERRY", @user_login = "TCHERRY"

exec insert_record @fname = "RICHARD", @lname = "RICHARDS", @user_login = "RRICHARDS"

exec insert_record @fname = "LINA", @lname = "MARINA", @user_login = "LMARINA"

exec insert_record @fname = "TINO", @lname = "LINGUINO", @user_login = "TLINGUINO"

exec insert_record @fname = "TRACY", @lname = "MACY", @user_login = "TMACY"

exec insert_record @fname = "JUDY", @lname = "MOODY", @user_login = "JMOODY"

exec insert_record @fname = "JANICE", @lname = "MALICE", @user_login = "JMALICE"

exec insert_record @fname = "KATY", @lname = "JADY", @user_login = "KJADY"

exec insert_record @fname = "WALDO", @lname = "FALDO", @user_login = "WFALDO"

exec insert_record @fname = "YODA", @lname = "TOYOTA", @user_login = "YTOYOTA"

exec insert_record @fname = "ROGER", @lname = "LODGER", @user_login = "RLODGER"

exec insert_record @fname = "NANCY", @lname = "FANCY", @user_login = "NFANCY"

exec insert_record @fname = "TRACY", @lname = "MACY", @user_login = "TMACY"

exec insert_record @fname = "JUDY", @lname = "MOODY", @user_login = "JMOODY"

exec insert_record @fname = "JANICE", @lname = "MALICE", @user_login = "JMALICE"

exec insert_record @fname = "ANA", @lname = "BANANA", @user_login = "ABANANA"

exec insert_record @fname = "TERRY", @lname = "CHERRY", @user_login = "TCHERRY"

exec insert_record @fname = "JUDY", @lname = "MOODY", @user_login = "JMOODY"

exec insert_record @fname = "JANICE", @lname = "MALICE", @user_login = "JMALICE"

This should hide our valuable user_login. Any thoughts? And I could do as Sergiy suggested and separate the mapping itself to its own db or something.

Eric M Russell SSC Guru Points: 125255 More actions · Answer 15

Yes, you can prevent the web application from interrogating employee SSNs by implementing role based security and abstracting the application from the employee table(s) by using stored proceure(s). Ideally, you would not grant select permission on any tables and instread only grant exec permission on stored procedures. Hashing or encryption of SSN would be reccomended, if the SSN is returned back to the application from any call.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho