Finding skipped values

  • Hello, I have a question to ask, but before I try to write the whole question I'll give you an example of the data in one of my tables.

    In tables we generally have an ID column, it being numeric and it usually grows one by one. Like this:

    ID Data

    1   Hey

    2   Jude

    3   Hello

    Well, let's say that a user deletes row 2 and so we only have rows 1 and 3 now. If the id column is Identity the next inserted ID will be number 4. If it's not, it could be any number other than 1 and 3. What we usually do in this case is a "select max(ID) + 1" to insert the next ID.

    The problem I'm facing is that we're running out of numbers and there are gaps with valid numbers that could be used. In this example I could insert some data with an ID of 2 because it was previously deleted, but I'm having trouble how to figure out wich numbers are not being used anymore.

    To make things a bit harder there is no log on which id's have been deleted.

    Any idea on how to get those?

    I think I could do some stuff using vb to get those numbers, but it would be really slow cause I'd have to search row by row.

    Thanks in advance.

  • Depending on how large your coulumn is, you might create a table with all the numbers from 1 to your largest identity and then join the two tables, deleting the ones that match. You'll have to put a trigger on the primary table to add deleted numbers to the secondary table to keep them in sync.

    Why not just copy the data to a table with a larger identity column?


    And then again, I might be wrong ...
    David Webb

  • Similar to the answer above, create a numbers_table.  Populate it in a loop to cover the range of numbers needed.

    Do NOT delete from this numbers_table as it can be usful in solving other problems.

    To find the lowest available number join the two tables:

    Select min(number)

    From numbers_table

    left Join YourTable

     On YourTable.id = Numbers_table.number

    Where YourTable.Id IS NULL

    Ron

    p.s. don't forget to create index on numbers_table.

  • Running out of numbers?  It can't be because of SQL... BIGINT is H-U-G-E!  You could just change the datatype of the column and, bingo, more numbers.

    If the restriction is because of some user requirement (like a limit of 6 digits or something similar), then the idea of comparing against a numbers table is good, but I rather like the idea of copying it into another table. 

    The only problem with both the numbers table and the copy method is... if you have any other tables that rely on the number (whatever the column is), you could really screw up your database.  My best recommendation is to change the datatype to a larger datatype.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • You could also try the following (which doesn't use a numbers table). You should probably have a (preferably clustered) index on your id column.

     

     

    set nocount on

    go

    create table testtable (id int)

    go

    insert testtable select 1

    insert testtable select 2

    insert testtable select 3

    insert testtable select 5

    insert testtable select 7

    insert testtable select 8

    go

    -- All available ids

    select t1.id + 1

    from testtable t1 left join testtable t2 on t2.id = t1.id + 1

    where t2.id is null

    -- Smallest available id

    select min(t1.id) + 1

    from testtable t1 left join testtable t2 on t2.id = t1.id + 1

    where t2.id is null

    drop table testtable

    go

  • CREATE TABLE t

    (

     k1 INT NOT NULL

     , c1 CHAR NOT NULL

     CONSTRAINT pk_t PRIMARY KEY(k1)

    )

    GO

    CREATE FUNCTION dbo.CloseMyGaps() RETURNS INT

    AS

    BEGIN

     RETURN

      CASE

       WHEN EXISTS

            (SELECT *

               FROM t

              WHERE k1 = 1)

       THEN (SELECT MIN(t1.k1) + 1

               FROM t t1

                LEFT OUTER JOIN t t2

                ON t1.k1 = t2.k1 - 1

              WHERE t2.k1 IS NULL)

       ELSE 1

      END

    END

    GO

    ALTER TABLE t ADD CONSTRAINT d_k1 DEFAULT dbo.CloseMyGaps() FOR k1

    GO

    INSERT INTO t(c1) VALUES('a')

    INSERT INTO t(c1) VALUES('b')

    INSERT INTO t(c1) VALUES('c')

    INSERT INTO t(c1) VALUES('d')

    SELECT *

      FROM t

    DELETE FROM t WHERE k1 IN (2,3)

    INSERT INTO t(c1) VALUES('d')

    INSERT INTO t(c1) VALUES('d')

    DELETE FROM t WHERE k1 =1

    INSERT INTO t(c1) VALUES('f')

    SELECT *

      FROM t

    Not sure what the performance will look like on a large table and why it is important for you anyway.

    If you're "running out of numbers", change your underlying data type to BIGINT or even DECIMAL(38,0). That should last for some time.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Why not just copy the data to a table with a larger identity column?

    Because the ID's are actually real codes and they have a max length. I think I'm gonna go with the loop suggestion. I had thought of the numbers table too, maybe that's the fastest solution.

  • Jesper be careful, your first statement does NOT return ALL available numbers.

    Set NOCOUNT ON

    create table testtable (id int)

    go

    insert testtable select 1

    insert testtable select 2

    insert testtable select 3

    --insert testtable select 5

    insert testtable select 7

    insert testtable select 8

    go

    -- All available ids

    select t1.id + 1

    from testtable t1 left join testtable t2 on t2.id = t1.id + 1

    where t2.id is null

    drop table testtable

    go

    -----------

    4

    9

    Only the first is a series of missing numbers will be returned.  Note that "5" is missing but not returned.

    ron

     

  • Oups, you're right. But the min should be correct....

  • With all the good solutions, I feel a little silly posting this but it's nasty fast... and, since a lot of folks keep their Tally (numbers) tables down to something less that 10k, this is a pretty cool alternate method although it does use the premise of a Tally table... in fact, it custom builds a tempory Tally table on the fly...

    --===== Declare and populate the max ID we have in the table

    DECLARE @MyMax INT

     SELECT @MyMax = MAX(CustID)

       FROM dbo.Customer WITH (NOLOCK)

      WHERE CustID < 1000000  --I forced a limit to the first

                              --million in my 600 million row table

    --===== Declare the maximum number of rows to create in the tally table.

         -- It's kinda like the programmable "TOP" statement available in SS 2005

        SET ROWCOUNT @MyMax

        SET NOCOUNT ON -- This is just for a little extra speed

    --===== If the temporary tally table exists, drop it

         IF OBJECT_ID('TempDB..#TempTally') IS NOT NULL

            DROP TABLE #TempTally

    --===== Create and populate the temporary tally table

         -- The cross-joins are, in fact, intentional and should produce a

         -- minimum capability of 256 x 256 x 256 x 256 which is twice as big as INT.

         -- Of course, the number of rows this spawns is limited by SET ROWCOUNT above.

     SELECT IDENTITY(INT,1,1) AS N

       INTO #TempTally

       FROM dbo.SysColumns sc1,

            dbo.SysColumns sc2,

            dbo.SysColumns sc3,

            dbo.SysColumns sc4

    --===== Add an index to the tally table for speed

      ALTER TABLE #TempTally

            ADD PRIMARY KEY CLUSTERED (N)

    --===== Set the number of rows to populate back to normal

        SET ROWCOUNT 0

    --===== Find the missing IDs in the Customer table

     SELECT t.N AS CustID

       FROM dbo.Customer c WITH (NOLOCK)

      RIGHT OUTER JOIN #TempTally t

            ON c.CustID = t.N

      WHERE c.CustID IS NULL

    --===== If the temporary tally table exists, drop it

         IF OBJECT_ID('TempDB..#TempTally') IS NOT NULL

            DROP TABLE #TempTally

    First, I gotta say (as both Frank and I previously pointed out), I just can't imagine running out of numbers in the face of datatypes like BIGINT or DECIMAL(38,0).  Only a 3rd party requirement limiting the number of digits would be a reason.

    The code above is an example I wrote for this post against a real database with more than 600 million records.  The first Million records (CustID wise) are pretty well screwed up on this table.  I set it up to look at CustIDs with 6 or fewer digits (999,999 max or < 1,000,000).

    Like I said, I'm missing lot's of CustID's in this first million rows (almost 700,000).  That non-with-standing and with the grid mode on, the whole shootin' match for the above code is over in about 6 seconds and I'm thinking that, given the conditions in my table, that's the worst case you'll ever see.

    Off course, you could divert the output of this run into another table to "reserve" ID's for another process.

    I still don't understand why you think you need to reuse ID's though... and, I gotta tell you this, using @max-2(ID)+1 to get the next ID is death by SQL.  I just went through a bunch of THAT hooey with a 3rd party vendor.  If your not going to use IDENTITY to automatically create the ID's, you need to give some serious thought about creating a SEQUENCE table... AND, since I just went through that garbage with another 3rd party vendor, there's only one way to write the updates to the SEQUENCE table that will prevent deadlocks.  If you need to go that way, write back...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • 2 Datne Garcia

    Do you wipe out all records related to deleted instance before creating a new record with the same ID?

    Otherwise your historical data about deleted, say, customer will be automatically attached to the new record about another customer.

    _____________
    Code for TallyGenerator

  • Do you wipe out all records related to deleted instance before creating a new record with the same ID?

    Yes, I do. It wouldn't let you otherwise.

    IRT the ones telling me not to reuse numbers:

    I work for a telephone co. We need all those numbers cause they're codes we can still use. The program was already working like that when I arrived. It wasn't my idea to use the code as the ID. It isn't that bad though. 

  • That's nice telephone company!

    Can you post its name? I gonna use it.

    Imagine: you've made a lot of inernational calls, but you don't need to pay their bills. They will disconnect you because of huge debts, but after month you can join again - and you are clear!!!

    Because your ID is taken by somebody else and for this reason information about you is wiped out from the system!

    Are they still in business?

    _____________
    Code for TallyGenerator

  • Really sorry but don't have time to write a long post or sort out the code for you but here is the idea if it helps:

    Create a trigger on your table that populates an audit table (that you will need to create) to hold a record of rows that are deleted. As these records can hold the ID field, you can always reference this (audit) table when you update your table. See BOL for info on temp tables inserted and deleted.

  • IRT Sergiy:

    It's not the customer related info that gets deleted. It's a unique password to be used in the telephone switch.

    You shouldn't be so sarcastic if you don't know $#!7 about it.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply