Dictionary Lookup

  • Hi

    I have hundred million ID, NAME pairs stored in SQL Server as a table that is accessed by C# code, which has a responsibility to lookup NAMEs corresponding to million IDs at a time.    C# code has these million IDs stored in an array, before invoking SQL module.

    Columns of the SQL table:

    ID VARCHAR (500)  -- this is indexed on

    NAME VARCHAR (500)

    Could you please share your thoughts on what is the most performant method to do these million lookups?

    The approach I am thinking of, is - through C# sqlDataClient feature, create a #temp table in SQL consisting of million IDs ( the "input")  Then, JOIN the temp table with these million inputs to main table to create a result temp table.  And, fetch this final result table holding NAMEs (output) corresponding to the IDs passed (input) out of SQL into C#,  by appending NAMES into an array of results.

    Is there a more performant approach?  Alternatively, does SQL Server support Dictionary Lookup sort of built-in feature, to avoid JOINs?

    thank you

  • Good idea would be to leave "the lookup" to the SQL server, then pass the results to C# code. Something like:

    SELECT ID, NAme FROM DirectoryTable 
    WHERE ID = '12345' -- you can supply any value that works for you

    The problem is what i just typed works for somebody who sits in front of SSMS (SQL editor). The real thing would be to write a stored procedure, or a user define function with parameter(s). Then C# must provide code to pass given ID to the procedure/function, and to catch the response of the procedure/function. That way, most of the work is done by SQL Server, and C# code just communicates with SQL, receives results and displays them for the user to see.

    C# is not my area of expertise, someone else will have to help you with that part. However, SQL part is "my area of expertise" and I can  help with that part. However, you need to provide table design and some test data before that. You know, CREATE TABLE... then INSERT INTO...

    Cheers

    Zidar

    Zidar's Theorem: The best code is no code at all...

  • Do you really have 500-character IDs?  Such a long ID seems unusual and non-ideal.  At 500 bytes, only a few rows will fit in an 8K database page. If so, at least make sure these don't get treated as nvarchar -- type conversions and/or storage could further exacerbate the problem.

    A bigint can hold 18 quintillion different values, and the number of possible 128-bit guids (32 characters) is almost incomprehensibly larger. A compact indexable ID is going to be your friend here.

    Are these ID/Name pairs semi-static/slowly-changing?  i.e., assuming this is a repeated process and not one-time-only, would you query the same sets of IDs each time (allowing for additions/deletions)? Or do they vary from run-to-run? If they are consistent, you might be able to add something like a BatchID to facilitate processing w/o having to push the ID's the application is looking for back to SQL and then SQL Server having to join to a temp table to locate those IDs.

     

     

     

     

     

  • (1) Cluster the main table on ID: a nonclus index is worthless here.

    (2) Cluster the lookup table on ID.  The join of the two tables should now become a MERGE join, which will be most efficient here.

    (3) Evaluate page-compressing the main table.  That is, exec sys.sp_estimate_data_compression_savings ..., 'PAGE', and if it yields good results, then page compress the main table.

     

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • You don't need to create a temp table and join to it. You can have the query in a stored procedure that has a parameter which is a "user defined table type". The C# code can populate this parameter from the array of values you want to query. The stored procedure would do a join between this table type parameter and the SQL table that you use for lookups.

    In SQL;

    -- Create the user-defined table type.
    create type [dbo].[udt_MyTableType] as table (
    [ID] [varchar](500) not null
    )

    -- Create the stored procedure
    create procedure [dbo].[MyProcedure] @InputTable dbo.udt_MyTableType readonly
    as
    begin
    begin try
    set nocount on;

    select lt.[ID], lt.[Value]
    from LookupTable lt
    inner join @InputTable i on i.[ID] = lt.[ID]

    return 0;
    end try

    begin catch
    declare @ErrStr varchar(5000), @ErrMsg varchar(4000), @ErrSeverity int, @ErrNo int;
    select @ErrMsg = error_message(), @ErrSeverity = error_severity(), @ErrNo = error_number();
    set @ErrStr = error_procedure()
    + '. Error = ' + rtrim(ltrim(str(isnull(@ErrNo, 0), 10, 0)))
    + ': ' + isnull(@ErrMsg, 'Msg is null');
    print @ErrStr;
    raiserror(@ErrStr, @ErrSeverity, 1);
    return -1;
    end catch
    end

    In C#;

    DataTable lookupTable = new DataTable();
    lookupTable.Columns.Add("ID", typeof(string));

    // Loop through your list of values doing the following

    DataRow lookupRow = lookupTable.NewRow();
    lookupRow["ID"] = currentID;
    lookupTable.Rows.Add(lookupRow);

    // End loop

    var sqlConnection = new SqlConnection(<your connection details here>)
    var sqlCmd = new SqlCommand();
    sqlConnection.Open();
    SqlTransaction tran = sqlConnection.BeginTransaction(IsolationLevel.ReadCommitted);
    sqlCmd.Connection = sqlConnection;
    sqlCmd.CommandType = CommandType.StoredProcedure;
    sqlCmd.CommandText = "MyProcedure";
    sqlCmd.Parameters.Add(new SqlParameter("@InputTable", lookupTable));
    sqlCmd.Transaction = tran;
    SqlDataAdapter adapter = new SqlDataAdapter();
    adapter.SelectCommand = sqlCmd;
    adapter.Fill(<locale information>);
    tran.Commit();

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply