Picking up random rows in SQL server 2008

  • Dear all:

    I have to implement Stratified Random Sampling in my application which is a part of the DataWare house software. Here's the scenario:

    -let’s assume that we have a sample universe of 500 records for a given dealer

    -Now let’s suppose that we need a sample for analysis of 50 and we expect a response rate of 50%. On this basis we need to select 100 records to send the survey to from the 500 available

    -In order to properly represent the 500 records available to use we need to ensure that each respondent has an equal chance of being selected for inclusion in our sample of 200. We do this by:

    ocalculating a 1 in N sampling interval # - this equals the sample size required to which we will send a survey ( in this case 100) divided by the total sample universe available ( in this case 500)

    oIn our example we have a 1 in 5 sampling interval - in other words we will take every fifth record

    oIt’s important too to ensure that we choose a random start point - we therefore need to generate a random number (between 1 and 500 in our case). Whatever the number generated is , that record becomes our start point for sampling

    oFor example if we generated a random number of 491 then the 491st record would be our first selected record. From that record we’d start our sample selection using the 1 in 5 sampling interval calculated for this study so the next selected record would be 496 and so on

    oOnce the maximum record number in the universe has been reached then the count continues from the bottom up. So again in our example our third iteration would take us beyond 500 and the next selected record would be record number 1. The next after that would be record 6, etc until we have our 100 records.

    I am aware of NewID() and TableSample and ramdom column techniques, but I am a bit lost about the approach over. I am listing down the schema of the available source tables below.

    Client_Customers

    ClientCustomerID Bigint, PK, Identitiy

    Description nvarchar(max)

    InsertDateTime DateTime

    UpdateDateTime DateTime

    ClientCustomerAttributes

    ClientCustomerAttributeIDbigint, PK, Identity

    Descriptionnvarchar(max)

    ClientCustomerValues

    ClientCustomerValueIDbigintPKIdentity

    Valuenvarchar(max)

    CilentCustomerIDbigint

    ClientCustomerAttributeIDbigint

    Client_Products

    ClientProductID Bigint, PK, Identitiy

    Description nvarchar(max)

    InsertDateTime DateTime

    UpdateDateTime DateTime

    ClientProductAttributes

    ClientProductAttributeIDbigint, PK, Identity

    Descriptionnvarchar(max)

    ClientProductValues

    ClientProductValueIDbigintPKIdentity

    Valuenvarchar(max)

    CilentProductIDbigint

    ClientProductAttributeIDbigint

    Client_Transactions

    ClientTransactionID Bigint, PK, Identitiy

    Description nvarchar(max)

    ClientCustomerIDBigint FK

    ClientProductIDBigint FK

    InsertDateTime DateTime

    UpdateDateTime DateTime

    ClientTransactionAttributes

    ClientTransactionAttributeIDbigint, PK, Identity

    Descriptionnvarchar(max)

    ClientTransactionValues

    ClientTransactionValueIDbigintPKIdentity

    Valuenvarchar(max)

    CilentTransactionIDbigint

    ClientTransactionAttributeIDbigint

    regards,

    Kazim Raza

  • Is it not as simple as generating a list of numbers, starting at random 'R', then adding 'N' until enough records are selected?

    If so, write something to populate a temporary table with these key numbers, then join to the real data, using the ROW_NUMBER ranking function to assign a sequential number to each data row.

    Paul

  • Yes Paul, I know its not simple and that's the reason I have turned up here to get expert opinion to get the basic groundwork correct. As I mentioned in my post, I am seeking opinion on what approach should I adopt?

    Regards,

    Kazim Razas

  • It is reasonably simple, as I outlined in my last post.

    If it doesn't make sense yet, provide a simplified example that we can work with to outline the concept.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply