Advanced Query Question

  • Hi,

    I have taken over the role of managing a database and there are some queries being executed that I need to understand before I make any changes.

    Could someone with some experience take a look at this query and confirm what it actually does?

    What I want it to do is to check the data in the wce_contact table against the data in the wce_ilr table and where a stu_id that does not exist in the contact table but does in the wce_ilr table, I want to copy those rows over to the contact table. It’s a duplication checker only new stu_id’s to be moved over.

    I’m not sure that this query does that correctly or not. Thanks for looking.

    with cteDelegates(stu_id, Student_Forename, middlename, Surname, Date_of_Birth, Sex, NI_Number, Learning_Difficulty, Learning_Diff, Disability, Ethnicity, Student_Address_1, Student_Address_2, Student_Address_3, Student_Address_4,

    Student_Postcode, Telephone, Email_Address, Mobile_Number, STEN_Employer_Identifier, Date_Created, rown) as(Select wce_ilr.stu_id, wce_ilr.Student_Forename, wce_ilr.middlename, wce_ilr.Surname, wce_ilr.Date_of_Birth, wce_ilr.Sex, wce_ilr.NI_Number, wce_ilr.Learning_Difficulty, wce_ilr.Learning_Diff, wce_ilr.Disability, wce_ilr.Ethnicity, wce_ilr.Student_Address_1, wce_ilr.Student_Address_2, wce_ilr.Student_Address_3, wce_ilr.Student_Address_4,

    wce_ilr.Student_Postcode, wce_ilr.Telephone, wce_ilr.Email_Address, wce_ilr.Mobile_Number, wce_ilr.STEN_Employer_Identifier,wce_ilr.Date_Created, ROW_NUMBER() OVER (PARTITION BY wce_ilr.stu_id ORDER BY wce_ilr.stu_id) from wce_ilr left join wce_contact

    on WCE_ILR.stu_id =wce_contact.stu_id

    where WCE_contact.stu_id is null)

    INSERT INTO wce_contact (UNIQUEID,contact,firstname,middle_name,lastname,stu_id,dob,

    gender,national_insurance_no,Learning_Difficulties,Learning_Diff,Disability,ethnicity,

    address1,address2,address3,city,postalcode,phone,emailaddress,mobilephone,STEN_Employer_Identifier,record_type, Date_Created)

    Select stu_id, Student_Forename + ' ' + Surname, Student_Forename, middlename,

    Surname, stu_id, Date_of_Birth, Sex, NI_Number, Learning_Difficulty, Learning_Diff, Disability, Ethnicity, Student_Address_1, Student_Address_2, Student_Address_3, Student_Address_4, Student_Postcode, Telephone, Email_Address, Mobile_Number, STEN_Employer_Identifier, 'Learner', Date_Created from cteDelegates where rown = 1 order by Student_Forename

  • It is, but it could be a lot simpler:-

    INSERT INTO wce_contact (UNIQUEID,contact,firstname,middle_name,lastname,stu_id,dob,

    gender,national_insurance_no,Learning_Difficulties,Learning_Diff,Disability,ethnicity,

    address1,address2,address3,city,postalcode,phone,emailaddress,mobilephone,

    STEN_Employer_Identifier,record_type, Date_Created)

    SELECT S.stu_id, S.Student_Forename + ' ' + S.Surname, S.Student_Forename, S.middlename,

    S.Surname, S.stu_id, S.Date_of_Birth, S.Sex, S.NI_Number, S.Learning_Difficulty, S.Learning_Diff, S.Disability, S.Ethnicity, S.Student_Address_1, S.Student_Address_2, S.Student_Address_3, S.Student_Address_4, S.Student_Postcode, S.Telephone, S.Email_Address, S.Mobile_Number, S.STEN_Employer_Identifier, 'Learner', S.Date_Created

    FROM wce_ilr AS S LEFT OUTER JOIN wce_contact AS C on S.stu_id = C.stu_id

    WHERE C.stu_id IS NULL

    The combination of the LEFT OUTER JOIN and the IS NULL in the SELECT query selects all the records from wce_ilr that are not already present.

  • Thank you for your reply. I ran the select part of your script and it looks good but when i run the select on the script i posted the results are slightly different and before i go live with yours i would like to understand why. Thanks for looking.

    This is my current script and it pulls 128 rows from the wce_ilr table where it can't find them in the wce_contact table.

    with cteDelegates(stu_id, Student_Forename, middlename, Surname, Date_of_Birth, Sex, NI_Number, Learning_Difficulty, Learning_Diff, Disability, Ethnicity, Student_Address_1, Student_Address_2, Student_Address_3, Student_Address_4,

    Student_Postcode, Telephone, Email_Address, Mobile_Number, STEN_Employer_Identifier, Date_Created, rown) as(Select wce_ilr.stu_id, wce_ilr.Student_Forename, wce_ilr.middlename, wce_ilr.Surname, wce_ilr.Date_of_Birth, wce_ilr.Sex, wce_ilr.NI_Number, wce_ilr.Learning_Difficulty, wce_ilr.Learning_Diff, wce_ilr.Disability, wce_ilr.Ethnicity, wce_ilr.Student_Address_1, wce_ilr.Student_Address_2, wce_ilr.Student_Address_3, wce_ilr.Student_Address_4,

    wce_ilr.Student_Postcode, wce_ilr.Telephone, wce_ilr.Email_Address, wce_ilr.Mobile_Number, wce_ilr.STEN_Employer_Identifier,wce_ilr.Date_Created, ROW_NUMBER() OVER (PARTITION BY wce_ilr.stu_id ORDER BY wce_ilr.stu_id) from wce_ilr left join wce_contact

    on WCE_ILR.stu_id =wce_contact.stu_id

    where WCE_contact.stu_id is null)

    Select stu_id, Student_Forename + ' ' + Surname, Student_Forename, middlename,

    Surname, stu_id, Date_of_Birth, Sex, NI_Number, Learning_Difficulty, Learning_Diff,

    Disability, Ethnicity, Student_Address_1, Student_Address_2, Student_Address_3,

    Student_Address_4, Student_Postcode, Telephone, Email_Address, Mobile_Number,

    STEN_Employer_Identifier, 'Learner', Date_Created from cteDelegates

    where rown = 1

    This is your script and it pulls 154 rows from the wce_ilr table where it can't find them in the wce_contact table.

    SELECT S.stu_id, S.Student_Forename + ' ' + S.Surname, S.Student_Forename, S.middlename,

    S.Surname, S.stu_id, S.Date_of_Birth, S.Sex, S.NI_Number, S.Learning_Difficulty, S.Learning_Diff, S.Disability, S.Ethnicity, S.Student_Address_1, S.Student_Address_2, S.Student_Address_3, S.Student_Address_4, S.Student_Postcode, S.Telephone, S.Email_Address, S.Mobile_Number, S.STEN_Employer_Identifier, 'Learner', S.Date_Created

    FROM wce_ilr AS S LEFT OUTER JOIN wce_contact AS C on S.stu_id = C.stu_id

    WHERE C.stu_id IS NULL

  • I have just been checking the data and can see the query provided by malcolm.o-nions has duplicate rows, i think this is because the wce_ilr table can have multiple rows per stu_id so therefore this query will not do what i need.

    Anyone have any thoughts on this?

  • The order by in the window function in the CTE in your original query will return a random row from the available dupes.

    Is this intentional?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Well this does not matter in this instance i know they are unique based on their stu_id and i just want their contact deatils to populate the wce_contact tables.

    So the answer is my existing query is the best for achieveing this?

  • sc-w (8/11/2010)


    Well this does not matter in this instance i know they are unique based on their stu_id and i just want their contact deatils to populate the wce_contact tables.

    So the answer is my existing query is the best for achieveing this?

    It's good enough and should give the correct results. The method is standard and straightforward. You might get significantly better performance from using a temp table instead of a CTE but that would depend upon the number of rows in the source table and how many are pulled on each pass of the query.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks for this.

    Could you do me a favour? I'm not getting my head round how this works.

    The part of my query below is the bit that works out if there are any duplicate records right?

    Now please correct me when i go wrong, ROW_NUMBER() generates a unique number in increments of 1 for each stu_id it finds in the wce_ilr table, then PARTITION BY tells it to stop when it finds the first unique stu_id, then we are LEFT JOINING the wce_contact table to check the PARTIONIONED records against the wce_contact table on the wce_ilr.stu_id = wce_contact.stu_id

    Up to this point if my assumption above is correct I’m happy and please let me know if my assumption is wrong so i learn.

    Now here is where I’m not getting it, the WHERE wce_contact.stu_id IS NULL surely only checks for NULLS in the wce_contact.stu_id field. I would have thought you would need something like

    WHERE NOT wce_ilr.stu_id = wce_contact.stu_id

    Why the NULL? Look forward to getting this one clear in my head.

    , ROW_NUMBER() OVER (PARTITION BY wce_ilr.stu_id ORDER BY wce_ilr.stu_id) from wce_ilr left join wce_contact

    on WCE_ILR.stu_id =wce_contact.stu_id

    where WCE_contact.stu_id is null)

  • sc-w (8/11/2010)


    Thanks for this.

    Could you do me a favour? I'm not getting my head round how this works.

    The part of my query below is the bit that works out if there are any duplicate records right?

    Now please correct me when i go wrong, ROW_NUMBER() generates a unique number in increments of 1 for each stu_id it finds in the wce_ilr table, then PARTITION BY tells it to stop when it finds the first unique stu_id, then we are LEFT JOINING the wce_contact table to check the PARTIONIONED records against the wce_contact table on the wce_ilr.stu_id = wce_contact.stu_id

    Up to this point if my assumption above is correct I’m happy and please let me know if my assumption is wrong so i learn.

    Now here is where I’m not getting it, the WHERE wce_contact.stu_id IS NULL surely only checks for NULLS in the wce_contact.stu_id field. I would have thought you would need something like

    WHERE NOT wce_ilr.stu_id = wce_contact.stu_id

    Why the NULL? Look forward to getting this one clear in my head.

    , ROW_NUMBER() OVER (PARTITION BY wce_ilr.stu_id ORDER BY wce_ilr.stu_id) from wce_ilr left join wce_contact

    on WCE_ILR.stu_id =wce_contact.stu_id

    where WCE_contact.stu_id is null)

    PARTITION BY restarts at 1 when wce_ilr.stu_id changes, so if say there are three rows with wce_ilr.stu_i = 257, then the [rown] of the three rows will be 1, 2 and 3. So picking rows with [rown] of only 1 will ensure that your rows have unique values of stu_id.

    WHERE NOT wce_ilr.stu_id = wce_contact.stu_id

    - for wce_ilr.stu_i = 257, this would attempt to match (i.e. output) every row in wce_contact except wce_contact.stu_id = 257, and so on. A cartesian product, more or less.

    WHERE wce_contact.stu_id IS NULL

    Get rows from wce_ilr where there's no matching row (on stu_id) in wce_contact.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Great feed back thank you.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply