Help with Counts

  • stephen99999 (1/12/2012)


    in table format, bicky, please post what you are expecting your results to look like. Earlier you stated you wanted to see 9, 6, etc to be returned, and now you say it is not.

    Just confused of your requirements.

    Total Total Unqiue Datakey Total Unique Landline Total Unique Mobile Total Unique Email

    ----- -------------------- -------------------- ------------------- -----------------

    9 6 6 5 3

  • bicky1980 (1/12/2012)


    stephen99999 (1/12/2012)


    in table format, bicky, please post what you are expecting your results to look like. Earlier you stated you wanted to see 9, 6, etc to be returned, and now you say it is not.

    Just confused of your requirements.

    Total Total Unqiue Datakey Total Unique Landline Total Unique Mobile Total Unique Email

    ----- -------------------- -------------------- ------------------- -----------------

    9 6 6 5 3

    Isn't that what my query posted earlier returns?

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Mark-101232 (1/12/2012)


    bicky1980 (1/12/2012)


    stephen99999 (1/12/2012)


    in table format, bicky, please post what you are expecting your results to look like. Earlier you stated you wanted to see 9, 6, etc to be returned, and now you say it is not.

    Just confused of your requirements.

    Total Total Unqiue Datakey Total Unique Landline Total Unique Mobile Total Unique Email

    ----- -------------------- -------------------- ------------------- -----------------

    9 6 6 5 3

    Isn't that what my query posted earlier returns?

    wondering the same. Being a Jr. dev myself, I really want to work this out. Now I am addicted to this post lol.

    Stephen

  • stephen99999 (1/12/2012)


    Mark-101232 (1/12/2012)


    bicky1980 (1/12/2012)


    stephen99999 (1/12/2012)


    in table format, bicky, please post what you are expecting your results to look like. Earlier you stated you wanted to see 9, 6, etc to be returned, and now you say it is not.

    Just confused of your requirements.

    Total Total Unqiue Datakey Total Unique Landline Total Unique Mobile Total Unique Email

    ----- -------------------- -------------------- ------------------- -----------------

    9 6 6 5 3

    Isn't that what my query posted earlier returns?

    wondering the same. Being a Jr. dev myself, I really want to work this out. Now I am addicted to this post lol.

    Stephen

    Yes the results where correct, but dont see how the logic works, doesnt appear to group the results by datakey - If you could explain how your query ensures the datakey and variable (landline, mobile or email) are unique then thats great, but I dont see it...

  • I see your point now. In all actuality, your prior code:

    select count(distinct datakey) as Unique_Dataset

    from (select *, row_number() over(partition by datakey order by case when Datakey!='' then 0 else 1 end) as pref

    from test where Datakey!='') z

    where pref = 1

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref

    from test where landline!='') z

    where pref = 1

    select count(distinct mobile) as Unique_Mobiles

    from (select *, row_number() over(partition by datakey order by case when mobile!='' then 0 else 1 end) as pref

    from test where mobile!='') z

    where pref = 1

    select count(distinct email) as Unique_Emails

    from (select *, row_number() over(partition by datakey order by case when email!='' then 0 else 1 end) as pref

    from test where email!='') z

    where pref = 1

    does what you are looking for. To put it all into one statement just set each select statement to an INT variable and then select them, ie., SELECT @a Total, @b-2 UniqueDataKeys, @C UniqueLandLines, etc

    My question is, how did you figure out to do it this way? I have not used row_number(), over, partition by very much and really don't understand it. Can anyone elaborate? Haven't found a good blog that explains this in "idiot terms."

  • stephen99999 (1/12/2012)


    I see your point now. In all actuality, your prior code:

    select count(distinct datakey) as Unique_Dataset

    from (select *, row_number() over(partition by datakey order by case when Datakey!='' then 0 else 1 end) as pref

    from test where Datakey!='') z

    where pref = 1

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref

    from test where landline!='') z

    where pref = 1

    select count(distinct mobile) as Unique_Mobiles

    from (select *, row_number() over(partition by datakey order by case when mobile!='' then 0 else 1 end) as pref

    from test where mobile!='') z

    where pref = 1

    select count(distinct email) as Unique_Emails

    from (select *, row_number() over(partition by datakey order by case when email!='' then 0 else 1 end) as pref

    from test where email!='') z

    where pref = 1

    does what you are looking for. To put it all into one statement just set each select statement to an INT variable and then select them, ie., SELECT @a Total, @b-2 UniqueDataKeys, @C UniqueLandLines, etc

    My question is, how did you figure out to do it this way? I have not used row_number(), over, partition by very much and really don't understand it. Can anyone elaborate? Haven't found a good blog that explains this in "idiot terms."

    I'm not convinced these queries work at all even though they apparently give desired results.

    This one

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref

    from test where landline!='') z

    where pref = 1

    appears to pick a random landline from each datakey group based on an indeterminate ordering and then counts how many distinct items are found.

    Adding unique ordering changes the results

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY ASC) as pref

    from test where landline!='') z

    where pref = 1

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY DESC) as pref

    from test where landline!='') z

    where pref = 1

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Mark-101232 (1/13/2012)


    stephen99999 (1/12/2012)


    I see your point now. In all actuality, your prior code:

    select count(distinct datakey) as Unique_Dataset

    from (select *, row_number() over(partition by datakey order by case when Datakey!='' then 0 else 1 end) as pref

    from test where Datakey!='') z

    where pref = 1

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref

    from test where landline!='') z

    where pref = 1

    select count(distinct mobile) as Unique_Mobiles

    from (select *, row_number() over(partition by datakey order by case when mobile!='' then 0 else 1 end) as pref

    from test where mobile!='') z

    where pref = 1

    select count(distinct email) as Unique_Emails

    from (select *, row_number() over(partition by datakey order by case when email!='' then 0 else 1 end) as pref

    from test where email!='') z

    where pref = 1

    does what you are looking for. To put it all into one statement just set each select statement to an INT variable and then select them, ie., SELECT @a Total, @b-2 UniqueDataKeys, @C UniqueLandLines, etc

    My question is, how did you figure out to do it this way? I have not used row_number(), over, partition by very much and really don't understand it. Can anyone elaborate? Haven't found a good blog that explains this in "idiot terms."

    I'm not convinced these queries work at all even though they apparently give desired results.

    This one

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref

    from test where landline!='') z

    where pref = 1

    appears to pick a random landline from each datakey group based on an indeterminate ordering and then counts how many distinct items are found.

    Adding unique ordering changes the results

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY ASC) as pref

    from test where landline!='') z

    where pref = 1

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY DESC) as pref

    from test where landline!='') z

    where pref = 1

    I was thinking the case would prioritise any populaed landline field over the empty ones and then perform a distinct count of them. I'm not sure but with the order by indkey, does that change the order of the landline priority?

  • <snip>

    I'm not convinced these queries work at all even though they apparently give desired results.

    This one

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref

    from test where landline!='') z

    where pref = 1

    appears to pick a random landline from each datakey group based on an indeterminate ordering and then counts how many distinct items are found.

    Adding unique ordering changes the results

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY ASC) as pref

    from test where landline!='') z

    where pref = 1

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY DESC) as pref

    from test where landline!='') z

    where pref = 1

    I was thinking the case would prioritise any populaed landline field over the empty ones and then perform a distinct count of them. I'm not sure but with the order by indkey, does that change the order of the landline priority?

    The CASE expression does prioritise the populated ones before the unpopuated, but you're excluding the unpopulated ones anyway in the WHERE clause. Effectively

    select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref

    from test where landline!=''

    is

    select *, row_number() over(partition by datakey order by 0) as pref

    from test where landline!=''

    In other words, the ordering is undefined. But I still don't understand why you want to take just one landline value from each datakey group.

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Mark-101232 (1/13/2012)


    <snip>

    I'm not convinced these queries work at all even though they apparently give desired results.

    This one

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref

    from test where landline!='') z

    where pref = 1

    appears to pick a random landline from each datakey group based on an indeterminate ordering and then counts how many distinct items are found.

    Adding unique ordering changes the results

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY ASC) as pref

    from test where landline!='') z

    where pref = 1

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end, INDKEY DESC) as pref

    from test where landline!='') z

    where pref = 1

    I was thinking the case would prioritise any populaed landline field over the empty ones and then perform a distinct count of them. I'm not sure but with the order by indkey, does that change the order of the landline priority?

    The CASE expression does prioritise the populated ones before the unpopuated, but you're excluding the unpopulated ones anyway in the WHERE clause. Effectively

    select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end) as pref

    from test where landline!=''

    is

    select *, row_number() over(partition by datakey order by 0) as pref

    from test where landline!=''

    In other words, the ordering is undefined. But I still don't understand why you want to take just one landline value from each datakey group.

    I want to take one landline value as I can only count one landline against each unique dataset

  • I think the best way to do this is to just test on various sample data. Really change it up, and see if you can "break" the code. If not, then I guess it works (even if we don't fully understand it).

    Stephen

  • Sorry, I still don't get it. Why only one landline value per datakey, surely you need to consider all landline values to count the number of distinct ones.

    Here's a simple example, what result are you expecting?

    truncate table test

    insert into test

    values ('1','0001','1111111111','','')

    insert into test

    values ('2','0001','9999999999','','')

    insert into test

    values ('3','0002','1111111111','','')

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end,indkey asc) as pref

    from test where landline!='') z

    where pref = 1

    select count(distinct landline) as Unique_landlines

    from (select *, row_number() over(partition by datakey order by case when landline!='' then 0 else 1 end,indkey desc) as pref

    from test where landline!='') z

    where pref = 1

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • OK, Yes you are right my query does not work correctly...

    I have edited the data slightly using the following:

    create table test2(

    indkey nvarchar(2),

    datakey nvarchar(4),

    landline nvarchar(11),

    mobile nvarchar(11),

    email nvarchar(20))

    insert into test2

    values ('1','0001','01234567890','0712345679','1@test.co.uk')

    insert into test2

    values ('2','0001','01234567890','','1@test.co.uk')

    insert into test2

    values ('3','0002','01234567890','','2@test.co.uk')

    insert into test2

    values ('4','0002','01234567891','','2@test.co.uk')

    insert into test2

    values ('5','0002','','07123456789','')

    insert into test2

    values ('6','0003','01234567892','07123456791','')

    insert into test2

    values ('7','0004','01234567893','07123456792','')

    insert into test2

    values ('8','0005','01234567894','07123456793','2@test.co.uk')

    insert into test2

    values ('9','0008','01234567895','07123456793','9@test.co.uk')

    Now my results are different, what the row_number is doing is selecting a preference that populated landline values are prioritised over empty values. 🙁 What I need the query to do is preference a populated value which isnt returned for another datakey...Probablu not making sense so I will try to demonstrate:

    Run the following query on the test2 data:

    select *, row_number() over(partition by datakey order by case when landline!=''

    then 0 else 1 end) as pref

    from test2

    As you can see the preference has been assigned to populated value over empty values, but where there are 2 landline values, the query has just selected them in the default order of the file. So using these results we would be using the following:

    select * from (

    select *, row_number() over(partition by datakey order by case when landline!=''

    then 0 else 1 end) as pref

    from test2) z where pref=1

    As you can see this has left us a duplicate landline value for datakey 0001 & 0002, when really I needed the query to see that the number 01234567890 has already been used for datakey 0001 so then use the next populated landline value (as long as that also does not exist in any previous records)

    Does this make sense?

  • Hi Bicky

    Some very experienced and talented folks are having real difficulty figuring out exactly what you want here, and it looks to me that you might still be figuring it out yourself. Can I suggest you take a step back and review your requirement, then return with a more complete description and consolidate that with a representative sample data set?

    I suspect the solving query won't be complicated at all when it's done. It's not the answer which is causing a problem for folks, it's the question.

    Cheers

    ChrisM


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • stephen99999 (1/13/2012)


    I think the best way to do this is to just test on various sample data. Really change it up, and see if you can "break" the code. If not, then I guess it works (even if we don't fully understand it).

    Stephen

    That's a pretty dangerous approach - trusting code you don't understand to be correct after testing on a set of small samples. If this code doesn't work, how will you ever know it doesn't work, if the data combination that breaks it is so complex that it can only arise with a sample of at least 5000 rows?

    Tom

  • ChrisM@home (1/13/2012)


    Hi Bicky

    Some very experienced and talented folks are having real difficulty figuring out exactly what you want here, and it looks to me that you might still be figuring it out yourself. Can I suggest you take a step back and review your requirement, then return with a more complete description and consolidate that with a representative sample data set?

    I suspect the solving query won't be complicated at all when it's done. It's not the answer which is causing a problem for folks, it's the question.

    Cheers

    ChrisM

    I'm am not trying to figure it out myself I know exactly what I want, and what the constraints are.

    I willl try to clarify exactly what I need:

    I have a table which contains 40 million records. I need to provided population statistics on the following (I will put in brackets what each of the constraints are)

    Total number of records (No Constraints)

    Total number of Unique Datakey (Needs to be unique)

    Total number of populated Landline fields (Needs to have a unique datakey as well as be unique itself - If the same landline is anywhere else in the results its needs to be excluded from the figure)

    Total number of populated Mobiles (Needs to have a unique datakey as well as be unique itself - If the same mobile is anywhere else in the results its needs to be excluded from the figure)

    Total number of populated Emails (Needs to have a unique datakey as well as be unique itself - If the same email is anywhere else in the results its needs to be excluded from the figure)

    Now this makes sense to me, if people need any more clarification, please ask, and I will try to clarify my requirements more.

    Please use the code creating test2 previous created to test any queries

    Thanks All

Viewing 15 posts - 31 through 45 (of 120 total)

You must be logged in to reply to this topic. Login to reply