Help with Counts

  • Hi Chris

    I have run the query on a sample of 100,000 records, in my results there are no duplicate telephone numbers and no duplicate datakeys, which is great! (Ran fast too)

    Just need to incorporate the other fields into the query

    Thanks

  • The Query does however, appear to exclude some records, which I cant explain.

    I had 4 records with the same datakey and same landline, but these results were excluded from the final table?

    Any ideas why?

    Thanks

  • bicky1980 (1/17/2012)


    The Query does however, appear to exclude some records, which I cant explain.

    I had 4 records with the same datakey and same landline, but these results were excluded from the final table?

    Any ideas why?

    Thanks

    Run a select against your table, with a where clause like the following:

    SELECT...

    FROM ...

    WHERE datakey = datakeyvalue OR landline = landlinevalue


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • The boxes and balls cannot be applied/used until we know the order of operations. Maybe I missed it in this long thread :), so I'm sorry if I did... but let's take landlines first. Say my data for datakey and landline is this:

    0014, 1234567890

    0014, 1234567891

    0015, 1234567890

    0015, 1234567891

    How many unique landlines is this according to what you are wanting; 1 or 2? Again, I apologize if this has been answered already, but some of the posts are pretty long individually, so I want a summation. Also, when you tell me the number, can you explain exactly why. Please don't use the word "unique" as it is non-descriptive the way you have been using it. Say something like "this only counts as 1 because we first see that a datakey has more than one landline, but that only counts as 1 because it is the same data key. The second datakey has landlines that have been seen previously in the set, so they do not count at all."

    Note the word "previously"... That is important here because if we discard all duplicates as part of the set, this would be 0.

    Jared
    CE - Microsoft

  • The datakey & landline doesnt exist in my final table

    I edited the query at the end to put the results into a new table

    SELECT *

    into lvl

    FROM Level1

    UNION ALL

    SELECT * FROM Level2

    ORDER BY datakey, telno

    Was this correct?

  • bicky1980 (1/17/2012)


    The datakey & landline doesnt exist in my final table

    That's going to happen - can you post the results of the query I posted last?

    I edited the query at the end to put the results into a new table

    SELECT *

    into lvl

    FROM Level1

    UNION ALL

    SELECT * FROM Level2

    ORDER BY datakey, telno

    Was this correct?

    Personally I'd go for a temp table and you will probably need to index it for performance.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • bicky1980 (1/13/2012)


    GSquared (1/13/2012)


    The order of the rows in a dataset doesn't really matter to set-based SQL operations on it.

    I just need a clarification on this statement, "Total number of populated Landline fields (Needs to have a unique datakey as well as be unique itself - If the same landline is anywhere else in the results its needs to be excluded from the figure)"

    If a landline exists multiple times in the data, and has multiple datakey values, don't count it all, or count it only one time?

    E.g.:

    datakey landline pairs:

    1 100

    2 100

    Would that count as 1 or 0? Clarification of "If the same landline is anywhere else in the results its needs to be excluded from the figure". Does that mean exclude it entirely, or exclude the duplicates?

    If that should count as 1, then this seems to work:

    IF OBJECT_ID(N'tempdb..#test2') IS NOT NULL

    DROP TABLE #test2;

    CREATE TABLE #test2

    (indkey NVARCHAR(2),

    datakey NVARCHAR(4),

    landline NVARCHAR(11),

    mobile NVARCHAR(11),

    email NVARCHAR(20))

    INSERT INTO #test2

    VALUES ('1', '0001', '01234567890', '0712345679', '1@test.co.uk'),

    ('2', '0001', '01234567890', '', '1@test.co.uk'),

    ('3', '0002', '01234567890', '', '2@test.co.uk'),

    ('4', '0002', '01234567891', '', '2@test.co.uk'),

    ('5', '0002', '', '07123456789', ''),

    ('6', '0003', '01234567892', '07123456791', ''),

    ('7', '0004', '01234567893', '07123456792', ''),

    ('8', '0005', '01234567894', '07123456793', '2@test.co.uk'),

    ('9', '0008', '01234567895', '07123456793', '9@test.co.uk') ;

    SELECT 'Total' AS CountType, COUNT(*) AS Qty

    FROM #test2

    UNION ALL

    SELECT 'Datakeys', COUNT(DISTINCT datakey)

    FROM #test2

    UNION ALL

    SELECT 'Landlines', COUNT(DISTINCT landline)

    FROM #test2

    WHERE landline > ''

    AND datakey > ''

    UNION ALL

    SELECT 'Mobiles', COUNT(DISTINCT mobile)

    FROM #test2

    WHERE mobile > ''

    AND datakey > ''

    UNION ALL

    SELECT 'Emails', COUNT(DISTINCT email)

    FROM #test2

    WHERE email > ''

    AND datakey > '';

    If 0, then change the Select to this:

    SELECT 'Total' AS CountType, COUNT(*) AS Qty

    FROM #test2

    UNION ALL

    SELECT 'Datakeys', COUNT(DISTINCT datakey)

    FROM #test2

    UNION ALL

    SELECT 'Landlines',

    COUNT(DISTINCT landline)

    FROM (SELECT landline

    FROM #test2

    WHERE landline > ''

    AND datakey > ''

    GROUP BY landline

    HAVING COUNT(*) = 1) AS Sub

    UNION ALL

    SELECT 'Mobiles',

    COUNT(DISTINCT mobile)

    FROM (SELECT mobile

    FROM #test2

    WHERE mobile > ''

    AND datakey > ''

    GROUP BY mobile

    HAVING COUNT(*) = 1) AS Sub

    UNION ALL

    SELECT 'Emails',

    COUNT(DISTINCT email)

    FROM (SELECT email

    FROM #test2

    WHERE email > ''

    AND datakey > ''

    GROUP BY email

    HAVING COUNT(*) = 1) AS Sub;

    (All scripts tested in SQL 2008 R2 Dev Edition.)

    I would only require this to be counted as 1 record only...From your solution for this the results would be:

    Total9

    Datakeys6

    Landlines5

    Mobiles4

    Emails1

    The results should be:

    Total9

    Datakeys6

    Landlines6

    Mobiles5

    Emails3

    The results from my first query are exactly what you list as the "should be" answer.

    Re-ran the first version, copy-and-paste of results is:

    CountTypeQty

    Total9

    Datakeys6

    Landlines6

    Mobiles5

    Emails3

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hello Chris

    I cant post the results due to data protection 🙁 , which is going to be difficult to try and get you to see my issue, this was why I was using a small table of 'made up' records.

  • How do I copy code from the forum to paste into Management studio, without having to refomat it myself everytime?

  • How do I copy code from the forum to paste into Management studio, without having to refomat it myself everytime?

    Paste it into a word document, then copy and paste again. I've got a word doc on my desktop specifically for this.

    bicky1980 (1/17/2012)


    Hello Chris

    I cant post the results due to data protection 🙁 , which is going to be difficult to try and get you to see my issue, this was why I was using a small table of 'made up' records.

    Change the phone numbers (probably the area codes alone will be enough), but be careful that you don't give different random numbers to dupes of the same real number!

    When you get the opportunity, try GSquared's code.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • GSquared (1/17/2012)


    bicky1980 (1/13/2012)


    GSquared (1/13/2012)


    The order of the rows in a dataset doesn't really matter to set-based SQL operations on it.

    I just need a clarification on this statement, "Total number of populated Landline fields (Needs to have a unique datakey as well as be unique itself - If the same landline is anywhere else in the results its needs to be excluded from the figure)"

    If a landline exists multiple times in the data, and has multiple datakey values, don't count it all, or count it only one time?

    E.g.:

    datakey landline pairs:

    1 100

    2 100

    Would that count as 1 or 0? Clarification of "If the same landline is anywhere else in the results its needs to be excluded from the figure". Does that mean exclude it entirely, or exclude the duplicates?

    If that should count as 1, then this seems to work:

    IF OBJECT_ID(N'tempdb..#test2') IS NOT NULL

    DROP TABLE #test2;

    CREATE TABLE #test2

    (indkey NVARCHAR(2),

    datakey NVARCHAR(4),

    landline NVARCHAR(11),

    mobile NVARCHAR(11),

    email NVARCHAR(20))

    INSERT INTO #test2

    VALUES ('1', '0001', '01234567890', '0712345679', '1@test.co.uk'),

    ('2', '0001', '01234567890', '', '1@test.co.uk'),

    ('3', '0002', '01234567890', '', '2@test.co.uk'),

    ('4', '0002', '01234567891', '', '2@test.co.uk'),

    ('5', '0002', '', '07123456789', ''),

    ('6', '0003', '01234567892', '07123456791', ''),

    ('7', '0004', '01234567893', '07123456792', ''),

    ('8', '0005', '01234567894', '07123456793', '2@test.co.uk'),

    ('9', '0008', '01234567895', '07123456793', '9@test.co.uk') ;

    SELECT 'Total' AS CountType, COUNT(*) AS Qty

    FROM #test2

    UNION ALL

    SELECT 'Datakeys', COUNT(DISTINCT datakey)

    FROM #test2

    UNION ALL

    SELECT 'Landlines', COUNT(DISTINCT landline)

    FROM #test2

    WHERE landline > ''

    AND datakey > ''

    UNION ALL

    SELECT 'Mobiles', COUNT(DISTINCT mobile)

    FROM #test2

    WHERE mobile > ''

    AND datakey > ''

    UNION ALL

    SELECT 'Emails', COUNT(DISTINCT email)

    FROM #test2

    WHERE email > ''

    AND datakey > '';

    If 0, then change the Select to this:

    SELECT 'Total' AS CountType, COUNT(*) AS Qty

    FROM #test2

    UNION ALL

    SELECT 'Datakeys', COUNT(DISTINCT datakey)

    FROM #test2

    UNION ALL

    SELECT 'Landlines',

    COUNT(DISTINCT landline)

    FROM (SELECT landline

    FROM #test2

    WHERE landline > ''

    AND datakey > ''

    GROUP BY landline

    HAVING COUNT(*) = 1) AS Sub

    UNION ALL

    SELECT 'Mobiles',

    COUNT(DISTINCT mobile)

    FROM (SELECT mobile

    FROM #test2

    WHERE mobile > ''

    AND datakey > ''

    GROUP BY mobile

    HAVING COUNT(*) = 1) AS Sub

    UNION ALL

    SELECT 'Emails',

    COUNT(DISTINCT email)

    FROM (SELECT email

    FROM #test2

    WHERE email > ''

    AND datakey > ''

    GROUP BY email

    HAVING COUNT(*) = 1) AS Sub;

    (All scripts tested in SQL 2008 R2 Dev Edition.)

    I would only require this to be counted as 1 record only...From your solution for this the results would be:

    Total9

    Datakeys6

    Landlines5

    Mobiles4

    Emails1

    The results should be:

    Total9

    Datakeys6

    Landlines6

    Mobiles5

    Emails3

    The results from my first query are exactly what you list as the "should be" answer.

    Re-ran the first version, copy-and-paste of results is:

    CountTypeQty

    Total9

    Datakeys6

    Landlines6

    Mobiles5

    Emails3

    Hello GSquared

    I have run your code on my sample of approx 100,000 records but its difficult to understand why this would work, I dont see where you code ensures that if I had 2 different landlines at the same datakey, these wouldn't be counted seperately (I ran the 1st code)

  • bicky1980 (1/17/2012)I have run your code on my sample of approx 100,000 records but its difficult to understand why this would work, I dont see where you code ensures that if I had 2 different landlines at the same datakey, these wouldn't be counted seperately (I ran the 1st code)

    To confirm... You are saying that if the ONLY time 2 landlines appear is with the same datakey, that should only count as 1. Correct? To shorten the data for ease of understanding:

    DataKey, LandLine

    1, 100

    1, 200

    2, 100

    2, 200

    2, 300

    How many landlines is this? 2? And if the line with the 300 was not there... it would be 1? I have to be honest that, having a background in statistics, I am really puzzled at the question that this is trying to answer. Any chance you could appease me by phrasing the business question as something like "How many landlines are in use by different datakeys?" OR "Collapsing across landlines and then datakeys, what is the count?" Something just doesn't make sense here as an interpretation of results. I am not saying that you are wrong or anything, just trying to understand because right now I can't answer a question that I don't understand.

    I have done several similar analyses and have scripts in producation that are giving a similar output. If I can understand the question and thereby the business rules a bit clearer, I can simply adapt my current script and post it for you.

    Jared
    CE - Microsoft

  • GSquared (1/17/2012)


    The results from my first query are exactly what you list as the "should be" answer.

    Re-ran the first version, copy-and-paste of results is:

    CountTypeQty

    Total9

    Datakeys6

    Landlines6

    Mobiles5

    Emails3

    Yes, but there's a problem Gus; that test case doesn't cover all the requirements. The requirement is to count not landlines, but a set of datakey-landline pairs such that each landline and each datakey occurs only once. The other counts have a similar requirement. So with this test dataset

    TRUNCATE TABLE #test2

    INSERT INTO #test2 VALUES

    ('1', '0001', '01234567890', '', '1@test.co.uk')

    , ('2', '0001', '01234567891', '', '1@test.co.uk')

    , ('3', '0001', '01234567892', '', '2@test.co.uk') ;

    the required result is 3,1,1,0,1 but your code returns 3,1,3,0,2.

    Actually there is still a lot of detail missing from the requirement, which is a pity because I suspect that if we had the exact requirements we could either come up with a solution or explain that it would be sensible to derive the answers using some tool other than T-SQL.

    Tom

  • L' Eomot Inversé (1/17/2012)a set of datakey-landline pairs such that each landline and each datakey occurs only once

    As far as I can tell, this is an impossible statement without saying WHICH comes first.. Eliminate duplicate landlines when one is found and then duplicate datakeys, or the other way around. It seems that the request is not set based, but order based. However, I still don't know what the order is...

    Jared
    CE - Microsoft

  • SQLKnowItAll (1/17/2012)


    L' Eomot Inversé (1/17/2012)a set of datakey-landline pairs such that each landline and each datakey occurs only once

    As far as I can tell, this is an impossible statement without saying WHICH comes first.. Eliminate duplicate landlines when one is found and then duplicate datakeys, or the other way around. It seems that the request is not set based, but order based. However, I still don't know what the order is...

    The required order is dedupe datakey first, followed by landline.

Viewing 15 posts - 76 through 90 (of 120 total)

You must be logged in to reply to this topic. Login to reply