Query result with a value condition

  • Hi All,

    I need your help to figure out a query.

    create table #Ord (OrdID INT, OrdDt DATE, CustID INT, MtchID INT)

    INSERT INTO #Ord

    select 101, GETDATE(), 123, 111

    union all

    select 102, GETDATE(), 124, 112

    union all

    select 103, (GETDATE()-1), 125, 112

    union all

    select 103, (GETDATE()-1), 125, 111

    union all

    select 102, (GETDATE()-1), 126, 110

    select * from #Ord

    create table #Prs (MtchID INT, Price INT, OrdLoc VARCHAR(6))

    INSERT INTO #Prs

    select 111, 10, 'Online'

    union all

    select 110, 10, null

    union all

    select 112, 20, 'Phone'

    union all

    select 163, 10, 'Phone'

    union all

    select 182, 100, 'Online'

    select * from #Prs

    select * from #Ord o join #Prs p ON o.MtchID = p.MtchID

    and OrdLoc = 'Phone'

    drop table #Ord

    drop table #Prs

    Result:

    OrdID OrdDt CustID MtchID MtchIDPriceOrdLoc

    103 2016-02-0212511111110Online

    1022016-02-0312411211220Phone

    As shown in the result set there are multiple OrdLoc's, it can be(Phone, Online, Instore etc) but I have to bring in only Phone orders and if there is a Online order I have get the Online order instead of Phone.

    Above result set for OrdID 103 there is a Online and Phone Order, but I have to display only Online order details for OrdID 103 and phone OrdLoc for the other OrdID's

    appreciate your help.

    Thanks in advance!

  • I think the code below does what you want, if I understand the requirements correctly. It scans each table twice, so if the source tables are very large then this might not be so speedy. Having said that, I'm not sure there's actually a more efficient way to do this, given the requirements. There's always the possibility I'm missing some particularly clever approach (especially this late at night), and hopefully someone else points one out if that's the case. 🙂

    WITH online_or_phone AS

    (

    SELECT OrdId,

    is_online=MAX(CASE WHEN OrdLoc='Online' THEN 1 ELSE 0 END)

    FROM #Ord o INNER JOIN #Prs p ON o.mtchid=p.mtchid

    GROUP BY OrdID

    HAVING MAX(CASE WHEN OrdLoc='Phone' THEN 1 END)=1

    )

    SELECT O.*,P.*

    FROM #Ord O

    INNER JOIN #Prs p ON o.mtchid=p.mtchid

    INNER JOIN online_or_phone OOP ON o.OrdId=OOP.OrdId AND

    p.OrdLoc=CASE WHEN OOP.is_online=1

    THEN 'Online'

    ELSE 'Phone'

    END;

    Cheers!

  • Here is an option that I think does a little less work but I am still not crazy about needing to use row_number. I think there is still a better way:

    with ord as (select o.OrdID,

    o.OrdDt,

    o.CustId,

    o.MtchID,

    p.Price,

    p.OrdLoc,

    -- relies on "Online" sorting before "Phone" so delivers Online order first, else delivers Phone

    row_number() over (partition by o.OrdID order by p.OrdLoc) as row_num

    from #Ord o

    join #Prs p ON o.MtchID = p.MtchID

    where p.OrdLoc in ('Phone', 'Online'))

    select OrdID,

    OrdDt,

    CustId,

    MtchID,

    Price,

    OrdLoc

    from ord

    where row_num = 1;

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Jacob Wilkins (2/3/2016)


    I think the code below does what you want, if I understand the requirements correctly. It scans each table twice, so if the source tables are very large then this might not be so speedy. Having said that, I'm not sure there's actually a more efficient way to do this, given the requirements. There's always the possibility I'm missing some particularly clever approach (especially this late at night), and hopefully someone else points one out if that's the case. 🙂

    WITH online_or_phone AS

    (

    SELECT OrdId,

    is_online=MAX(CASE WHEN OrdLoc='Online' THEN 1 ELSE 0 END)

    FROM #Ord o INNER JOIN #Prs p ON o.mtchid=p.mtchid

    GROUP BY OrdID

    HAVING MAX(CASE WHEN OrdLoc='Phone' THEN 1 END)=1

    )

    SELECT O.*,P.*

    FROM #Ord O

    INNER JOIN #Prs p ON o.mtchid=p.mtchid

    INNER JOIN online_or_phone OOP ON o.OrdId=OOP.OrdId AND

    p.OrdLoc=CASE WHEN OOP.is_online=1

    THEN 'Online'

    ELSE 'Phone'

    END;

    Cheers!

    Be careful with this one. This gives 2 rows from the test dataset and I think should give 3 rows. Also, I think the code assumes some type of uniqueness on the join-predicates because as soon as I expanded the test dataset to test for performance a bit I started seeing lots of duplicates in the results.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • This one seems to do quite a bit better than the others on elapsed time:

    with cte as (select distinct OrdID

    from #Ord)

    select cte.OrdID,

    x.OrdDt,

    x.CustId,

    x.MtchID,

    x.Price,

    x.OrdLoc

    from cte

    cross apply (select top 1

    o.OrdDt,

    o.CustId,

    o.MtchID,

    p.Price,

    p.OrdLoc

    from #Ord o

    join #Prs p ON o.MtchID = p.MtchID

    where p.OrdLoc in ('Phone', 'Online')

    and cte.OrdID = o.OrdID

    order by p.OrdLoc) x;

    I assume you gave us a basic version of the problem using temp tables minus the representative constraints and indexes. It would be good to check the execution plans on your side once you wire it up to make sure the query you decide is well supported.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Orlando Colamatteo (2/4/2016)


    Jacob Wilkins (2/3/2016)


    I think the code below does what you want, if I understand the requirements correctly. It scans each table twice, so if the source tables are very large then this might not be so speedy. Having said that, I'm not sure there's actually a more efficient way to do this, given the requirements. There's always the possibility I'm missing some particularly clever approach (especially this late at night), and hopefully someone else points one out if that's the case. 🙂

    WITH online_or_phone AS

    (

    SELECT OrdId,

    is_online=MAX(CASE WHEN OrdLoc='Online' THEN 1 ELSE 0 END)

    FROM #Ord o INNER JOIN #Prs p ON o.mtchid=p.mtchid

    GROUP BY OrdID

    HAVING MAX(CASE WHEN OrdLoc='Phone' THEN 1 END)=1

    )

    SELECT O.*,P.*

    FROM #Ord O

    INNER JOIN #Prs p ON o.mtchid=p.mtchid

    INNER JOIN online_or_phone OOP ON o.OrdId=OOP.OrdId AND

    p.OrdLoc=CASE WHEN OOP.is_online=1

    THEN 'Online'

    ELSE 'Phone'

    END;

    Cheers!

    Be careful with this one. This gives 2 rows from the test dataset and I think should give 3 rows. Also, I think the code assumes some type of uniqueness on the join-predicates because as soon as I expanded the test dataset to test for performance a bit I started seeing lots of duplicates in the results.

    I'm aware that it only gives two rows (and the OPs desired result was just two rows).

    I also initially thought there should be 3, but when I reread the OPs post, I realized (I think) what he was saying.

    As I read it, he wants to return information only for OrdIDs that match to an OrdLoc of 'Phone'. Of the OrdIDs that match to an OrdLoc of 'Phone', if they also match to an OrdLoc of 'Online', return the row for that OrdId that matches to 'Online' instead of 'Phone'.

    That requirement would result in the two rows the OP requested, so I'm guessing that's what he means, and that's what my query implements (I initially also did the row_number approach, assuming 3 rows should be returned, until I re-read the requirements). Of course, we'll just have to wait for the OP to clarify 🙂

    If you started extending the result set and saw duplicates, then I'm assuming you just did something like INSERT INTO #Ord SELECT * FROM #Ord and the same for #Prs.

    That would result in duplicates because there are now several rows in #Prs for the same MtchID, so each OrdId that matched to 'Phone' originally now matches several 'Online' or 'Phone' rows in #Prs.

    I (and the query) did assume the actual data doesn't allow that, as you pointed out; if that is allowed in the data, then yes, my query would have to be changed.

    At any rate, the main thing is to see if my reading of the requirements is correct, because that's a different question than your queries (and a query I almost submitted) answered.

    Back in the OP's court 🙂

    EDIT: Mistakenly referred to #Prs as #Tmp at one point. Fixed that.

  • I see what you are pointing out now, that before we decide to return an online order we must first know there was a phone order for that OrdID. Sounds like a call center upsell report. Agreed that raises the bar on complexity.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Try the following.

    WITH orders AS (

    SELECT o.OrdID, o.OrdDt, o.CustID, o.MtchID, p.Price, p.OrdLoc, ROW_NUMBER() OVER(PARTITION BY o.OrdID ORDER BY p.OrdLoc) AS rn, MAX(p.OrdLoc) OVER(PARTITION BY o.OrdID) AS MaxLoc

    FROM #Ord o

    INNER JOIN #Prs p

    ON o.MtchID = p.MtchID

    WHERE p.OrdLoc IN ('Online', 'Phone')

    )

    SELECT OrdId, OrdDt, CustID, MtchID, Price, OrdLoc

    FROM orders

    WHERE rn = 1

    AND MaxLoc = 'Phone';

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thank you all for the replies!

    @jacob Wilkins - You are correct, if there are two or more MtchId's with "Phone", "Online" and other OrdLoc's I want to display only the "Online" match, if there is no "Online" then I should have "Phone" , so "online" takes the precedence over "Phone".

    Thank you!

  • Retracted my code suggestion, sorry, misread the requirement.


    Kindest Regards,

    Ian Smith

  • ssc_san (2/4/2016)


    Thank you all for the replies!

    @jacob Wilkins - You are correct, if there are two or more MtchId's with "Phone", "Online" and other OrdLoc's I want to display only the "Online" match, if there is no "Online" then I should have "Phone" , so "online" takes the precedence over "Phone".

    Thank you!

    I think we have it, but for completeness, can you please confirm that if there is only an Online order, i.e. that the OrdID does not also have a match to a Phone order, that we should not return that Online order?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Hi Orlando, that's exactly the bit which led me to retract my suggestion. 🙂


    Kindest Regards,

    Ian Smith

  • OK, I'll go with it...

    This query has a lower plan cost than the other two solutions that seem to satisfy the requirements (Jacob's and Drew's) but I haven't tested all three at scale to see how each of them hold up...will do that later tonight:

    WITH cte

    AS (

    SELECT o.OrdID,

    o.OrdDt,

    o.CustID,

    o.MtchID,

    p.Price,

    p.OrdLoc,

    LEAD(o.OrdID, 1, 0) OVER (PARTITION BY o.OrdID ORDER BY o.OrdID) AS NextOrdID,

    LAG(o.OrdID, 1, 0) OVER (PARTITION BY o.OrdID ORDER BY p.OrdLoc DESC) AS LastOrdID,

    LAG(p.OrdLoc, 1, '') OVER (PARTITION BY o.OrdID ORDER BY p.OrdLoc DESC) AS LastOrdLoc

    FROM #Ord o

    INNER JOIN #Prs p ON o.MtchID = p.MtchID

    WHERE p.OrdLoc IN ('Phone', 'Online')

    )

    SELECT cte.OrdID,

    cte.OrdDt,

    cte.CustID,

    cte.MtchID,

    cte.Price,

    cte.OrdLoc

    FROM cte

    WHERE -- phone-only order ids

    (

    cte.OrdLoc = 'Phone'

    AND cte.NextOrdID = 0

    )

    OR

    -- OrdID has Online and Phone order

    (

    cte.LastOrdID = cte.OrdID

    AND cte.LastOrdLoc = 'Phone'

    );

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Very nice Orlando! I thought there might be something with LAG/LEAD that allowed a single scan of each table, and it looks like you found one.

    I ran through some quick tests with those three solutions, and the below were the results:

    1000 rows:

    solution last_logical_reads last_worker_time last_elapsed_time

    --------------- -------------------- -------------------- --------------------

    test_Jacob 28 6000 6000

    test_Drew 4617 8997 8997

    test_Orlando 14 9998 9998

    100000 rows:

    solution last_logical_reads last_worker_time last_elapsed_time

    -------------- -------------------- -------------------- --------------------

    test_Jacob 1252 454098 454098

    test_Orlando 626 900020 900020

    test_Drew 460629 924051 924051

    1000000 rows:

    solution last_logical_reads last_worker_time last_elapsed_time

    --------------- -------------------- -------------------- --------------------

    test_Jacob 12192 5834277 6089390

    test_Orlando 12580 10248498 10278545

    test_Drew 4609126 10355920 11405656

    I expected the reduced IO from Orlando's would allow it to blow the doors off the others, but it seems nothing comes for free. In this case, the reduced IO comes at the cost of a couple very expensive sorts.

    A few caveats: 1) my machine has very underwhelming specs, so all the sorts and hash joins spilled to tempdb for the larger row count runs; 2) I didn't change the shape of the data, just multiplied rows in both tables; 3) I didn't look at adding any supporting indexing. With different shapes of the data or some supporting indexing this could be very different, but I was only willing to spend so much time on this tonight :-). I might revisit and tinker with all that later, but for now I'm going to retreat into the non-SQL Server world (a scary place).

    I've attached the script I used for testing as a .txt file, since the site didn't seem to like my putting it in the post.

    Cheers!

    EDIT: Updated my caveats, since I left one out 🙂

  • I think we have it, but for completeness, can you please confirm that if there is only an Online order, i.e. that the OrdID does not also have a match to a Phone order, that we should not return that Online order?

    There is a possibility of having "Phone", "Online" and "" (Space), no present both(Phone & Online) will be separate.

    Thank you for your time and help @Orlando Colamatteo, @jacob Wilkins & @IanMGSmith!

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply