partition data

  • I need to create a query that can partition a data set and the do some updates to the partitions and finally produce an output.

    In my data set are rows that pertain to individual trade transactions. A package is made up of several transactions. Each transaction contains a broker, portfolio, commission and gross amount. Within each transaction there can be placeholder brokers and place older portfolios. i.e.values that are used as dummies to fill fields in the source entry system. For each package there can only be on real broker

    My query needs to pull out all the rows(transactions) per package which have a real portfolio and then for those lines update the broker field with correct/real broker for that package. The way to determine if a broker or portfolio is real is to check if it exists in the portfolio and broker table. I have provided some sample data for clarity.

    CREATE TABLE #TradeDetail

    (

    Package varchar(12)

    , [Broker] varchar(50)

    , Portfolio VARCHAR(20)

    , GrossAmount_EUR numeric(18,6)

    , Commission_EUR numeric(18,6)

    )

    CREATE TABLE #Broker

    (

    ID INT IDENTITY (1,1)

    , [Broker] varchar(50)

    )

    CREATE TABLE #Portfolio

    (

    ID INT IDENTITY (1,1)

    , [Portfolio] varchar(50)

    )

    INSERT INTO #TradeDetail

    --drop table #TradeDetail

    SELECT '123456','Good_Broker','Investment','120','130' UNION ALL

    SELECT '123456','Bad_Broker1','Investment_false','100','130' UNION ALL

    SELECT '123456','Bad_Broker2','InvestCo','120','100' UNION ALL

    SELECT '123456','Bad_Broker3','Investment_false','100','130' UNION ALL

    SELECT '123456','Bad_Broker4','Investment_Cash','100','150' UNION ALL

    SELECT '123456','Bad_Broker5','PortfolioCO','150','100' UNION ALL

    SELECT '789110','Good_Broker2','InvestmentCO2','150','100' UNION ALL

    SELECT '789110','Bad_Broker6','Investment_false','150','100' UNION ALL

    SELECT '789110','Bad_Broker7','PortfolioCO2','150','100' UNION ALL

    SELECT '789110','Bad_Broker8','Investment_Cash','150','100' UNION ALL

    SELECT '789110','Bad_Broker9','PortfolioCO','150','100' UNION ALL

    SELECT '453211','Good_Broker3','InvestCo','150','100' UNION ALL

    SELECT '453211','Bad_Broker10','InvestmentCO1','150','100' UNION ALL

    SELECT '453211','Bad_Broker11','Investment_false2','150','143' UNION ALL

    SELECT '453211','Bad_Broker12','InvestmentCO2','105','200' UNION ALL

    SELECT '453211','Bad_Broker13','Investment_false','100','200' UNION ALL

    SELECT '453211','Bad_Broker14','Investment','100','220'

    INSERT INTO #Broker

    SELECT 'Good_Broker' UNION ALL

    SELECT 'Good_Broker2' UNION ALL

    SELECT 'Good_Broker3'

    INSERT INTO #Portfolio

    SELECT 'Investment' UNION ALL

    SELECT 'InvestCo' UNION ALL

    SELECT 'PortfolioCO' UNION ALL

    SELECT 'PortfolioCO2' UNION ALL

    SELECT 'PortfolioCO' UNION ALL

    SELECT 'PortfolioCO' UNION ALL

    SELECT 'InvestmentCO1' UNION ALL

    SELECT 'InvestmentCO2'

    With this data the output should be:

    Package Broker Portfolio GrossAmount_EUR Commission_EUR

    123456 Good_Broker Investment 120 130

    123456 Good_Broker InvestCo 120 100

    123456 Good_Broker PortfolioCO 150 100

    789110 Good_Broker2 InvestmentCO2 150 100

    789110 Good_Broker2 PortfolioCO2 150 100

    789110 Good_Broker2 PortfolioCO 150 100

    453211 Good_Broker3 InvestCo 150 100

    453211 Good_Broker3 InvestmentCO1 150 100

    453211 Good_Broker3 InvestmentCO2 105 200

    453211 Good_Broker3 Investment 100 220

  • I felt you needed a reply, as you've provided us with DDL and sample data so I've had a go.

    There's a possibility I haven't understood your problem correctly, so test this thoroughly 😉

    SELECT Package, b.Broker, Portfolio, GrossAmount_EUR, Commission_EUR

    FROM (SELECT a.Package, b.Portfolio, a.GrossAmount_EUR, a.Commission_EUR,

    DENSE_RANK() OVER(ORDER BY a.Package) AS joiner

    FROM #TradeDetail a

    INNER JOIN #Portfolio b ON a.Portfolio = b.Portfolio

    GROUP BY a.Package, b.Portfolio, a.GrossAmount_EUR, a.Commission_EUR) a

    INNER JOIN #Broker b ON a.joiner = b.ID


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Cadavre (12/14/2011)


    I felt you needed a reply, as you've provided us with DDL and sample data so I've had a go.

    There's a possibility I haven't understood your problem correctly, so test this thoroughly 😉

    SELECT Package, b.Broker, Portfolio, GrossAmount_EUR, Commission_EUR

    FROM (SELECT a.Package, b.Portfolio, a.GrossAmount_EUR, a.Commission_EUR,

    DENSE_RANK() OVER(ORDER BY a.Package) AS joiner

    FROM #TradeDetail a

    INNER JOIN #Portfolio b ON a.Portfolio = b.Portfolio

    GROUP BY a.Package, b.Portfolio, a.GrossAmount_EUR, a.Commission_EUR) a

    INNER JOIN #Broker b ON a.joiner = b.ID

    Your join depends on an artifact of the sample data that you simply cannot guarantee in your production data. To see the problem, use the following for your broker table:

    INSERT INTO #Broker

    SELECT 'Good_Broker' UNION ALL

    SELECT 'Good_Broker1' UNION ALL

    SELECT 'Good_Broker2' UNION ALL

    SELECT 'Good_Broker3'

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Cheers.

    I actually managed to get the result with this query 😛

    SELECT

    td.Package

    ,MAX(B.[Broker]) OVER (PARTITION BY td.Package) AS [Broker]

    ,td.Portfolio

    ,td.GrossAmount_EUR

    ,td.Commission_EUR

    FROM #TradeDetail td

    LEFT JOIN #Broker b

    ON B.[Broker] = td.[Broker]

    WHERE EXISTS(SELECT 1 FROM #Portfolio WHERE Portfolio=td.Portfolio)

    thanks for the response

  • I was about to post almost exactly the same query. The only difference was that I used an INNER JOIN rather than an EXISTS for the portfolio. Depending on your data and indices, either one might perform better.

    The join did bring out a problem with your sample data. You had duplicate portfolio records, because of the UNION ALL. I assume that this is just a problem with the way that the sample was generated. If you have duplicates in your actual data, then you should take some measures to clean up the data and prevent this from happening.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply