Join Issues

  • I have 2 tables and I want to pull data from.  The first table stores organization information and the second stores financial data for the organizations.  I have 45 organizations, but only 18 of them have any financial data in the second table.  What I would really like to occur is for me to get the complete list of organizations and any details for any organization that have them from one query.

    When I try this statement I get what I want, but . . .

    select o.*, t.*

    FROM Organizations o left join totals t on o.organizationid = t.organizationid

    WHERE     (o.Lineage LIKE '8%') ORDER BY o.Lineage

    I haven't limited the query yet as much as I would like.  I really need to limit the query by a date range where all the date related fields are in the details table (Totals).  Once I add on this restriction, the other 27 organizations do not appear in my dataset.  Anyone know how I can do this?

    This is my query with the additional limitation:

    SELECT     o.OrganizationID, o.Label, o.Name, o.Depth, o.Lineage, t.MonthlySumRev, t.MonthlySumPTP, t.QuarterAvgRev, t.QuarterAvgPTP, t.QuarterTotalRev, t.QuarterTotalPTP, t.[Month], t.[Year], t.Quarter

    FROM  Organizations o LEFT OUTER JOIN Totals t ON o.OrganizationID = t.OrganizationID WHERE     (o.Lineage LIKE '8%') AND (t.[Month] >= 7) AND (t.[Year] = 2004) OR (o.Lineage LIKE '8%') AND (t.[Month] <= 6) AND (t.[Year] = 2005) ORDER BY o.Lineage, t.[Year], t.[Month]

    thanks in advance.

  • Hi Ed

    If, for a particular record in 'organization', a related record does not exist in the 'totals' table, all of the fields retrieved from the totals table will be NULL.

    So you need to modify your WHERE clause to accommodate this, eg:

    WHERE

    (o.Lineage LIKE '8%') AND (t.[organizationid] is null) OR

    (o.Lineage LIKE '8%') AND (t.[Month] >= 7) AND (t.[Year] = 2004) OR

    (o.Lineage LIKE '8%') AND (t.[Month] <= 6) AND (t.[Year] = 2005) ORDER BY o.Lineage, t.[Year], t.[Month]

    Regards

    Phil

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • thank you.  that did the trick.  I love it when the solutions turn out to be fairly straight forward.

  • You have discovered that filtering records in the WHERE clause can turn an outer join back into an inner join.  The answer is not to make the WHERE clause more complicated.

    Method 1: Move the filter condition to the ON clause

    SELECT  ...

    FROM  Organizations o

    LEFT JOIN Totals t ON o.OrganizationID = t.OrganizationID  AND (t.[Month] >= 7) AND (t.[Year] = 2004) OR (t.[Month] <= 6) AND (t.[Year] = 2005)

    WHERE     (o.Lineage LIKE '8%')

    ORDER BY o.Lineage, t.[Year], t.[Month]

    Method 2: Use a subquery for one side of the join

    SELECT  ...

    FROM  Organizations o

    LEFT JOIN (

       SELECT * FROM Totals

       WHERE ([Month] >= 7) AND ([Year] = 2004) OR ([Month] <= 6) AND ([Year] = 2005)

    ) t ON o.OrganizationID = t.OrganizationID  

    WHERE     (o.Lineage LIKE '8%')

    ORDER BY o.Lineage, t.[Year], t.[Month]

    I usually use method 1, but method 2 is a little easier to read and maintain.

  • I should add that anytime you have more than one way to write a query, you should put the variations together in a batch and use Query Analyzer to show the execution plans and performance estimates.  You often learn something useful either about query structure or index usage (or places you need to add an index).  I would be curious whether either of these methods performs any differently than the previous response.

  • Oops, forgot a set of parenthesis

    SELECT  ...

    FROM  Organizations o

    LEFT JOIN Totals t ON o.OrganizationID = t.OrganizationID  AND ((t.[Month] >= 7) AND (t.[Year] = 2004) OR (t.[Month] <= 6) AND (t.[Year] = 2005))

    WHERE     (o.Lineage LIKE '8%')

    ORDER BY o.Lineage, t.[Year], t.[Month]

  • thanks a lot scott.  the method1 approach as written originally was returning 4760 rows and the method2 approach was returning 300 (more of what I was expecting).  I added another o.OrganizationID=t.OrganizationID after the "OR" and then I received the 300 records as well. 

    I ran both in Query Analyzer and the execution plans turned out to be exactly the same (as far as I can tell).  I ran Phill's query as well and its execution plan was a little different.  To be quite honest, I don't really know what I can learn from the plans (although I see a lot of posts where people talk about them).

    Either way, the information you have given is much appreciated.  Thanks.

  • If you put variations in the same batch and get the execution plan, it should show you the relative performance as reflected in the % of total time for each query.  You don't have to understand every symbol in the execution plan, but if it says one version is many times faster you know which one to use.

    Sometimes it shows you that one variation uses indexes where another one uses a table scan.  Or it may show an incredibly fat arrow where you've accidentally created a cross join (one of my co-workers recently had a query with an estimated row count of 5e15).  Sometimes it shows that there are missing statistics, which it will be happy to create with a few mouse clicks.

    You can look for the most expensive part of a query using the percent values, and try to find a way to enable SQL to use faster operations.  A table scan or index scan is more expensive than an index seek.  A hash join is more expensive than a lookup or merge.

    If you think an index might help (and you have plenty of free space), you can test it by creating a duplicate table with the proposed index.  Create a batch with two versions of the same query, one using the original table and one using the new one.  The execution plan will show you whether your index is used, and how much of a difference it makes.  (The duplicate table must contain the same data as the original, not just the structure).

    When you are comfortable with execution plans, you will know a lot more about writing efficient SQL than you do now.

  • Hi all, slightly different query, i have two tables, in one table i have list of Traanporters and in other table i have transporters associated with states. for e.g. the data in transporter table is

    Transporter_id

    Transporter_Name

    1

    a

    2

    b

    3

    c

    4

    d

    5

    e

    And the data in association table is

    Transporter_id

    State_id

    1

    100

    1

    101

    1

    102

    1

    103

    2

    101

    2

    103

    3

    101

    Now i want to fire a query which returns ALL transporter name even if the data is not present in association table. for eg. if i want to find the transporter available in state 101 the query should return,

    Transporter_Name

    State_id

    a

    101

    b

    101

    c

     

    d

     

    e

     

    Can anyone pls. help me out.....

    Thanks in advance

  • Something like this?

    select t.transporter_name, a.state_id

    from transporters t left outer join association a on

    t.transporter_id = a.transporter_id

    where state_id = 101 or state_id is null

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • nope, it gives

    Tranporter_Name State_id

    c                          Null

    d                          Null  

    e                          Null

  • oops, sorry... thanx.... yes the query is giving desired output....

  • Should work. Is state_id a text field? Then you'd need

    state_id = '101'

    Or does it need to be trimmed?

    trim(state_id) = '101'

    Aha ... just seen your amended post - good.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • I'm still prejudiced against the construction "WHERE field = value OR field IS NULL".  My feeling is you want to filter the rows before the join, not after (although the query optimizer might figure this out anyway).

    I would recommend:

    select t.transporter_name, a.state_id

    from transporters t

    left outer join association a on t.transporter_id = a.transporter_id

       and state_id = 101

  • my joining issues continue, but now with a different problem.

    I have 4 tables, 1 table is kind of the central table and the other 3 are foreign key related details tables.  I want to use a single sql statement to select data from all 4 tables.  I was attempting to extend what I asked at the beginning of this thread to the other tables as well, but I am getting large numbers of rows returned (4680) when I am expecting more like ~80 rows.

    Central table (Services): serviceid (int), organizationid(int), name varchar(50), description varchar(200)

    Other 3 tables (Actuals, Flash, Forecast):uniqueid(int), serviceid(int), amount (money), comment varchar(200),year(int), month(int)

    I am trying:

    select s.* from Services s left join Actuals a on s.serviceid=a.serviceid left join Flash f on s.serviceid=f.serviceid left join Forecast fo on s.serviceid=fo.serviceid

    And this isn't ultimately what I truly want, I am just trying to get to all of the rows of data back first.  Ultimately, I am going to want to specify additional limiting parameters like year and month in each of the child tables.  I am fully expecting each of the tables to have more data than the next for the given time range as well.  So I am thinking I will want to coalesce the data amongst the tables as well. 

    i.e. the Actuals table will have data for the first x months only, then the Flash table has data for the next y months, and then the next z months will come from the Forecast table.  At the time of the query, I do not know what x,y, or z will be.  I know the total time range I am searching for, but I do not know which tables will have the data.  I do know that I always want to get what I can out of those tables in that order (Actuals, Flash, Forecast) all the time and if the first table has the data, I do not want to get any data from the other tables for that same month and year. 

    Any help again will be greatly appreciated.  Thanks.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply