"AND c.customerID IS NULL" makes query run "forever"

  • If I run this query

    SELECT

    sc.SourceContactID

    FROM SourceLists_SourceContacts slsc

    INNER JOIN [sourceContacts] sc ON sc.SourceContactID = slsc.SourceContactID

    LEFT JOIN customers c ON sc.email = c.email

    LEFT JOIN vw_UnsubscribedEmails AS ue ON sc.email = ue.email

    WHERE slsc.SourceListID = 59

    AND ue.email IS NULL -- filter out unsubscribed

    it executes in about 5 seconds. However, if I simply add AND c.customerID IS NULL -- filter out customers to the WHERE clause,i.e. this query

    SELECT

    sc.SourceContactID

    FROM SourceLists_SourceContacts slsc

    INNER JOIN [sourceContacts] sc ON sc.SourceContactID = slsc.SourceContactID

    LEFT JOIN customers c ON sc.email = c.email

    LEFT JOIN vw_UnsubscribedEmails AS ue ON sc.email = ue.email

    WHERE slsc.SourceListID = 59

    AND ue.email IS NULL -- filter out unsubscribed

    AND c.customerID IS NULL -- filter out customers

    it will run "forever". I will literally run is ManagementStudio for over 10 minutes and I finally have to cancel the query.

    customers table is a simple with customerID, email, firstName, etc.

    sourceContacts is a simple table with sourceContactID, email, firstName, etc

    sourceLists_sourceContacts is just a joining table: sourceListID to sourceContactID

    vw_UnsubscribedEmails is

    CREATE VIEW [dbo].[vw_UnsubscribedEmails]

    AS

    SELECT email, dateCreated, source

    FROM dbo.unsubscribedEmails

    UNION

    SELECT email, NULL AS dateCreated, NULL AS source

    FROM dbo.customers

    WHERE (isOnEmailList = 0 AND isEmailRestricted = 1) OR

    (badEmail = 1) OR

    (archiveBit = 1)

    I simply want to filter out any source contacts where they have a matching email in the customers table. What could be causing the query to run long?

  • If use the initial query as a subquery and then filter by customerID IS NULL, i.e.

    SELECT qry.sourceContactID

    FROM

    (

    SELECT

    sc.SourceContactID

    FROM SourceLists_SourceContacts slsc

    INNER JOIN [sourceContacts] sc ON sc.SourceContactID = slsc.SourceContactID

    LEFT JOIN customers c ON sc.email = c.email

    LEFT JOIN vw_UnsubscribedEmails AS ue ON sc.email = ue.email

    WHERE slsc.SourceListID = 59

    AND ue.email IS NULL -- filter out unsubscribed

    ) qry

    INNER JOIN sourceContacts sc ON qry.sourceContactID = sc.sourceContactID

    LEFT JOIN customers c2 ON sc.email = c2.

    WHERE c2.[customerID] IS NULL

    It runs in about 5 seconds. Not sure why doing it that way is better. Any thoughts?

    -Phil

  • Hi

    By looking at your table structure CustomerID is Primary Key. Not quite sure why do you want to have CustomoerID IS NULL Condition in the query.

    Can you also tell us the cardinality between the tables.

    Thanks -- Vj

    http://dotnetvj.blogspot.com

  • I want to filter out any sourceContacts that have a matching email in the customer table. Due to the LEFT JOIN, if there is no match by email then customerID (or any of the other customer fields )should be null.

    SourceLists and SourceContacts have a many-to-many relationship. Customers and SourceContacts each have email as unique contstraints but have no direct relationship via primary/foreign keys just joining on email. The vw_UnsubscribedEmails only returns unique emails.

    -Phil

  • Just curious (since we don't have the DDL for the tables, sample data for the tables, or expected results based on the sample data), what happens to your query if you modify the jons to use inner joins instead of outer joins? From what I see reading your code, the WHERE clause is basically doing that any way.

  • Lynn Pettis (2/4/2009)


    Just curious (since we don't have the DDL for the tables, sample data for the tables, or expected results based on the sample data), what happens to your query if you modify the jons to use inner joins instead of outer joins? From what I see reading your code, the WHERE clause is basically doing that any way.

    I can't inner join to customers. sourceContacts is basically a list of email recipients. There might be 100,000 recipients associated with a given sourceListID. Of those 100K, there will be a small percentage of recipients that have the same email in the customers table. I'm doing the left join so I make sure to return all the source contacts and I check there is no match to customers with the customerID null check.

    But to answer your question, if I do an inner join it returns quickly (but doesn't give me the result I need).

    -Phil

  • Then we could use the following: table DDL, sample data (as INSERT statements that can be cut, paste, and executed to load the tables), and the expected results based on the sample data. Please be sure the sample covers all possible data possibilities.

    Any questions on this, please read the first article linked below in my signature block regarding asking for assistance.

  • philcruz (2/4/2009)


    vw_UnsubscribedEmails is

    CREATE VIEW [dbo].[vw_UnsubscribedEmails]

    AS

    SELECT email, dateCreated, source

    FROM dbo.unsubscribedEmails

    UNION

    SELECT email, NULL AS dateCreated, NULL AS source

    FROM dbo.customers

    WHERE (isOnEmailList = 0 AND isEmailRestricted = 1) OR

    (badEmail = 1) OR

    (archiveBit = 1)

    I'm not sure why you are using vw_UnsubscribedEmails which includes some email values from the customers table and then joining the whole customers to remove the rest. Have you tried this?

    SELECT

    sc.SourceContactID

    FROM SourceLists_SourceContacts slsc

    INNER JOIN [sourceContacts] sc ON sc.SourceContactID = slsc.SourceContactID

    LEFT JOIN customers c ON sc.email = c.email

    LEFT JOIN dbo.UnsubscribedEmails AS ue ON sc.email = ue.email

    WHERE slsc.SourceListID = 59

    AND ue.email IS NULL -- filter out unsubscribed

    AND c.customerID IS NULL -- filter out customers

    Derek

  • I agree, union with Customers table in the view looks suspicious.

    I tried to create the tables and run the queries to look at the execution plan, but I do not have any problems. May be the problem is in the indexes that you have? Post create scripts for you tables, please. Or the execution plan for the second query, may be.

  • Derek Dongray (2/4/2009)


    philcruz (2/4/2009)


    vw_UnsubscribedEmails is

    CREATE VIEW [dbo].[vw_UnsubscribedEmails]

    AS

    SELECT email, dateCreated, source

    FROM dbo.unsubscribedEmails

    UNION

    SELECT email, NULL AS dateCreated, NULL AS source

    FROM dbo.customers

    WHERE (isOnEmailList = 0 AND isEmailRestricted = 1) OR

    (badEmail = 1) OR

    (archiveBit = 1)

    I'm not sure why you are using vw_UnsubscribedEmails which includes some email values from the customers table and then joining the whole customers to remove the rest. Have you tried this?

    SELECT

    sc.SourceContactID

    FROM SourceLists_SourceContacts slsc

    INNER JOIN [sourceContacts] sc ON sc.SourceContactID = slsc.SourceContactID

    LEFT JOIN customers c ON sc.email = c.email

    LEFT JOIN dbo.UnsubscribedEmails AS ue ON sc.email = ue.email

    WHERE slsc.SourceListID = 59

    AND ue.email IS NULL -- filter out unsubscribed

    AND c.customerID IS NULL -- filter out customers

    Well, the idea for the view is that it abstracts who is unsubscribed. So for other people writing queries that need to filter out unsubscribed emails, they don't have to know about the details of that, they just use the view.

    In this particular case, I'm filtering out customers altogether, so doing as you suggest does indeed work much better. So, at this point, I have 2 acceptable solutions: using the subquery or a join not using the view.

    It's just not clear to me why I can run the query using the view but adding c.customerID IS NULL makes it run very long. Unfortunately, I can't even get it to complete so that I can look at the execution plan.

    I appreciate everyone's input on this. (Sorry I couldn't provide sample data but since it concerns customer/email data I can't share it and can't easily generate it).

    -Phil

  • philcruz (2/4/2009)


    Derek Dongray (2/4/2009)


    I appreciate everyone's input on this. (Sorry I couldn't provide sample data but since it concerns customer/email data I can't share it and can't easily generate it).

    -Phil

    I'd still be interested in seeing the full table definitions, indexes and constraints. It's fairly easy to generate some fake email addresses, names, phone numbers (if needed) etc and see if the problem occurs with dummy data as well as the real thing. The only thing then needed would be some statisitcs with rough percentages of each flag value etc.

    Regarding the view, I understand why it's there, but in this case I'd use the table if it fixes the problem and put a comment in/near the SELECT indicating why the view wasn't being used.

    Derek

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply