Are Joins faster in a query in OLEDB Src or should I use a lookup

  • I have some views that I would like to write an ssis pkg to use the queries of the views to load tables so that users will have better performance querying tables rather than a view. I have a warehouse that gets refreshed every morning and I will start this pkg after the refresh completes. I'm wondering if it would be faster to use a lookup or some other transform to do the join in an ssis pkg may be faster.

    Also, Would it be possible to restrict the number of records being joined by adding modular arithmetic in the AND clause of the join ex

    Table_A a Inner join Table_B b ON a.primeKey = b.ForiegnKey_A and (a.primeKey % 3 = 0)

  • I tried it both ways, one with doing the joins on the server in the query and another where I used lookups to pull only the columns needed in the query for the lookup. The lookups were 15 minutes faster. The main table had 30 million records. I also restricted the number of colimns being brought in through the main tables and used the query Hint OPTION(MAXDOP 1) and broke the main table into 3 pipelines using (PrimeKey % 3) = 0, (PrimeKey % 3) = 1, and (PrimeKey % 3) = 2 in the where clauses

  • dndaughtery (3/1/2011)


    I tried it both ways, one with doing the joins on the server in the query and another where I used lookups to pull only the columns needed in the query for the lookup. The lookups were 15 minutes faster. The main table had 30 million records. I also restricted the number of colimns being brought in through the main tables and used the query Hint OPTION(MAXDOP 1) and broke the main table into 3 pipelines using (PrimeKey % 3) = 0, (PrimeKey % 3) = 1, and (PrimeKey % 3) = 2 in the where clauses

    What are the size(s) of the tables you're using in the lookup vs. the JOIN, and have you looked at the execution plans of the JOIN'd query itself? I've found that lookups are good when you've got disparate data sources for the stream (two separate servers, flatfile into database, etc.), but the JOIN shouldn't be lagging that badly behind unless indexing is not built to handle the process.... but then, how long is this process? Is 15 minutes half the time, or a 5% difference? 🙂

    Restricting the columns should almost always speed things up a bit, as you won't load down memory more then you need. You should also do this in your lookups, as they pre-cache and chew on memory while the entire process runs.

    I've never arbitrarily split a stream, so I can't hazard a comment on that without some testing of my own.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Using the lookups took 31 min 15 seconds. Using the joins on the server took 45 minutes.

    The table I split into 3 pipelines contained 31,413,234 records. I did restrict to only the columns I needed from this table.

    There were 6 other tables that were needing to be joined with the following count of records

    Tbl1 - 1,185,884 records

    Tbl2 - 202 records

    Tbl3 - 2572 records

    Tbl4 - 1652 Records

    Tbl5 - 602 Records

    Tbl6 - 146 records

    no I didn't look into the execution plan of the joins. Haven't really had much exposure to doing that.....but will be reading over it tonight

  • dndaughtery (3/1/2011)


    Using the lookups took 31 min 15 seconds. Using the joins on the server took 45 minutes.

    Hm, 1/3 time reduction, very nice.

    Tbl1 - 1,185,884 records

    Tbl2 - 202 records

    Tbl3 - 2572 records

    Tbl4 - 1652 Records

    Tbl5 - 602 Records

    Tbl6 - 146 records

    Tbls 2-6 are certainly good lookup candidates, but my guess is they weren't slowing down the join method, either. Tbl1 is pretty hefty to shove into a lookup object, from my limited experience, so hopefully you are only requiring a lightweight column width out of it. That'd be where I'd concentrate my join mechanics.

    no I didn't look into the execution plan of the joins. Haven't really had much exposure to doing that.....but will be reading over it tonight

    If you can grab the .sqlplan (check the Index/Tuning link in my signature if you need help), we can help you read it and find the problems. Preference would be for the actual plan, but the estimated can give some clues.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Execution plan attached

  • dndaughtery (3/1/2011)


    Execution plan attached

    Couple of things...

    Any chance you can post the DDL and indexes of fact_transac? It looks like a missing index might help you out there.

    This is the index it would like:

    CREATE NONCLUSTERED INDEX idx_fact_transac_ssc_test ON fact_transac (TrnPayAmt)

    INCLUDE (TrnPk, TrnPatientKey, TrnUnits, TrnPayCodeID, TrnAttDocID, TrnLocID, TrnPostPdID, TrnInvNum, TrnFscAtPaymentID) ON [PRIMARY]

    The where clause on your query is non-SARGable.

    WHERE (ISNULL(FT.TrnPayAmt, 0) <> 0)

    Might I recommend modifying this to FT.TrnPayAmt <> 0? First, you don't want NULLs anyway, you're turning them into 0 and excluding them, so the inequality just to 0 will do the same. Second, by removing the function wrapping the column, you'll allow for better index usage.

    Lastly, is the DISTINCT necessary here? It's a painful operation from the execution plan (it's why there's that final sort).


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • USE [GECBI]

    GO

    /****** Object: Table [dbo].[fact_tes_transac] Script Date: 03/01/2011 16:20:52 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    CREATE TABLE [dbo].[fact_tes_transac](

    [FttPk] [int] IDENTITY(1,1) NOT NULL,

    [FttMRN] [varchar](20) NULL,

    [FttPatientName] [varchar](60) NULL,

    [FttEncounterNumber] [int] NULL,

    [FttTESBatchNumber] [varchar](10) NULL,

    [FttTransactionNum] [numeric](12, 0) NOT NULL,

    [FttCptFk] [int] NULL,

    [Fttmodifiers] [varchar](30) NULL,

    [Ftttxnoccurrence] [numeric](10, 0) NOT NULL,

    [Fttdn33062Fk] [int] NULL,

    [FttCreationDateFk] [int] NULL,

    [FttLastModifiedDateFk] [int] NULL,

    [FttDiagCdOneFk] [int] NULL,

    [FttDiagCdTwoFk] [int] NULL,

    [FttDiagCdThreeFk] [int] NULL,

    [FttDiagCdFourFk] [int] NULL,

    [FttDiagCdFiveFk] [int] NULL,

    [FttDiagCdSixFk] [int] NULL,

    [FttUnits] [int] NULL,

    [FttServiceDateFk] [int] NULL,

    [FttProviderFk] [int] NULL,

    [FttBillingAreaFk] [int] NULL,

    [FttHospFk] [int] NULL,

    [FttDivFk] [int] NULL,

    [FttLocFk] [int] NULL,

    [FttFscFk] [int] NULL,

    [FttDn33054TargetFk] [int] NULL,

    [FttDn33054OverrideFk] [int] NULL,

    [FttDiscPer] [varchar](4) NULL,

    [FttDiscType] [int] NULL,

    [FttOrigCharge] [money] NULL,

    [FttAdjustedCharge] [money] NULL,

    [FttCurrentCharge] [money] NULL,

    [FttDeletedCharge] [money] NULL,

    [FttExtractedCharge] [money] NULL,

    [FttTnComment] [varchar](60) NULL,

    [FttDn33053Fk] [int] NULL,

    [FttDeleteComment] [varchar](80) NULL,

    [FttForceExtractDateFk] [int] NULL,

    [FttDn33055Fk] [int] NULL,

    [FttForceExtractComment] [varchar](80) NULL,

    [FttBARBatchNum] [varchar](10) NULL,

    [FttBarInvNum] [varchar](10) NULL,

    [FttDebitAmount] [money] NULL,

    [FttSecondAnesthes] [int] NULL,

    [FttSecondAnesthesTimeIn] [varchar](8) NULL,

    [FttSecondAnesthesTimeOut] [varchar](8) NULL,

    [FttSecondCrna] [int] NULL,

    [FttSecondCrnaTimeIn] [varchar](8) NULL,

    [FttSecondCrnaTimeOut] [varchar](8) NULL,

    [FttSecondayProvider] [int] NULL,

    [FttAnesthes] [int] NULL,

    [FttCrna] [int] NULL,

    [FttTrnPkFk] [int] NULL,

    [FttFtiPkFk] [int] NULL,

    [FttDn722Fk] [int] NULL,

    [FTTModFk] [int] NULL,

    [FttDn33031Create] [int] NULL,

    [FttDn33031LastMod] [int] NULL,

    [FttDn33055Comment] [varchar](80) NULL,

    [FttDiagCdSevenFk] [int] NULL,

    [FttDiagCdEightFk] [int] NULL,

    [FttDiagCdNineFk] [int] NULL,

    [FttDiagCdTenFk] [int] NULL,

    [FttPatAgeAtServDays] [int] NULL,

    [FttChargeProcCode] [varchar](10) NULL,

    [FttChgExtId] [varchar](80) NULL,

    [FttRefProviderFk] [int] NULL,

    [FttUBillingProviderPk] [int] NULL,

    CONSTRAINT [PK_fact_tes_transac] PRIMARY KEY CLUSTERED

    (

    [FttPk] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

  • :blink: You've only got the one index on the fact table, when she's that wide? Owch.

    CONSTRAINT [PK_fact_tes_transac] PRIMARY KEY CLUSTERED

    Definately try adding in the NC index I mentioned above, after a double check that you've got indexes being issued (in Tools-Options) as well when you script your objects in SSMS.

    Wrong table, at least according to the execution plan, we need GECBI.dbo.fact_transac.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • oops, my error

    USE [GECBI]

    GO

    /****** Object: Table [dbo].[fact_transac] Script Date: 03/01/2011 16:30:08 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[fact_transac](

    [TrnPk] [int] IDENTITY(1,1) NOT NULL,

    [TrnFKInvPk] [int] NOT NULL,

    [TrnGroupId] [int] NOT NULL,

    [TrnPatientKey] [numeric](12, 0) NOT NULL,

    [TrnKey] [numeric](12, 0) NOT NULL,

    [TrnDebitAmt] [money] NOT NULL,

    [TrnChargeAmt] [money] NOT NULL,

    [TrnPayAmt] [money] NOT NULL,

    [TrnCreditAmt] [money] NOT NULL,

    [TrnAdjAmt] [money] NOT NULL,

    [TrnTotalPmt] [money] NULL,

    [TrnUnits] [numeric](18, 2) NULL,

    [TrnUnitsBase] [int] NULL,

    [TrnUnitsTime] [numeric](10, 2) NULL,

    [TrnBatchNum] [int] NULL,

    [TrnOverrideFlag] [int] NULL,

    [TrnFscId] [int] NOT NULL,

    [TrnPayCodeId] [int] NULL,

    [TrnPrevFscId] [int] NOT NULL,

    [TrnCptId] [int] NULL,

    [TrnPostDtId] [int] NULL,

    [TrnPostFieldDtid] [int] NULL,

    [TrnServiceDtId] [int] NULL,

    [TrnThruServiceDtId] [int] NULL,

    [TrnBankDepDtId] [int] NULL,

    [TrnGELoadDt] [datetime] NOT NULL,

    [TrnLoadDt] [datetime] NOT NULL,

    [TrnAttDocId] [int] NOT NULL,

    [TrnRefDocId] [int] NOT NULL,

    [TrnDivId] [int] NOT NULL,

    [TrnLocId] [int] NOT NULL,

    [TrnBillingAreaId] [int] NOT NULL,

    [TrnFeeSchId] [int] NOT NULL,

    [TrnHospId] [int] NOT NULL,

    [TrnAgeId] [int] NOT NULL,

    [TrnFKPtPk] [int] NULL,

    [TrnPostPdID] [int] NULL,

    [TrnRvu] [numeric](18, 2) NULL,

    [TrnLstPmtDt] [int] NULL,

    [TrnEncounter] [int] NULL,

    [TrnInvNum] [int] NULL,

    [TrnBatchStatus] [int] NULL,

    [TrnTotalAdj] [numeric](18, 2) NULL,

    [TrnBalance] [numeric](18, 2) NULL,

    [TrnPmtCnt] [int] NULL,

    [TrnInsCompId] [int] NULL,

    [TrnPatAgeAtServDays] [int] NULL,

    [TrnPatAgeAtServWeeks] [int] NULL,

    [TrnPatAgeAtServMonths] [int] NULL,

    [TrnPatAgeAtServYears] [int] NULL,

    [TrnFscAtPaymentId] [int] NULL,

    [TrnFscAtPaymentSeq] [int] NULL,

    [TrnFkBtcPk] [int] NULL,

    [TrnFrontDeskPaycodeId] [int] NULL,

    CONSTRAINT [PK_fact_transac] PRIMARY KEY CLUSTERED

    (

    [TrnPk] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [pxptransac]

    ) ON [pxptransac]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_BillingArea] FOREIGN KEY([TrnBillingAreaId])

    REFERENCES [dbo].[dim_BillingArea] ([BillingAreaPk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_BillingArea]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_cpt] FOREIGN KEY([TrnCptId])

    REFERENCES [dbo].[dim_cpt] ([CptPK])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_cpt]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_date] FOREIGN KEY([TrnPostDtId])

    REFERENCES [dbo].[dim_date] ([date_id])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_date]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_date1] FOREIGN KEY([TrnPostFieldDtid])

    REFERENCES [dbo].[dim_date] ([date_id])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_date1]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_date2] FOREIGN KEY([TrnServiceDtId])

    REFERENCES [dbo].[dim_date] ([date_id])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_date2]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_date3] FOREIGN KEY([TrnThruServiceDtId])

    REFERENCES [dbo].[dim_date] ([date_id])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_date3]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_date4] FOREIGN KEY([TrnBankDepDtId])

    REFERENCES [dbo].[dim_date] ([date_id])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_date4]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_division] FOREIGN KEY([TrnDivId])

    REFERENCES [dbo].[dim_division] ([DivPk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_division]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_fsc] FOREIGN KEY([TrnFscId])

    REFERENCES [dbo].[dim_fsc] ([FscPk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_fsc]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_fsc1] FOREIGN KEY([TrnPrevFscId])

    REFERENCES [dbo].[dim_fsc] ([FscPk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_fsc1]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_group] FOREIGN KEY([TrnGroupId])

    REFERENCES [dbo].[dim_group] ([GroupPk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_group]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_hospital] FOREIGN KEY([TrnHospId])

    REFERENCES [dbo].[dim_hospital] ([HospPk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_hospital]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_location] FOREIGN KEY([TrnLocId])

    REFERENCES [dbo].[dim_location] ([LocPk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_location]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_patient] FOREIGN KEY([TrnFKPtPk])

    REFERENCES [dbo].[dim_patient] ([PtPK])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] NOCHECK CONSTRAINT [FK_fact_transac_dim_patient]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_PayCode] FOREIGN KEY([TrnPayCodeId])

    REFERENCES [dbo].[dim_PayCode] ([PayCodePk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_PayCode]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_provider] FOREIGN KEY([TrnAttDocId])

    REFERENCES [dbo].[dim_provider] ([ProviderPk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_provider]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_dim_RefProvider] FOREIGN KEY([TrnRefDocId])

    REFERENCES [dbo].[dim_RefProvider] ([RefProviderPk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_dim_RefProvider]

    GO

    ALTER TABLE [dbo].[fact_transac] WITH NOCHECK ADD CONSTRAINT [FK_fact_transac_fact_invoice] FOREIGN KEY([TrnFKInvPk])

    REFERENCES [dbo].[fact_invoice] ([InvPk])

    NOT FOR REPLICATION

    GO

    ALTER TABLE [dbo].[fact_transac] CHECK CONSTRAINT [FK_fact_transac_fact_invoice]

    GO

  • Okay, roughly the same comment, just with Foreign Keys added:

    You've only got the one index on the fact table, owch. That table's roughly 250-300 bytes wide, and this is the only real lookup mechanism unless you come in via a foreign key:

    CONSTRAINT [PK_fact_transac] PRIMARY KEY CLUSTERED

    Definately try adding in the NC index and modification to the where clause I mentioned above. Also: [TrnPayAmt] [money] NOT NULL

    You don't need null controls on a NOT NULL column. 🙂 Do you ever have negative TrnPayAmt? If not, swap it to a TrnPayAmt > 0


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • AWESOME! After adding the index the join on the server only took 17 minutes! I will defintely be looking into possible indexes to add.

  • I am now looking into adding indexes for several other queries. Should I follow what you did in the index you posted earlier? Just add the keys used in the joins in the "on" clause?

  • dndaughtery (3/2/2011)


    I am now looking into adding indexes for several other queries. Should I follow what you did in the index you posted earlier? Just add the keys used in the joins in the "on" clause?

    Well, in the query earlier, I only included the column from your where clause and then included the columns the query would actually need. And to be really honest about it, I stole it straight from the 'missing indexes' section of your execution plan. :Whistling:

    However, yes, as a general practice if you're looking to speed up a query and you're not heavily concerned about insert/update/delete speeds being adversely affected, you would start by building your index with the indexed columns being in the ON clause, and INCLUDING any columns returned by the select. You'll want to make sure when you do this that you try to include the first columns (also known as the leading edge) as the one you're going to locate data by the most often.

    Example: RowDate, CustID, InvoiceID. If you usually go to a record via the InvoiceID, this Index wouldn't be used because it's expecting two other columns to be used for searching first. In that case, I would build the index as InvoiceID, RowDate, CustID. That's more churn to the index but more queries would be able to use it well.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply