How should I index a table for querying full row by two foreign keys?

  • Back in the days of my Oracle database college courses, I always thought indexes were very simple: a separately maintained, ordered list of lookup values pointing back to the location of their row. Since then I've learned of database pages, clustered vs. non-clustered, covering indexes (which I guess actually store a copy of the data) which all seem to complicate indexing from my basic academic understanding. So I'm never sure if I attempt to create an index if I'm actually helping or hurting. I haven't had much luck with execution plans either - I'm lacking a lot of SQL Server prerequisite knowledge for it all, partly because of difficulty finding the right resources, partly because, time hasn't allowed.

    So I thought this simple case I've come across might be a good learning opportunity. I have a table containing multiple discounts for multiple semesters:

    SemesterDiscounts

    pk_SemesterDiscountID uniqueidentifier

    fk_SemesterID uniqueidentifier not null

    fk_DiscountTypeID uniqueidentifier not null

    Discount decimal(9, 2) not null

    fk_FacilityID uniqueidentifier not null

    IsDeleted bit

    Our application may on occasion select a row by its primary key, but more frequently will get it by fk_SemesterID and fk_DiscountTypeID :

    SELECT * FROM SemesterDiscounts WHERE fk_SemesterID = @semesterID AND fk_DiscountTypeID = @discountTypeID

    So it seems like I should add an index on fk_SemesterID, fk_DiscountTypeID. Not sure if asc/desc matters. But is it that all there is to it?

    Now for an extra level of complexity, most of our tables have the last two columns, fk_FacilityID, and IsDeleted. This is a multi-tenant database and we are using an RLS policy to partition it and as sort of a convenient hack, it also filters out soft deleted records (IsDeleted = 1). So every query also reduces the set on these columns as well. Should every table then have an index on fk_FacilityID and IsDeleted? Or in the case above would I need a single index on fk_FacilityID, IsDeleted, fk_SemesterID, and fk_DiscountTypeID?

    Clearly, I'm not the best person to be doing this, but the alternative could eventually be someone running some tool to automatically script out a bunch of indexes that it thinks are optimal, which I'm not sure is better. Thanks!

    • This topic was modified 3 years, 8 months ago by  xr280xr.
  • xr280xr - Friday, August 31, 2018 9:02 AM

    Back in the days of my Oracle database college courses, I always thought indexes were very simple: a separately maintained, ordered list of lookup values pointing back to the location of their row. Since then I've learned of database pages, clustered vs. non-clustered, covering indexes (which I guess actually store a copy of the data) which all seem to complicate indexing from my basic academic understanding. So I'm never sure if I attempt to create an index if I'm actually helping or hurting. I haven't had much luck with execution plans either - I'm lacking a lot of SQL Server prerequisite knowledge for it all, partly because of difficulty finding the right resources, partly because, time hasn't allowed.

    So I thought this simple case I've come across might be a good learning opportunity. I have a table containing multiple discounts for multiple semesters:

    SemesterDiscounts
    pk_SemesterDiscountID uniqueidentifier
    fk_SemesterID uniqueidentifier not null
    fk_DiscountTypeID uniqueidentifier not null
    Discount decimal(9, 2) not null
    fk_FacilityID uniqueidentifier not null
    IsDeleted bit

    Our application may on occasion select a row by its primary key, but more frequently will get it by fk_SemesterID and fk_DiscountTypeID :

    SELECT * FROM SemesterDiscounts WHERE fk_SemesterID = @semesterID AND fk_DiscountTypeID = @discountTypeID

    So it seems like I should add an index on fk_SemesterID, fk_DiscountTypeID. Not sure if asc/desc matters. But is it that all there is to it?

    Now for an extra level of complexity, most of our tables have the last two columns, fk_FacilityID, and IsDeleted. This is a multi-tenant database and we are using an RLS policy to partition it and as sort of a convenient hack, it also filters out soft deleted records (IsDeleted = 1). So every query also reduces the set on these columns as well. Should every table then have an index on fk_FacilityID and IsDeleted? Or in the case above would I need a single index on fk_FacilityID, IsDeleted, fk_SemesterID, and fk_DiscountTypeID?

    Clearly, I'm not the best person to be doing this, but unfortunately, the alternative is going to be my boss eventually running some tool to automatically script out a bunch of indexex that it thinks are optimal, which I'm not sure is better. Thanks!

    An index like this should help
    CREATE INDEX IX_SemesterDiscounts_1 ON SemesterDiscounts(fk_SemesterID,fk_DiscountTypeID)
    The first column in the index should be the most selective. To work out which is most selective run these queries and the one with the highest value should be the leading column in the index

    SELECT COUNT(DISTINCT fk_SemesterID) FROM SemesterDiscounts
    SELECT COUNT(DISTINCT fk_DiscountTypeID) FROM SemesterDiscounts

    (that's for the initial query you provided).
    If you also filter on IsDeleted=0 then you can add a WHERE clause to the index:
    CREATE INDEX IX_SemesterDiscounts_1 ON SemesterDiscounts(fk_SemesterID,fk_DiscountTypeID)
    WHERE IsDeleted=0

  • Great! Thank you. That all makes sense. The norm is for my RLS policy's filter function is to filter on facilityID and IsDeleted=0, but either of those can be optionally bypassed for special cases, so I'm not sure I would want to restrict the index with the WHERE clause. But I do want to help RLS narrow down the table by facilityID and IsDeleted as efficiently as possible. This is the filter predicate function it uses on any table with an fk_FacilityID or IsDeleted column:


    ALTER FUNCTION [rls].[fn_FacilityIDFilterPredicate]
    (@facilityID AS uniqueidentifier = null, @isDeleted as bit = null)
    RETURNS TABLE
    WITH SCHEMABINDING
    AS
        RETURN
            SELECT    1 as fn_FacilityIDPredicateResult
            WHERE    (
                        (@facilityID IS NULL OR @facilityID = CAST(SESSION_CONTEXT(N'FacilityID') AS uniqueidentifier))
                        AND
                        (@isDeleted IS NULL OR (@isDeleted & CAST(SESSION_CONTEXT(N'FilterDeleted') AS BIT) ^ 1) = 1) -- a & b ^ 1 = nand (operators have equal order of precedence)
                    )
                    OR
                    (IS_ROLEMEMBER('db_owner')=1 OR IS_MEMBER('BUILTIN\administrators')=1 OR CURRENT_USER = 'CRMAdminUser')

    So if I were to do this

    CREATE INDEX IX_SemesterDiscounts_1 ON SemesterDiscounts(fk_FacilityID,IsDeleted)
    CREATE INDEX IX_SemesterDiscounts_2 ON SemesterDiscounts(fk_SemesterID,fk_DiscountTypeID)

    and then run my original query:

    SELECT * FROM SemesterDiscounts WHERE fk_SemesterID = @semesterID AND fk_DiscountTypeID = @discountTypeID

    which will use fn_FacilityIDPredicateResult, would it use (and benefit from) IX_SemesterDiscounts_1 for the filter predicate part and IX_SemesterDiscounts_2 for the rest of the query? Because most of the tables in my database use this function, so they will likely all need a similar index for fk_FacilityID and IsDeleted too.

  • Your initial understanding of indexes from an Oracle perspective is correct.  Oracle doesn't store data the same way SQL Server does, it stores a ROWID in the indexes that points to a physical location in a file on a disk, so it's closer to what SQL Server calls a HEAP.

    For solving your question, following Jonathan's advice about finding the most selective column of the two is the best way to go.  The only caveat is that since your query has a SELECT * in it, the database engine may decide to use the CLUSTERED index anyway instead of the NONCLUSTERED index depending on how many rows it thinks will get returned for the given @SemesterID parameter.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply