Indexes on table with more than 20 columns

  • Hi All,

    I have a table (name: Documents) with more than 20 columns, and there are about 2 millions of rows in it.

    And I have to make a search on this table, and user can specify a searching value for each column and set desired field(s) "ALL".

    I tried to optimize each searching query in Query Analyzer and each time it advises to create an index for columns that the user specified value for. Please, advise me what to do with this table as the searching queries are vary. I don't think its useful to create an index for each case.

    Thanks in advance.

    -------------------------
    - Name?
    - Abu Dalah Sarafi.
    - Sex?
    - 3 times a week!
    - No, no. Male or Female?
    - Male, female, sometimes camel...

  • Does your search criteria start with a wildcard? If it does, an index will not help anyway. If your search criteria is typically "starting with" or the entire field, then creating indexes will help the search performance, but 20 of them will probably hurt inserts and updates too much.

    I would suggest you try to determine the most common searches being done and optimize for them. Let the less-common searches be slow. You will have to balance a bit.

    If you are constantly searching "All fields" and your searches all start with wildcards, I would suggest you look into full-text indexing. Using a FT index would also give you much more flexible searches, allow you to ignore noise words, and give you ranking in your results - allowing you to return the "Best" results first.

  • Thanks for reply.

    I have some varchar columns and Full text indexing is set for these columns and I use FREETEXT and CONTAINS for searching in these columns. The others are int and datetime columns. The int columns are checked for equality, date time columns are checked for "less or equal than" or "more or equal then". Do you think I have no chance to cover all search cases? Cos' analyzing any uncovered query Tuning Adviser advise me to create an index with "Estimated Improvement" = 60%-85%.

    Thanks in advance.

    -------------------------
    - Name?
    - Abu Dalah Sarafi.
    - Sex?
    - 3 times a week!
    - No, no. Male or Female?
    - Male, female, sometimes camel...

  • Could you please post the table schema, the current indexes and a couple of sample queries?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • TABLE SCHEME:

    CREATE TABLE [dbo].[Documents](

    [DocumentId] [numeric](18, 0) IDENTITY(1,1) NOT NULL,

    [DocumentTypeId] [numeric](18, 0) NOT NULL,

    [SectionId] [numeric](18, 0) NOT NULL,

    [TemplateId] [numeric](18, 0) NULL,

    [Title] [varchar](200) NULL,

    [Description] [varchar](200) NULL,

    [ContentSource] [varchar](200) NULL,

    [ContentSourcePublicationDate] [datetime] NULL,

    [ContentSubject] [varchar](200) NULL,

    [ContentPublisher] [varchar](200) NULL,

    [ContentLength] [int] NOT NULL CONSTRAINT [DF_Documents_ContentLenght] DEFAULT ((0)),

    [OriginalSource] [varchar](200) NULL,

    [Coverage] [varchar](500) NULL,

    [Author] [varchar](200) NULL,

    [AuthorId] [numeric](18, 0) NULL,

    [Copyright] [varchar](500) NULL,

    [StockSymbols] [varchar](500) NULL,

    [Companies] [varchar](2000) NULL,

    [Keywords] [varchar](500) NULL,

    [HermesArticleType] [varchar](100) NULL,

    [HermesPageReference] [varchar](500) NULL,

    [HermesObjectReference] [varchar](500) NULL,

    [HermesPaperDay] [varchar](100) NULL,

    [EditorId] [numeric](18, 0) NULL,

    [ManagerId] [numeric](18, 0) NULL,

    [Status] [int] NOT NULL,

    [Reason] [int] NOT NULL,

    [BlackListOfFeedTypes] [varchar](200) NULL,

    [LicensingDescription] [varchar](500) NULL,

    [CreationDate] [datetime] NOT NULL CONSTRAINT [DF_Documents_CreationDate] DEFAULT (getdate()),

    [ModificationDate] [datetime] NOT NULL CONSTRAINT [DF_Documents_ModificationDate] DEFAULT (getdate()),

    [DisplayDate] [datetime] NULL,

    [ExpirationDate] [datetime] NULL,

    [XmlVersionTimestamp] [datetime] NULL,

    [RssVersionTimestamp] [datetime] NULL,

    [AccessRights] [varchar](200) NULL,

    [IsTemplate] [bit] NOT NULL,

    [IsDefaultTemplate] [bit] NOT NULL CONSTRAINT [DF_Documents_IsDefaulTemplate] DEFAULT ('False'),

    [TemplateName] [varchar](100) NULL,

    [TemplateDescription] [varchar](500) NULL,

    [Version] [int] NOT NULL CONSTRAINT [DF_Documents_Version] DEFAULT ((1)),

    [IsActive] [bit] NOT NULL CONSTRAINT [DF_Documents_IsActive] DEFAULT ('True'),

    [VersionId] [uniqueidentifier] NOT NULL,

    CONSTRAINT [PK_Documents] PRIMARY KEY CLUSTERED

    (

    [DocumentId] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    Search queries:

    Example #1:

    declare @varIdsXml varchar(max);

    declare @total int;

    with Docs as (

    select d.documentID,

    row_number() over (order by d.Title asc ) as RowNumber

    from Documents d

    where d.IsActive = 1

    and d.IsTemplate = 0

    and d.DocumentTypeId = 3

    and (freetext(d.Author, 'some-text') or freetext(d.ContentPublisher, 'some-text') or freetext(d.ContentSubject, 'some-text') or freetext(d.Copyright, 'some-text') or freetext(d.LicensingDescription, 'some-text') or exists(select top 1 null from DocumentMetadataValues where DocumentId = d.DocumentId and freetext([Value], 'some-text') ))

    and 3 & (select Company from DocumentSections where SectionId = d.SectionId) <> 0

    and d.EditorId = 2

    and d.Status in (2) )

    select @varIdsXml = ( select DocumentId as [Value] from Docs where RowNumber between 1 and 25 for xml raw( 'Id' ), root( 'Ids' ) )

    exec Document_GetByIds @varIdsXml

    Example #2:

    declare @varIdsXml varchar(max);

    declare @total int;

    with Docs as (

    select d.documentID,

    row_number() over (order by d.Title asc ) as RowNumber

    from Documents d

    where d.IsActive = 1

    and d.IsTemplate = 0

    and d.DocumentTypeId = 1

    and 3 & (select Company from DocumentSections where SectionId = d.SectionId) <> 0

    and (select count(*) from dbo.RegexMatches('key', '\w+') where [Text] in (select [Text] from dbo.RegexMatches(isnull(d.Keywords, ''), '\w+'))) > 0

    and d.OriginalSource = 'OrigSour'

    and d.CreationDate >= '03/09/2004' and d.CreationDate < '06/13/2008'

    and HermesArticleType = 'StaffArticle' )

    select @varIdsXml = ( select DocumentId as [Value] from Docs where RowNumber between 1 and 25 for xml raw( 'Id' ), root( 'Ids' ) )

    exec Document_GetByIds @varIdsXml

    -------------------------
    - Name?
    - Abu Dalah Sarafi.
    - Sex?
    - 3 times a week!
    - No, no. Male or Female?
    - Male, female, sometimes camel...

  • Hmmm. It's not an easy indexing scenario.

    Are there any columns or set of columns that are always or most of the time specified in the queries?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • d.IsActive always = 1

    d.IsTemplate always = 0

    d.DocumentTypeId always existing parameter.

    I cannot suppose what parameters will be used more frequently. This project is not in production yet. But I have created a system, that will log the query parameter and will allow me to get statistics for common columns. So I will be able to determine the frequently using columns in v2.

    Thanks

    -------------------------
    - Name?
    - Abu Dalah Sarafi.
    - Sex?
    - 3 times a week!
    - No, no. Male or Female?
    - Male, female, sometimes camel...

  • What % of the table is likely to have IsActive 1 and IsTemplate 0?

    As a first suggestion, consider this:

    CREATE INDEX idx_Documents_Search1 ON Documents

    (DocumentTypeId , IsActive, IsTemplate, Title)

    Title is in the key to support (hopefully) the rownumber function.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail,

    Thanks for your help.

    the IsTemplate = 0 have about 99.3% of rows.

    IsActive = 1 have about 85% of rows.

    You see, the search query is made dynamically, I posted here only the output (the command that is executed at the final step), so the "d.Title" (sorting column) can also vary.

    What do you think, will it be helpful to create an index "DocumentId, IsActive, IsTemplate". Will this increase the speed of execution even by 1%?

    Thank you,

    Artur

    -------------------------
    - Name?
    - Abu Dalah Sarafi.
    - Sex?
    - 3 times a week!
    - No, no. Male or Female?
    - Male, female, sometimes camel...

  • No, because DocumentID is your clustered index and you're not filltering on it anywhere.

    Maybe stick with just DocumentTypeId , IsActive, IsTemplate if the ordering column is going to change. It might be selective enough to be used. no promices though

    Once the system's in use, find the common searchs and index them. You're not going to be able to make every search fast.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail,

    thanks a lot.

    Artur

    PS. nice photo... sitting on the elephant 🙂

    -------------------------
    - Name?
    - Abu Dalah Sarafi.
    - Sex?
    - 3 times a week!
    - No, no. Male or Female?
    - Male, female, sometimes camel...

  • Kargin - this is not so much an indexing thought, more taking a look at the Regex function. You're calling a table-valued CLR function twice, and running aggregates on it, instead of having a RegexISMatch function which would return whether there are matches or not. I'd think you might get better perf if you modeled it to have one single call handling the checking, and not 2 being correlated.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply