Primary XML Index Worse Than No Index

  • Hi Guys,

    I've been working with some tables with XML columns, and I've been playing around with indexing the XML so I can query it directly. Apart from the fact that adding a primary XML index almost doubles the size of the table, I have found that consistently, queries which use xml.exist() or xml.nodes() methods to find matching rows or extract data points it almost invariables INCREASES scans and logical reads, complicates the query plan, and at best, has no negative impact on performance.

    I ran a handful of common operations on some sample data and in none of these cases was performance increased by the presence of a primary xml index.

    Is there some sort of special case where a primary xml index is beneficial that I'm just not seeing?

    /*****************************

    Sample XML

    *****************************/

    if object_id('tempdb.dbo.#Sample') is not null drop table #Sample

    create table #Sample

    (

    RID int identity(1,1) primary key clustered,

    RawXML xml

    )

    ;with nums as

    (

    select top 10000 num = row_number() over (order by (select null))

    from sys.all_objects a, sys.all_objects b

    )

    insert into #Sample (RawXML)

    select '<root><name>'

    + cast(newid() as char(36))

    + '</name>'

    + case when num % 10 = 0 then '<date>' + convert(varchar(50), getdate()) + '</date>'

    else ''

    end

    + '<value>'

    + char((num % 26) + 97)

    + '</value>'

    + '</root>'

    from nums

    /*****************************

    Pre-indexing tests

    *****************************/

    set statistics io on

    -- 1 scan, 201 LR

    select *

    from #Sample

    where RawXML.exist('/root[value = "a"]') = 1

    -- 1 scan, 201 LR

    select *

    from #Sample

    where RawXML.exist('/root/date') = 1

    -- 1 scan, 201 LR

    select t.c.value('name[1]', 'char(36)')

    from #sample a

    cross apply a.RawXML.nodes('/root') t(c)

    create primary xml index PXML__#Sample__RawXML on #Sample (RawXML)

    /*****************************

    Post-indexing tests

    *****************************/

    -- 30000 scans, 101059 LR

    -- 1 Scan, 201, LR

    select *

    from #Sample

    where RawXML.exist('/root[value = "a"]') = 1

    -- 1 scan, 442 LR

    -- 1 Scan, 201 LR

    select *

    from #Sample

    where RawXML.exist('/root/date') = 1

    -- 20001 scans, 60693 LR

    -- 1 scan, 201 LR

    select t.c.value('name[1]', 'char(36)')

    from #sample a

    cross apply a.RawXML.nodes('/root') t(c)

    Executive Junior Cowboy Developer, Esq.[/url]

  • I have run into the exact same thing. I can say that I have had some (limited) success with secondary XML indexes. XML indexes are are kind of a black box to me considering how they SQL Server does not keep statistics for XML columns.

    I have found selective XML indexes to be helpful but that's a SQL 2012 feature.

    I just ran accross this thread which you may find helpful: http://dba.stackexchange.com/questions/52552/index-not-making-execution-faster-and-in-some-cases-is-slowing-down-the-query

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply