Why Is one XML import so much faster

  • Hi, need help with understanding the mechanisms involved in these 2 queries which accomplish the same .... The first query taking hours the second query taking seconds.

    What is a happening please expand my knowledge πŸ™‚

    EXAMPLE 1 Super slow

    SELECT X.product.query('SKU').value('.', 'INT'),

    X.product.query('Desc').value('.', 'VARCHAR(30)')

    FROM (

    SELECT CAST(x AS XML)

    FROM OPENROWSET(

    BULK 'C:\Products.xml',

    SINGLE_BLOB) AS T(x)

    ) AS T(x)

    CROSS APPLY x.nodes('Products/Product') AS X(product);

    Example 2 Super Fast

    DECLARE @xml_table TABLE(xml_data xml)

    DECLARE @sqlstmt as varchar(255)

    SET @sqlstmt= 'SELECT * FROM OPENROWSET ( BULK ''C:\Products.xml'', SINGLE_CLOB ) AS XMLDATA'

    INSERT INTO @xml_table EXEC (@sqlstmt)

    SELECT

    x.value('(SKU/text())[1]', 'INT') VTMID

    ,x.value('(DESC/text())[1]', 'VARCHAR(30)') INVALID

    FROM

    @xml_table tbl

    CROSS APPLY tbl.xml_data.nodes('/Products/Product') e(x)

  • It could be a lot of things. Those two batches are very different. It may have something to do with the fact that you're importing one as a BLOB and one as a CLOB. It could be the inline-conversion versus the temp storage as XML in table variable. It could be your nodes reference having to traverse more times than necessary in query 1. It could be a combination.

    How big is the XML file? Can you attach the file, or at least an XML fragment we can use to replicate a large file on our side? Can you post the actual execution plans for both queries?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (4/24/2012)


    It could be a lot of things. Those two batches are very different. It may have something to do with the fact that you're importing one as a BLOB and one as a CLOB. It could be the inline-conversion versus the temp storage as XML in table variable. It could be your nodes reference having to traverse more times than necessary in query 1. It could be a combination.

    How big is the XML file? Can you attach the file, or at least an XML fragment we can use to replicate a large file on our side? Can you post the actual execution plans for both queries?

    It’s not BLOB or CLOB i have tried it with both and the timings are the same.

    The XML file has 40,000 + records.

    But also it appears any XML files with a significant amount of data then query 2 is so so much quicker seconds instead of hours.

    I have no understanding as to why performance is so much worse with query 1... there is a real gap in my knowledge can anyone help?

  • Hi ,

    Sorry forgot, Here is some sample XML if you just copy and paste the <product>

    entries till you have 30,000 + records you shall see the huge diffrences in timings

    <Products>

    <Product>

    <SKU>1</SKU>

    <Desc>Book</Desc>

    </Product>

    <Product>

    <SKU>2</SKU>

    <Desc>DVD</Desc>

    </Product>

    <Product>

    <SKU>3</SKU>

    <Desc>Video</Desc>

    </Product>

    </Products>

  • Have a look at the execution plans. The use of .query adds looping in the first query which explains why it takes so much longer. Both run the CROSS APPLY filter 3 times, but the use of .query adds 4 XML Reader operators to the first query. As opposed to the second query which just gets the .value using an explicit path, i.e. no query reader operator, just an explicit scalar retrieval. I think this amounts to the number of product nodes in the file being traversed 4 additional times each in the second query.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Hi thanks for that, massive query plan left me feeling dizzy , but i see what you mean now. Thanks again for you help it's makeing more sense now

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply