Why Is one XML import so much faster

Question

Why Is one XML import so much faster

Edward-445599

Hall of Fame

Points: 3147
More actions
April 24, 2012 at 5:14 am

#267851

Hi, need help with understanding the mechanisms involved in these 2 queries which accomplish the same .... The first query taking hours the second query taking seconds.
What is a happening please expand my knowledge 🙂
EXAMPLE 1 Super slow
SELECT X.product.query('SKU').value('.', 'INT'),
X.product.query('Desc').value('.', 'VARCHAR(30)')
FROM (
SELECT CAST(x AS XML)
FROM OPENROWSET(
BULK 'C:\Products.xml',
SINGLE_BLOB) AS T(x)
) AS T(x)
CROSS APPLY x.nodes('Products/Product') AS X(product);
Example 2 Super Fast
DECLARE @xml_table TABLE(xml_data xml)
DECLARE @sqlstmt as varchar(255)
SET @sqlstmt= 'SELECT * FROM OPENROWSET ( BULK ''C:\Products.xml'', SINGLE_CLOB ) AS XMLDATA'
INSERT INTO @xml_table EXEC (@sqlstmt)
SELECT
x.value('(SKU/text())[1]', 'INT') VTMID
,x.value('(DESC/text())[1]', 'VARCHAR(30)') INVALID
FROM
@xml_table tbl
CROSS APPLY tbl.xml_data.nodes('/Products/Product') e(x)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Orlando Colamatteo SSC Guru Points: 182276 More actions · Answer 1

It could be a lot of things. Those two batches are very different. It may have something to do with the fact that you're importing one as a BLOB and one as a CLOB. It could be the inline-conversion versus the temp storage as XML in table variable. It could be your nodes reference having to traverse more times than necessary in query 1. It could be a combination.

How big is the XML file? Can you attach the file, or at least an XML fragment we can use to replicate a large file on our side? Can you post the actual execution plans for both queries?

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Edward-445599 Hall of Fame Points: 3147 More actions · Answer 2

opc.three (4/24/2012)
It could be a lot of things. Those two batches are very different. It may have something to do with the fact that you're importing one as a BLOB and one as a CLOB. It could be the inline-conversion versus the temp storage as XML in table variable. It could be your nodes reference having to traverse more times than necessary in query 1. It could be a combination.
How big is the XML file? Can you attach the file, or at least an XML fragment we can use to replicate a large file on our side? Can you post the actual execution plans for both queries?

It’s not BLOB or CLOB i have tried it with both and the timings are the same.

The XML file has 40,000 + records.

But also it appears any XML files with a significant amount of data then query 2 is so so much quicker seconds instead of hours.

I have no understanding as to why performance is so much worse with query 1... there is a real gap in my knowledge can anyone help?

Edward-445599 Hall of Fame Points: 3147 More actions · Answer 3

Hi ,

Sorry forgot, Here is some sample XML if you just copy and paste the <product>

entries till you have 30,000 + records you shall see the huge diffrences in timings

<Products>

<Product>

<SKU>1</SKU>

<Desc>Book</Desc>

</Product>

<Product>

<SKU>2</SKU>

<Desc>DVD</Desc>

</Product>

<Product>

<SKU>3</SKU>

<Desc>Video</Desc>

</Product>

</Products>

Orlando Colamatteo SSC Guru Points: 182276 More actions · Answer 4

Have a look at the execution plans. The use of .query adds looping in the first query which explains why it takes so much longer. Both run the CROSS APPLY filter 3 times, but the use of .query adds 4 XML Reader operators to the first query. As opposed to the second query which just gets the .value using an explicit path, i.e. no query reader operator, just an explicit scalar retrieval. I think this amounts to the number of product nodes in the file being traversed 4 additional times each in the second query.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Edward-445599 Hall of Fame Points: 3147 More actions · Answer 5

Hi thanks for that, massive query plan left me feeling dizzy , but i see what you mean now. Thanks again for you help it's makeing more sense now