Read XML file

  • Hi,

    i am trying to read a XML file using select statement. The sample has only one row by i could have multiple employees and multiple permits for each employee. Thanks for your help!

    My sample xml file:

    <EXPORT>

    <Employee>

    <TYPE>I</TYPE>

    <FIRSTNAME>JASON</FIRSTNAME>

    <MIDDLENAME/>

    <LASTNAME>SMITKA</LASTNAME>

    <COSTCENTER/>

    <LIST>

    <PERMIT>

    <STATEPERMIT>FL</STATEPERMIT>

    <RESIDENCY>N</RESIDENCY>

    <STATEPERMITTYPE>Agent</STATEPERMITTYPE>

    <EXPIRATIONDATE/>

    <EFFECTIVEDATE>03/02/2001</EFFECTIVEDATE>

    <LOALIST>

    <LOA>Health</LOA>

    <LOA>Life & Health</LOA>

    <LOA>Life, Health & Variable Ann.</LOA>

    <LOA>life</LOA>

    </LOALIST>

    <PERMITSTATUS>ACT</PERMITSTATUS>

    <APPOINTMENTSTATUS>N</APPOINTMENTSTATUS>

    </PERMIT>

    </LIST>

    </Employee>

    </EXPORT>

    The output of the select statement should look like:

    CREATE TABLE #Sample

    (

    [Type] CHAR(1)

    , FirstName VARCHAR(30)

    , MiddleName VARCHAR(30)

    , LastName VARCHAR(30)

    , CostCenter CHAR(3)

    , StatePermit VARCHAR(30)

    , Residency VARCHAR(30)

    , StatePermitType VARCHAR(30)

    , ExpirationDate DATETIME

    , EffectiveDate DATETIME

    , LOA VARCHAR(100)

    , PermitStatus VARCHAR(30)

    , AppointmentStatus VARCHAR(30)

    )

    INSERT INTO #Sample

    SELECT

    'I', 'JASON',NULL,'SMITH',NULL,'PA','N','AGENT',NULL,'03/02/2001'

    ,'Health - Life - Health - Life - Health - Variable Ann - Life' -- concatnated LOA tag

    ,'ACT', 'N'

    SELECT * FROM #Sample

  • Any ideas on how to write tsql do this please!

    Thanks,

  • Hi UnionAll

    You need to use the XML type functions of SQL Server. Here's a little sample, based on your data (removed the "&" characters to get a working source XML):

    DECLARE @xml XML;

    SET @xml = '

    <EXPORT>

    <Employee>

    <TYPE>I</TYPE>

    <FIRSTNAME>JASON</FIRSTNAME>

    <MIDDLENAME/>

    <LASTNAME>SMITKA</LASTNAME>

    <COSTCENTER/>

    <LIST>

    <PERMIT>

    <STATEPERMIT>FL</STATEPERMIT>

    <RESIDENCY>N</RESIDENCY>

    <STATEPERMITTYPE>Agent</STATEPERMITTYPE>

    <EXPIRATIONDATE/>

    <EFFECTIVEDATE>03/02/2001</EFFECTIVEDATE>

    <LOALIST>

    <LOA>Health</LOA>

    <LOA>Life Health</LOA>

    <LOA>Life, Health Variable Ann.</LOA>

    <LOA>life</LOA>

    </LOALIST>

    <PERMITSTATUS>ACT</PERMITSTATUS>

    <APPOINTMENTSTATUS>N</APPOINTMENTSTATUS>

    </PERMIT>

    </LIST>

    </Employee>

    </EXPORT>'

    SELECT

    t.c.value('STATEPERMIT[1]', 'varchar(10)')

    ,t.c.value('../../TYPE[1]', 'varchar(10)')

    ,t.c.value('../../FIRSTNAME[1]', 'varchar(10)')

    FROM @xml.nodes('/EXPORT/Employee/LIST/PERMIT') AS t(c)

    Greets

    Flo

  • To expand on Flo's solution, performance is usually much better when using the CROSS APPLY approach if there's the need to return values from different node levels.

    SELECT

    u.v.value('STATEPERMIT[1]', 'varchar(10)')

    ,t.c.value('TYPE[1]', 'varchar(10)')

    ,t.c.value('FIRSTNAME[1]', 'varchar(10)')

    FROM @xml.nodes('EXPORT/Employee') AS t(c)

    CROSS APPLY t.c.nodes('LIST/PERMIT') AS u(v)

    Personal note: Welcome back, Flo!!!!! Did you enjoy you SQL Server sabbatical year? 😉



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Hey Lutz!

    First, thanks for the heads up with CROSS APPLY, sure much better solution.

    [off topic]

    Thanks mate! To be honest, nope. I've got way to less SQL stuff in my current project.. I really hope this will be different in next.

    Nevertheless, it's good to see you guys 'n gals 🙂

    [/off topic]

    Greets

    Flo

  • Thank you both for the help! This is perfect.

  • Here are a few more tricks related to reading xml.

    On top of Lutz' suggestion to use cross apply to traverse the nested node elements, I can add to also retrieve the elements for that element in an additional cross apply. I can not explain it, nor can I find any documentation to support it, but from experience I found that on bigger files this improves processing speed tremendously.

    If your xml input permits it, I suggest to use typed xml instead of untyped xml. It is generally faster processed, allows for a lot more checks on the input and allows you to leave off the stupid [1] on all of the calls to value() when retrieving elements.

    Then dates, these are a pain in the b*t if you don't know how to deal with them. For example, the date you put in the example is not in valid xml format (xml allows only this format in date fields: "2001-03-02T00:00:00") . You will run into big problems trying to read the value in your example unless you take precautions. One possible solution, if your file actually contains this sort of values, is shown below. Better is of course to make sure your xml complies with xml definitions.

    NULL values. Again, these will give you major problems if you don't know how to do it properly. To be exact, your example xml does not contain any NULL values. It contains elements for which no value was specified. Without being explicit with them, it depends on interpretation if the value for such an element is a null value or not. Have a look at the this statement:

    select [EFFECTIVEDATE]

    from (

    select {d '2001-03-02'} as [EFFECTIVEDATE]

    union all select null

    ) t

    for xml path('TEST'), type, elements xsinil

    This is what it produces:

    <TEST xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <EFFECTIVEDATE>2001-03-02T00:00:00</EFFECTIVEDATE>

    </TEST>

    <TEST xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <EFFECTIVEDATE xsi:nil="true" />

    </TEST>

    This shows (next to the proper date format) the only way how to unambiguously represent a null value in an xml element. You need this specific namespace "http://www.w3.org/2001/XMLSchema-instance" in your xml, with that you can add the attribute xsi:nil, which has by definition a value of "true". Normally, the value() function will only return you null for an element when the requested element is not available. If it exists, but has no value in it, instead it will return a default value, which value depends on the requested type. Only by using the xsi:nil="true" attribute you can retrieve a value from an element and be sure it was intended to be null and not forgotten (or intentionally left off). I've written an article on this topic[/url] some time ago that shows a way how to do it. Many work-arounds have been written for this default value, mostly used is nullif( value()-call, <default value>), but this is lengthy and requires the proper default value to be encoded in your query.

    A partial xml can be returned as a single column. This is demonstrated with the LOALIST element. The value returned can hold any number of LOA elements per permit.

    Another example with the LOALIST is to concatenate all textual entries in the list into a comma separated string and return it like that.

    And lastly, assuming the LOALIST holds a subset of predefined values, these values can be put into separate columns, based on their existence in the list. I like to use a cross tab for this (thanks Jeff).

    DECLARE @xml XML;

    SET @xml = '

    <EXPORT>

    <Employee>

    <TYPE>I</TYPE>

    <FIRSTNAME>JASON</FIRSTNAME>

    <MIDDLENAME/>

    <LASTNAME>SMITKA</LASTNAME>

    <COSTCENTER/>

    <LIST>

    <PERMIT>

    <STATEPERMIT>FL</STATEPERMIT>

    <RESIDENCY>N</RESIDENCY>

    <STATEPERMITTYPE>Agent</STATEPERMITTYPE>

    <EXPIRATIONDATE/>

    <EFFECTIVEDATE>03/02/2001</EFFECTIVEDATE>

    <LOALIST>

    <LOA>Health</LOA>

    <LOA>Life Health</LOA>

    <LOA>Life, Health Variable Ann.</LOA>

    <LOA>life</LOA>

    </LOALIST>

    <PERMITSTATUS>ACT</PERMITSTATUS>

    <APPOINTMENTSTATUS>N</APPOINTMENTSTATUS>

    </PERMIT>

    </LIST>

    </Employee>

    </EXPORT>';

    SELECT xe.*

    ,xp.*

    ,xLOA.*

    FROM @xml.nodes('EXPORT/Employee') AS t(e)

    CROSS APPLY (

    SELECT t.e.value('TYPE[1]', 'varchar(10)') AS [TYPE]

    ,t.e.value('FIRSTNAME[1]', 'varchar(100)') AS [FIRSTNAME]

    ,t.e.value('MIDDLENAME[1]', 'varchar(100)') AS [MIDDLENAME]

    ,t.e.value('LASTNAME[1]', 'varchar(100)') AS [LASTNAME]

    ,nullif(t.e.value('COSTCENTER[1]', 'varchar(10)'), '') AS [COSTCENTER]

    ) xe

    CROSS APPLY t.e.nodes('LIST/PERMIT') AS u(p)

    CROSS APPLY (

    SELECT u.p.value('STATEPERMIT[1]', 'varchar(10)') AS [STATEPERMIT]

    ,u.p.value('RECIDENCY[1]', 'char(1)') AS [RECIDENCY]

    ,u.p.value('STATEPERMITTYPE[1]', 'varchar(10)') AS [STATEPERMITTYPE]

    ,nullif(u.p.value('EXPIRATIONDATE[1]', 'datetime2'), {d '1900-01-01'}) AS [EXPIRATIONDATE]

    ,nullif(u.p.value('EFFECTIVEDATE[1]', 'datetime2'), {d '1900-01-01'}) AS [EFFECTIVEDATE]

    ,u.p.query('LOALIST') as [LOALIST]

    ,stuff((

    select ' - ' + l.l.value('.', 'varchar(100)') as [text()]

    from u.p.nodes('LOALIST/LOA') l(l)

    for xml path(''), type

    ).value('.','varchar(max)'), 1, 3, '') as [LOALIST_AS_TEXT]

    ,u.p.value('PERMITSTATUS[1]', 'varchar(10)') AS [PERMITSTATUS]

    ,u.p.value('APPOINTMENTSTATUS[1]', 'varchar(10)') AS [APPOINTMENTSTATUS]

    ) xp

    cross apply (

    select max(case x.val when 'Health' then 1 else 0 end) as [Health]

    ,max(case x.val when 'Life Health' then 1 else 0 end) as [Life Health]

    ,max(case x.val when 'life' then 1 else 0 end) as [Life]

    ,max(case x.val when 'Life, Health Variable Ann.' then 1 else 0 end) as [Life, Health Variable Ann.]

    ,max(case x.val when 'Earthquake' then 1 else 0 end) as [EarthQuake]

    from u.p.nodes('LOALIST/LOA') l(l)

    cross apply (

    select l.value('.','varchar(100)') as val

    ) x

    ) xLOA;

    edit: put loalist in the requested format.



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • Oh, and one more that can be very handy during testing:

    declare @xml xml;

    select @xml = convert(xml, x.BulkColumn)

    from openrowset(bulk N'c:\test.xml', single_blob) as x;

    This reads the contents of the file c:\test.xml into variable @xml in your script. Saves a lot of copy-paste-edit time when testing/debugging your xml queries.

    By default, the file can only be read by SQL server from a file local on the server, provided the user SQL server service is running under is allowed to get to it. However by setting up security properly you can make it so that SQL server uses your integrated security account to read it from any location, including a network share on your machine, using the executing user's security context. i.e. every user can only make sql server read files from locations they are allowed to read themselves.

    For more information on setting up the required security, search for "Kerberos Constrained Delegation". For example:

    http://msdn.microsoft.com/en-us/library/aa528759.aspx

    http://msdn.microsoft.com/en-us/library/ee191523.aspx



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply