2017-11-15 3 views
0

Xpathでいくつかの助けが必要です。私は毎日(約500MB)のシュレッダーに必要なXML文書を持っています。SQL ServerのXpath

のXmlの例を次に示します。

https://docs.google.com/document/d/11Ov6KXo83pu3xEXGovrFK-Hm2cMjZW1gqwTnRTraj58/edit?usp=sharing

実際のファイルは、取引の停止し、数十万人が含まれています。

私はここに文書を細断するために取り組んでストアドプロシージャを持っている: https://docs.google.com/document/d/1tegGjaK0x7L3uajeUlD1ex0F-yoeSueujDTarkW_Bak/edit?usp=sharing

問題は、私はSQL Serverエージェントからの私たちのVM上でそれを実行したとき、それは5時間を要するということです。

理想的には、これは1時間以内に実行されます(これは個人用のマシンで実行されます)。

私は多分、私はこのような何かを行うと、1文ではなく、使用の一時テーブルと参加に全力を尽くすことができると考えた:

SELECT 
se.value('@Filename', 'varchar(50)') As [Filename], 
se.value('@FareIdentifier', 'varchar(50)') As FareIdentifier, 
se.value('@DeviceType', 'varchar(50)') As DeviceType, 
ro.value('@ServiceIdentifier', 'varchar(50)') As ServiceIdentifier, 
ro.value('@ServiceNumber', 'varchar(50)') As ServiceNumber, 
ro.value('@CarrierNumber', 'varchar(50)') AS CarrierNumber, 
jo.value('@VehicleRegistration', 'varchar(50)') As VehicleRegistration, 
jo.value('@VehicleNumber', 'varchar(50)') As VehicleNumber, 
jo.value('@VehicleDepotNumber', 'varchar(50)') As VehicleDepotNumber, 
jo.value('@ServiceDepotNumber', 'varchar(50)') As ServiceDepotNumber, 
jo.value('@JourneyDirection', 'varchar(50)') As JourneyDirection, 
jo.value('@JourneyEnd', 'varchar(50)') As JourneyEnd, 
jo.value('@JourneyStart', 'varchar(50)') As JourneyStart, 
jo.value('@RunningBoardNumber', 'varchar(50)') As RunningBoardNumber, 
jo.value('@RunningBoardDepotNumber', 'varchar(50)') As RunningBoardDepotNumber, 
jo.value('@DutyNumber', 'varchar(50)') As DutyNumber, 
jo.value('@DutyDepotNumber', 'varchar(50)') As DutyDepotNumber, 
jo.value('@DriverName', 'varchar(50)') As DriverName, 
jo.value('@DriverNumber', 'varchar(50)') As DriverNumber, 
jo.value('@FinalTicketSerial', 'varchar(50)') As FinalTicketSerial, 
jo.value('@StartTicketSerial', 'varchar(50)') As StartTicketSerial, 
jo.value('@ContractType', 'varchar(50)') As ContractType, 
jo.value('@DriverDepotNumber', 'varchar(50)') As DriverDepotNumber, 
jo.value('@EtmDepotNumber', 'varchar(50)') As EtmDepotNumber, 
jo.value('@Company', 'varchar(50)') As Company, 
jo.value('@JourneyType', 'varchar(50)') As JourneyType, 
jo.value('@JourneyNumber', 'varchar(50)') As JourneyNumber, 
st.value('@TCA', 'varchar(50)') As TCA, 
st.value('@Latitude', 'varchar(50)') As Latitude, 
st.value('@Longitude', 'varchar(50)') As Longitude, 
st.value('@NaptanCode', 'varchar(50)') As NaptanCode, 
st.value('@AtcoCode', 'varchar(50)') As ATCOCode, 
st.value('@StopName', 'varchar(50)') As StopName, 
st.value('@BusStopNumber', 'varchar(50)') As BusStopNumber, 
st.value('@StopNumber', 'varchar(50)') As StopNumber, 
st.value('@BoardingFareStageName', 'varchar(50)') As BoardingFareStageName, 
st.value('@BoardingFareStageNumber', 'varchar(50)') As BoardingFareStageNumber, 
st.value('@BoardingFareStageOwner', 'varchar(50)') As BoardingFareStageOwner, 
st.value('@RealDeparture', 'varchar(50)') As RealDeparture, 
st.value('@RealArrival', 'varchar(50)') As RealArrival, 
st.value('@OwnerNumber', 'varchar(50)') As OwnerNumber, 
cs.value('@SoldTicketClassName', 'varchar(50)') As SoldTicketClassName, 
cs.value('@SalesPrice', 'varchar(50)') As SalesPrice, 
cs.value('@PaymentMethod', 'varchar(50)') As PaymentMethod, 
cs.value('@ISOCurrency', 'varchar(50)') As ISOCurrency, 
cs.value('@SoldTicketClassNo', 'varchar(50)') As SoldTicketClassNo, 
cs.value('@TicketProductName', 'varchar(50)') As TicketProductName, 
cs.value('@TicketProductNumber', 'varchar(50)') As TicketProductNumber, 
cs.value('@NumTransactions', 'varchar(50)') As NumTransactions, 
cs.value('@NumPgersOnTicket', 'varchar(50)') As NumPgersOnTicket, 
cs.value('@IssueDateTime', 'varchar(50)') As IssueDateTime, 
cs.value('@TicketNumber', 'varchar(50)') As TicketNumber, 
cs.value('@TicketGUID', 'varchar(50)') As TicketGUID, 
cs.value('@DestinationFareStageName', 'varchar(50)') As DestinationFareStageName, 
cs.value('@DestinationFareStageNumber', 'varchar(50)') As DestinationFareStageNumber, 
cs.value('@DestinationFareStageOwner', 'varchar(50)') As DestinationFareStageOwner, 
cs.value('@EventDateTime', 'varchar(50)') As EventDateTime, 
cs.value('@TransactionType', 'varchar(50)') As TransactionType, 
cs.value('@CardId', 'varchar(50)') As CardId, 
cs.value('@SchemeIdentifier', 'varchar(50)') As SchemeIdentifier 

FROM 
      XMLUpload CROSS APPLY 
      XmlData.nodes('Header/Session')  AS [Session](se)   CROSS APPLY 
      se.nodes('Routes/Route')    AS [Route](ro)   CROSS APPLY 
     ro.nodes('Journey')    As [Journey](jo)  CROSS APPLY 
     jo.nodes('DrivenStops/Stop')  As [Stop](st)   CROSS APPLY 
     st.nodes('Events/*')  As [CashSale](cs) 

それは要素CashSaleとCardUsageの重複行が作成されますので、これは動作しません。 (1行では、CashSaleフィールドはnullであり、1つのCardUsageフィールドではnullです)、同じ行に含める必要があります。

理想的な結果は、それぞれのCashSaleに1つのレコードが存在するときに、同じ行にCardUsageフィールドがあることです。 CardUsageがない場合、これらのフィールドはヌルです。

誰かがいくつかの最適化を提案することができますか、いくつかのオンラインリソースに向けて私を指摘してください。私はオンラインで役立つものを探すのに苦労しています。

データ型を変更するだけで、必要なサイズになるようにしましたが、パフォーマンスは向上しますが、必要なだけではありません。

私は完全に(powershellなどを使用して)別のソリューションには公開されていますが、このxmlで他のソリューションを使用することはできませんでした。

答えて

0

このようなXMLはクエリするのが難しいです。あなたは暗黙の順序に頼らなければなりません。 SQL ServerのXQueryは、兄弟姉妹や次の要素のような魅力的なナビゲーションを完全に認識していません。しかし、あなたはトリックを行うことができます:CTE内でこの行を細断することができ、簡単なNodeIndex=NodeIndex+1を使って次の要素を見つけることができます。これが「CardUsage」と呼ばれている場合はそれを受け取ります。それ以外の場合は受け取りません。

さらに、例を最小限に抑えてください。

DECLARE @dummyTable TABLE(ID INT IDENTITY, XmlData XML); 
INSERT INTO @dummyTable VALUES 
(N'<Header SchemaVersion="2.1" CreationDateTime="20171114044600" ExportCounter="1" DepositDate="20171114" Producer="http://portal.provider.org.uk/v2"> 
     <Session SessionId="a468dcd3fa963a49b274e64a5c714244" GTCCash="0" GTCRev="0" GTCTrans="0" DepositDateTime="20171112210811" DeviceType="Ticketer ETM" DeviceSerialNumber="0441105106901809" StartDateTime="20171112163000" EndDateTime="20171112210814" FareIdentifier="Ticketer"> 
     <Routes> 
      <Route CarrierNumber="1" ServiceNumber="13" ServiceIdentifier="13"> 
      <Journey JourneyNumber="639" JourneyType="SRV" Company="1" EtmDepotNumber="188" DriverDepotNumber="188" ContractType="UNK" StartTicketSerial="781" FinalTicketSerial="802" DriverNumber="214373" DutyNumber="214373" RunningBoard="420" JourneyStart="20171112173414" JourneyEnd="20171112190532" JourneyDirection="Outbound" VehicleNumber="67798" VehicleRegistration="67798" DriverName="GARY TAYLOR" VehicleDepotNumber="188" DutyDepotNumber="0" RunningBoardDepotNumber="188" ServiceDepotNumber="0" RunningBoardNumber="3325"> 
       <DrivenStops> 
       <Stop OwnerNumber="1" RealArrival="20171112190057" RealDeparture="20171112190057" BoardingFareStageOwner="0" BoardingFareStageNumber="701" BoardingFareStageName="Town Square (58)" StopNumber="31" BusStopNumber="1" StopName="Town Square" AtcoCode="639001322" NaptanCode="" Latitude="55.165746" Longitude="-2.141343" TCA=""> 
        <Events> 
        <CashSale TicketGUID="67177058-d0f8-422e-bb19-9245acea25fa" TicketNumber="802" BoardingFareStageOwner="0" BoardingFareStageNumber="701" BoardingFareStageName="Town Square (58)" DestinationFareStageOwner="0" DestinationFareStageNumber="683" DestinationFareStageName="Golden Lion (60)" IssueDateTime="20171112190057" NumPgersOnTicket="1.00" NumTransactions="1" TicketProductNumber="e8d237e2c67aba4bbc922d8daa46ce75" TicketProductName="SC Over 60" SoldTicketClassNo="18285" ISOCurrency="GBP" PaymentMethod="8" SalesPrice="0" ValuePrice="150" SoldTicketClassName="SC Over 60" /> 
        <CardUsage SchemeIdentifier="CCS" CardId="63356000903500039" TransactionType="USAGE" EventDateTime="20171112190057" /> 
        <CashSale TicketGUID="f7aa4ed4-a385-407d-b116-1c7eb69563be" TicketNumber="811" BoardingFareStageOwner="0" BoardingFareStageNumber="1421" BoardingFareStageName="Town Square (58)" IssueDateTime="20171112193332" NumPgersOnTicket="1.00" NumTransactions="1" TicketProductNumber="e49ac3fa349bfc40b2173163538c121e" TicketProductName="QR Accept Ad Wk" SoldTicketClassNo="19137" ISOCurrency="GBP" PaymentMethod="15" SalesPrice="0" SoldTicketClassName="QR Accept Ad Wk" /> 
        <CardUsage SchemeIdentifier="Barcode" CardId="d66e1bb355df4668b95be2e4764d30a6" TransactionType="USAGE" EventDateTime="20171112193332" /> 
        <CashSale TicketGUID="c6c2a867-676a-4d73-98d6-3a2571892b3d" TicketNumber="812" BoardingFareStageOwner="0" BoardingFareStageNumber="1421" BoardingFareStageName="Town Square (58)" IssueDateTime="20171112193336" NumPgersOnTicket="1.00" NumTransactions="1" TicketProductNumber="ed5f87065daf384bab7b0bd9f4b02c80" TicketProductName="Use Wk/M/3M/Yr" SoldTicketClassNo="18724" ISOCurrency="GBP" PaymentMethod="1" SalesPrice="0" SoldTicketClassName="Use Wk/M/3M/Yr" /> 
        <CashSale TicketGUID="2c7038ea-0800-4e4e-9d20-35fdb129e317" TicketNumber="828" BoardingFareStageOwner="0" BoardingFareStageNumber="272" BoardingFareStageName="Town Square (58)" DestinationFareStageOwner="0" DestinationFareStageNumber="383" DestinationFareStageName="Long Street (33)" IssueDateTime="20171112204134" NumPgersOnTicket="1.00" NumTransactions="1" TicketProductNumber="38e47af2e0a23c4a85eebd96d419feef" TicketProductName="Adult Single" SoldTicketClassNo="18286" ISOCurrency="GBP" PaymentMethod="1" SalesPrice="250" SoldTicketClassName="Adult Single" /> 
        </Events> 
       </Stop> 
       </DrivenStops> 
      </Journey> 
      </Route> 
     </Routes> 
     </Session> 
    </Header>'); 

- このあなたの削減クエリです:

SELECT 
st.value('@OwnerNumber', 'varchar(50)') As OwnerNumber, 
cs.value('@SoldTicketClassName', 'varchar(50)') As SoldTicketClassName, 
--please try to reduce your examples! 
cs.value('@TicketGUID', 'varchar(50)') As TicketGUID, 
cs.value('@CardId', 'varchar(50)') As CardId 

FROM 
     @dummyTable AS XmlUpload CROSS APPLY 
     XmlData.nodes('Header/Session') AS [Session](se)  CROSS APPLY 
     se.nodes('Routes/Route')   AS [Route](ro)   CROSS APPLY 
     ro.nodes('Journey')    As [Journey](jo)  CROSS APPLY 
     jo.nodes('DrivenStops/Stop')  As [Stop](st)   CROSS APPLY 
     st.nodes('Events/*')  As [CashSale](cs); 

/* 
+-------------+---------------------+--------------------------------------+----------------------------------+ 
| OwnerNumber | SoldTicketClassName | TicketGUID       | CardId       | 
+-------------+---------------------+--------------------------------------+----------------------------------+ 
| 1   | SC Over 60   | 67177058-d0f8-422e-bb19-9245acea25fa | NULL        | 
+-------------+---------------------+--------------------------------------+----------------------------------+ 
| 1   | NULL    | NULL         | 63356000903500039    | 
+-------------+---------------------+--------------------------------------+----------------------------------+ 
| 1   | QR Accept Ad Wk  | f7aa4ed4-a385-407d-b116-1c7eb69563be | NULL        | 
+-------------+---------------------+--------------------------------------+----------------------------------+ 
| 1   | NULL    | NULL         | d66e1bb355df4668b95be2e4764d30a6 | 
+-------------+---------------------+--------------------------------------+----------------------------------+ 
| 1   | Use Wk/M/3M/Yr  | c6c2a867-676a-4d73-98d6-3a2571892b3d | NULL        | 
+-------------+---------------------+--------------------------------------+----------------------------------+ 
| 1   | Adult Single  | 2c7038ea-0800-4e4e-9d20-35fdb129e317 | NULL        | 
+-------------+---------------------+--------------------------------------+----------------------------------+ 
*/ 

--Iあなたは@TicketGUID以外@CardIdをしたいことを想定 - 1があるかどうか。

WITH InnerNodesCTE AS 
(
    SELECT 
    st.value('@OwnerNumber', 'varchar(50)') As OwnerNumber, 
    --a running counter 
    ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS NodeIndex, 
    --the node's name 
    cs.value('local-name(.)','nvarchar(100)') AS CsNodeName, 
    --the node in one piece 
    cs.query('.') AS InnerNode 

    FROM 
      @dummyTable AS XmlUpload CROSS APPLY 
      XmlData.nodes('Header/Session') AS [Session](se)  CROSS APPLY 
      se.nodes('Routes/Route')   AS [Route](ro)   CROSS APPLY 
      ro.nodes('Journey')    As [Journey](jo)  CROSS APPLY 
      jo.nodes('DrivenStops/Stop')  As [Stop](st)   CROSS APPLY 
      st.nodes('Events/*')  As [CashSale](cs) 
) 
SELECT OwnerNumber, 
     InnerNode.value('CashSale[1]/@TicketGUID', 'varchar(50)') As TicketGUID, 
     --here you can list all your CashSale attributes 
     CardUsageNode.value('CardUsage[1]/@CardId', 'varchar(50)') As CardId 
     --here you can list all your CardUsage attributes 
FROM InnerNodesCTE AS cs 
OUTER APPLY(SELECT cu.InnerNode AS CardUsageNode FROM InnerNodesCTE AS cu WHERE cu.NodeIndex = cs.NodeIndex+1 AND cu.CsNodeName='CardUsage') AS A 
WHERE CsNodeName='CashSale' 

結果

/* 
+---+--------------------------------------+----------------------------------+ 
| 1 | 67177058-d0f8-422e-bb19-9245acea25fa | 63356000903500039    | 
+---+--------------------------------------+----------------------------------+ 
| 1 | f7aa4ed4-a385-407d-b116-1c7eb69563be | d66e1bb355df4668b95be2e4764d30a6 | 
+---+--------------------------------------+----------------------------------+ 
| 1 | c6c2a867-676a-4d73-98d6-3a2571892b3d | NULL        | 
+---+--------------------------------------+----------------------------------+ 
| 1 | 2c7038ea-0800-4e4e-9d20-35fdb129e317 | NULL        | 
+---+--------------------------------------+----------------------------------+ 
*/ 
:これを試してみてください