2017-06-23 14 views
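Finding differences between two tables that include duplicates as well as actual data

I have a query that finds the differences between the rows of two tables, but I also want duplicate rows to be reported as differences. I know that my table actual_orders contains duplicates and that my table expected_orders does not. How can I change the query so that duplicates show up as differences?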

Here is my query:

 select 
    expected_orders.mk_file_id,actual_orders.mk_file_id, 
    expected_orders.ind_id, actual_orders.ind_id, 
    expected_orders.mk_cust_id,actual_orders.mk_cust_id, 
    expected_orders.order_sk,actual_orders.order_sk, 
    expected_orders.progen_order_id,actual_orders.progen_order_id, 
    expected_orders.order_chanel_id,actual_orders.order_chanel_id, 
    expected_orders.order_date_str,actual_orders.order_date_str, 
    expected_orders.order_total_usd,actual_orders.order_total_usd, 
    expected_orders.order_ship_usd,actual_orders.order_ship_usd, 
    expected_orders.order_discount_usd,actual_orders.order_discount_usd, 
    expected_orders.order_tax_usd,actual_orders.order_tax_usd, 
    expected_orders.empty_source_code,actual_orders.empty_source_code, 
    expected_orders.method_of_payment_code,actual_orders.method_of_payment_code, 
    expected_orders.feed_id,actual_orders.feed_id, 
    expected_orders.creation_date_str,actual_orders.creation_date_str, 
    expected_orders.update_ts_str,actual_orders.update_ts_str, 
    expected_orders.empty_match_type,actual_orders.empty_match_type, 
    expected_orders.mp_id,actual_orders.mp_id 
    from default.expected_orders 
    FULL OUTER JOIN default.actual_orders 
    ON (
     COALESCE(expected_orders.mk_file_id,-1)=COALESCE(actual_orders.mk_file_id,-1) AND 
     COALESCE(expected_orders.ind_id,-1)=COALESCE(actual_orders.ind_id,-1) AND 
     COALESCE(expected_orders.mk_cust_id,'-1')=COALESCE(actual_orders.mk_cust_id,'-1') AND 
     COALESCE(expected_orders.order_sk,-1)=COALESCE(actual_orders.order_sk,-1) 
    )
    where (
    COALESCE(expected_orders.mk_file_id,-1)<>COALESCE(actual_orders.mk_file_id,-1) OR 
    COALESCE(expected_orders.ind_id,-1)<>COALESCE(actual_orders.ind_id,-1) OR 
    COALESCE(expected_orders.mk_cust_id,'-1')<>COALESCE(actual_orders.mk_cust_id,'-1') OR 
    COALESCE(expected_orders.order_sk,-1)<>COALESCE(actual_orders.order_sk,-1) OR 
    COALESCE(expected_orders.progen_order_id,'-1')<>COALESCE(actual_orders.progen_order_id,'-1') OR 
    COALESCE(expected_orders.order_chanel_id,-1)<>COALESCE(actual_orders.order_chanel_id,-1) OR 
    COALESCE(expected_orders.order_date_str,'-1')<>COALESCE(actual_orders.order_date_str,'-1') OR 
    COALESCE(expected_orders.order_total_usd,0.0)<>COALESCE(actual_orders.order_total_usd,0.0) OR 
    COALESCE(expected_orders.order_ship_usd,0.0)<>COALESCE(actual_orders.order_ship_usd,0.0) OR 
    COALESCE(expected_orders.order_discount_usd,0.0)<>COALESCE(actual_orders.order_discount_usd,0.0) OR 
    COALESCE(expected_orders.order_tax_usd,0.0)<>COALESCE(actual_orders.order_tax_usd,0.0) OR 
    COALESCE(expected_orders.empty_source_code,'-1')<>COALESCE(actual_orders.empty_source_code,'-1') OR 
    COALESCE(expected_orders.method_of_payment_code,'-1')<>COALESCE(actual_orders.method_of_payment_code,'-1') OR 
    COALESCE(expected_orders.feed_id,-1)<>COALESCE(actual_orders.feed_id,-1) OR 
    COALESCE(expected_orders.creation_date_str,'-1')<>COALESCE(actual_orders.creation_date_str,'-1') OR 
    COALESCE(expected_orders.update_ts_str,'-1')<>COALESCE(actual_orders.update_ts_str,'-1') OR 
    COALESCE(expected_orders.empty_match_type,'-1')<>COALESCE(actual_orders.empty_match_type,'-1') OR 
    COALESCE(expected_orders.mp_id,-1)<>COALESCE(actual_orders.mp_id,-1)) 

I am using Hive, but I am also going to include other tags such as SQL and Progress. All help is really appreciated.

High-level overview:

select  total_rows 
      ,expected_rows 
      ,actual_rows 
      ,record_variations 
      ,count (*)    as number_of_keys 

from  (select  count (*)        as total_rows 
         ,count (case when tab = 'E' then 1 end) as expected_rows 
         ,count (case when tab = 'A' then 1 end) as actual_rows 
         ,count (distinct rec)     as record_variations 

      from  (   select 'E' as tab,struct(*) as rec,mk_file_id,ind_id,mk_cust_id,order_sk from expected_orders 
         union all select 'A' as tab,struct(*) as rec,mk_file_id,ind_id,mk_cust_id,order_sk from actual_orders 
         ) t 

      group by mk_file_id 
         ,ind_id  
         ,mk_cust_id 
         ,order_sk 
      ) t 

group by total_rows 
      ,expected_rows 
      ,actual_rows 
      ,record_variations 
; 

For equality that treats NULL as equal to NULL, use '<=>'. –
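A minimal sketch of what this comment points at, assuming the Hive version in use accepts the null-safe operator `<=>` in join conditions. The `COALESCE(..., -1)` sentinels from the question are replaced by `<=>`, so a real value of -1 can no longer be mistaken for NULL. The aliases `e` and `a` and the output column names are introduced here for illustration, and only a few columns are shown.

```sql
-- Sketch only: the join keys from the question, compared with <=>
-- (NULL <=> NULL is true, NULL <=> 1 is false) instead of COALESCE sentinels.
select  e.mk_file_id      as expected_mk_file_id
       ,a.mk_file_id      as actual_mk_file_id
       ,e.order_sk        as expected_order_sk
       ,a.order_sk        as actual_order_sk
       ,e.order_total_usd as expected_order_total_usd
       ,a.order_total_usd as actual_order_total_usd
from    expected_orders e
        full outer join actual_orders a
        on      e.mk_file_id <=> a.mk_file_id
            and e.ind_id     <=> a.ind_id
            and e.mk_cust_id <=> a.mk_cust_id
            and e.order_sk   <=> a.order_sk
where   not (e.mk_file_id      <=> a.mk_file_id)
     or not (e.ind_id          <=> a.ind_id)
     or not (e.mk_cust_id      <=> a.mk_cust_id)
     or not (e.order_sk        <=> a.order_sk)
     or not (e.order_total_usd <=> a.order_total_usd)
     -- the remaining columns would follow the same pattern
;
```

On its own this does not report duplicates: a duplicated row in actual_orders that matches its expected row still passes every comparison.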

You could also add a count(*) column to calculate the number of duplicates and compare the counts. – leftjoin
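One way to read this suggestion, as a rough sketch rather than the commenter's exact method: collapse each table to one row per distinct record with a `count(*)`, then join the two results and keep the records whose counts differ. Only five columns are listed to keep the example short; a full comparison would list every column in each `select`, `group by` and `on`. The aliases `e`, `a`, `cnt`, `expected_cnt` and `actual_cnt` are made up for this example, and `<=>` in the join condition is assumed to be supported.

```sql
-- Sketch only: count identical records per table, then compare the counts.
select  coalesce(e.mk_file_id,      a.mk_file_id)      as mk_file_id
       ,coalesce(e.ind_id,          a.ind_id)          as ind_id
       ,coalesce(e.mk_cust_id,      a.mk_cust_id)      as mk_cust_id
       ,coalesce(e.order_sk,        a.order_sk)        as order_sk
       ,coalesce(e.order_total_usd, a.order_total_usd) as order_total_usd
       ,coalesce(e.cnt, 0)                             as expected_cnt
       ,coalesce(a.cnt, 0)                             as actual_cnt
from   (select   mk_file_id, ind_id, mk_cust_id, order_sk, order_total_usd
                ,count(*) as cnt
        from     expected_orders
        group by mk_file_id, ind_id, mk_cust_id, order_sk, order_total_usd
       ) e
       full outer join
       (select   mk_file_id, ind_id, mk_cust_id, order_sk, order_total_usd
                ,count(*) as cnt
        from     actual_orders
        group by mk_file_id, ind_id, mk_cust_id, order_sk, order_total_usd
       ) a
       on      e.mk_file_id      <=> a.mk_file_id
           and e.ind_id          <=> a.ind_id
           and e.mk_cust_id      <=> a.mk_cust_id
           and e.order_sk        <=> a.order_sk
           and e.order_total_usd <=> a.order_total_usd
-- a record missing on one side has no cnt there, so treat it as 0
where  coalesce(e.cnt, 0) <> coalesce(a.cnt, 0)
;
```

With this shape, a record that appears once in expected_orders and twice in actual_orders comes back with counts 1 and 2, while a record whose values differ comes back as two rows, one with counts 1 and 0 and one with counts 0 and 1.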

Answers

Drill down:

select  mk_file_id 
      ,ind_id  
      ,mk_cust_id 
      ,order_sk 

      ,count (*)        as total_rows 
      ,count (case when tab = 'E' then 1 end) as expected_rows 
      ,count (case when tab = 'A' then 1 end) as actual_rows 
      ,count (distinct rec)     as record_variations 


from  (   select 'E' as tab,struct(*) as rec,mk_file_id,ind_id,mk_cust_id,order_sk from expected_orders 
      union all select 'A' as tab,struct(*) as rec,mk_file_id,ind_id,mk_cust_id,order_sk from actual_orders 
      ) t 

group by mk_file_id 
      ,ind_id  
      ,mk_cust_id 
      ,order_sk 

-- having ... 
; 
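The `-- having ...` placeholder is left open in the answer. One possible filter, offered only as an assumption about what might go there, keeps the keys where the two tables disagree: either the row counts differ (which catches duplicates in actual_orders) or the rows for the key are not all identical. It assumes each key occurs at most once in expected_orders, as the question suggests.

```sql
-- Sketch only: one possible condition for the "-- having ..." placeholder.
select  mk_file_id
       ,ind_id
       ,mk_cust_id
       ,order_sk
       ,count (case when tab = 'E' then 1 end) as expected_rows
       ,count (case when tab = 'A' then 1 end) as actual_rows
       ,count (distinct rec)                   as record_variations

from   (          select 'E' as tab,struct(*) as rec,mk_file_id,ind_id,mk_cust_id,order_sk from expected_orders
        union all select 'A' as tab,struct(*) as rec,mk_file_id,ind_id,mk_cust_id,order_sk from actual_orders
       ) t

group by mk_file_id
        ,ind_id
        ,mk_cust_id
        ,order_sk

-- different row counts: missing, extra or duplicated rows for this key
-- more than one distinct record: at least one column value differs
having  count (case when tab = 'E' then 1 end) <> count (case when tab = 'A' then 1 end)
     or count (distinct rec) > 1
;
```

A key with one expected row and one identical actual row has equal counts and a single record variation, so it is filtered out.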

Hi, have you seen this suggestion? –
