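I have a query that finds the differences between the rows of two tables, and I would also like it to report duplicate rows as differences. I know that my table actual_orders contains duplicates and that my table expected_orders does not. How can I change the query so that duplicates show up as differences? I want to see the differences between the two tables including duplicates, not only the differences in the actual data.
This is my query: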
select
expected_orders.mk_file_id,actual_orders.mk_file_id,
expected_orders.ind_id, actual_orders.ind_id,
expected_orders.mk_cust_id,actual_orders.mk_cust_id,
expected_orders.order_sk,actual_orders.order_sk,
expected_orders.progen_order_id,actual_orders.progen_order_id,
expected_orders.order_chanel_id,actual_orders.order_chanel_id,
expected_orders.order_date_str,actual_orders.order_date_str,
expected_orders.order_total_usd,actual_orders.order_total_usd,
expected_orders.order_ship_usd,actual_orders.order_ship_usd,
expected_orders.order_discount_usd,actual_orders.order_discount_usd,
expected_orders.order_tax_usd,actual_orders.order_tax_usd,
expected_orders.empty_source_code,actual_orders.empty_source_code,
expected_orders.method_of_payment_code,actual_orders.method_of_payment_code,
expected_orders.feed_id,actual_orders.feed_id,
expected_orders.creation_date_str,actual_orders.creation_date_str,
expected_orders.update_ts_str,actual_orders.update_ts_str,
expected_orders.empty_match_type,actual_orders.empty_match_type,
expected_orders.mp_id,actual_orders.mp_id
from default.expected_orders
FULL OUTER JOIN default.actual_orders
ON (
COALESCE(expected_orders.mk_file_id,-1)=COALESCE(actual_orders.mk_file_id,-1) AND
COALESCE(expected_orders.ind_id,-1)=COALESCE(actual_orders.ind_id,-1) AND
COALESCE(expected_orders.mk_cust_id,'-1')=COALESCE(actual_orders.mk_cust_id,'-1') AND
COALESCE(expected_orders.order_sk,-1)=COALESCE(actual_orders.order_sk,-1)
)
where (
COALESCE(expected_orders.mk_file_id,-1)<>COALESCE(actual_orders.mk_file_id,-1) OR
COALESCE(expected_orders.ind_id,-1)<>COALESCE(actual_orders.ind_id,-1) OR
COALESCE(expected_orders.mk_cust_id,'-1')<>COALESCE(actual_orders.mk_cust_id,'-1') OR
COALESCE(expected_orders.order_sk,-1)<>COALESCE(actual_orders.order_sk,-1) OR
COALESCE(expected_orders.progen_order_id,'-1')<>COALESCE(actual_orders.progen_order_id,'-1') OR
COALESCE(expected_orders.order_chanel_id,-1)<>COALESCE(actual_orders.order_chanel_id,-1) OR
COALESCE(expected_orders.order_date_str,'-1')<>COALESCE(actual_orders.order_date_str,'-1') OR
COALESCE(expected_orders.order_total_usd,0.0)<>COALESCE(actual_orders.order_total_usd,0.0) OR
COALESCE(expected_orders.order_ship_usd,0.0)<>COALESCE(actual_orders.order_ship_usd,0.0) OR
COALESCE(expected_orders.order_discount_usd,0.0)<>COALESCE(actual_orders.order_discount_usd,0.0) OR
COALESCE(expected_orders.order_tax_usd,0.0)<>COALESCE(actual_orders.order_tax_usd,0.0) OR
COALESCE(expected_orders.empty_source_code,'-1')<>COALESCE(actual_orders.empty_source_code,'-1') OR
COALESCE(expected_orders.method_of_payment_code,'-1')<>COALESCE(actual_orders.method_of_payment_code,'-1') OR
COALESCE(expected_orders.feed_id,-1)<>COALESCE(actual_orders.feed_id,-1) OR
COALESCE(expected_orders.creation_date_str,'-1')<>COALESCE(actual_orders.creation_date_str,'-1') OR
COALESCE(expected_orders.update_ts_str,'-1')<>COALESCE(actual_orders.update_ts_str,'-1') OR
COALESCE(expected_orders.empty_match_type,'-1')<>COALESCE(actual_orders.empty_match_type,'-1') OR
COALESCE(expected_orders.mp_id,-1)<>COALESCE(actual_orders.mp_id,-1))
I am using Hive, but I will also add other tags such as SQL and Progress. Any help is really appreciated.
High-level overview
select total_rows
      ,expected_rows
      ,actual_rows
      ,record_variations
      ,count (*) as number_of_keys
from   (select count (*) as total_rows
              ,count (case when tab = 'E' then 1 end) as expected_rows
              ,count (case when tab = 'A' then 1 end) as actual_rows
              ,count (distinct rec) as record_variations
        from   (          select 'E' as tab,struct(*) as rec,mk_file_id,ind_id,mk_cust_id,order_sk from expected_orders
                union all select 'A' as tab,struct(*) as rec,mk_file_id,ind_id,mk_cust_id,order_sk from actual_orders
               ) t
        group by mk_file_id
                ,ind_id
                ,mk_cust_id
                ,order_sk
       ) t
group by total_rows
        ,expected_rows
        ,actual_rows
        ,record_variations
;
and drill down
'<=>' –
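For context, `<=>` is Hive's null-safe equality operator: `a <=> b` is true when both sides are NULL and false when only one is. The following is a minimal sketch of how it might replace the COALESCE sentinels in the question's query, assuming a Hive version that accepts `<=>` in join conditions. The aliases `e` and `a` and the `expected_`/`actual_` output names are introduced here for illustration, and only a few of the value columns are shown.

```sql
-- Sketch only: e / a and the output aliases are illustrative names.
select e.mk_file_id      as expected_mk_file_id,      a.mk_file_id      as actual_mk_file_id,
       e.ind_id          as expected_ind_id,          a.ind_id          as actual_ind_id,
       e.mk_cust_id      as expected_mk_cust_id,      a.mk_cust_id      as actual_mk_cust_id,
       e.order_sk        as expected_order_sk,        a.order_sk        as actual_order_sk,
       e.progen_order_id as expected_progen_order_id, a.progen_order_id as actual_progen_order_id,
       e.order_total_usd as expected_order_total_usd, a.order_total_usd as actual_order_total_usd
from default.expected_orders e
full outer join default.actual_orders a
  on  e.mk_file_id <=> a.mk_file_id      -- NULL keys match NULL keys
  and e.ind_id     <=> a.ind_id
  and e.mk_cust_id <=> a.mk_cust_id
  and e.order_sk   <=> a.order_sk
where not (e.mk_file_id      <=> a.mk_file_id)       -- key tests catch rows with no partner
   or not (e.ind_id          <=> a.ind_id)
   or not (e.mk_cust_id      <=> a.mk_cust_id)
   or not (e.order_sk        <=> a.order_sk)
   or not (e.progen_order_id <=> a.progen_order_id)
   or not (e.order_total_usd <=> a.order_total_usd)
   -- the remaining columns from the question would follow the same pattern
;
```

Unlike `COALESCE(col,-1)` or `COALESCE(col,0.0)`, this does not treat a real -1 or 0.0 as equal to NULL, so results can differ from the original query on such rows. On its own it still does not surface duplicates: an actual row that is duplicated but otherwise identical to its expected row joins twice and is filtered out both times.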
You can also add a count(*) column, calculate the number of duplicates, and compare them. – leftjoin
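One way to read this suggestion is to count how many copies of each complete row exist in each table and report the rows whose counts differ. The sketch below assumes that both tables have exactly the columns listed in the question, in the same order (required for `union all` with `*`). The names `is_expected`, `is_actual`, `expected_copies` and `actual_copies` are hypothetical and not part of the original schema.

```sql
-- Sketch only: counts copies of each full row per table and keeps mismatches.
select mk_file_id, ind_id, mk_cust_id, order_sk,
       progen_order_id, order_chanel_id, order_date_str,
       order_total_usd, order_ship_usd, order_discount_usd, order_tax_usd,
       empty_source_code, method_of_payment_code, feed_id,
       creation_date_str, update_ts_str, empty_match_type, mp_id,
       sum(is_expected) as expected_copies,
       sum(is_actual)   as actual_copies
from (
      select e.*, 1 as is_expected, 0 as is_actual from default.expected_orders e
      union all
      select a.*, 0 as is_expected, 1 as is_actual from default.actual_orders a
     ) t
group by mk_file_id, ind_id, mk_cust_id, order_sk,
         progen_order_id, order_chanel_id, order_date_str,
         order_total_usd, order_ship_usd, order_discount_usd, order_tax_usd,
         empty_source_code, method_of_payment_code, feed_id,
         creation_date_str, update_ts_str, empty_match_type, mp_id
having sum(is_expected) <> sum(is_actual)
;
```

Because GROUP BY places NULLs in the same group, no COALESCE is needed here. A row that exists once in expected_orders and twice in actual_orders comes back with expected_copies = 1 and actual_copies = 2, while a row whose values differ between the tables comes back as two result rows, one with counts (1, 0) and one with (0, 1).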