2017-07-07 2 views
3

100Mポイントデータセットのビッグクエリとヒットタイムアウトにはまったく新しいものがあります。私は0(停止)の周りの一貫した一連の値に達する点と、一貫して0(開始)を超える点を見つけることを試みています。ビッグクエリアナリティック関数はクエリパフォーマンスを向上させます

開始ファイルの時刻を決定するサブクエリが、それ自身のデータセットに保存されていましたが、それは役に立ちませんでした。 (秒は、複数の経由インクリメント「ファイル。」

問題を引き起こし部分が前のPTSと次のPTSの初期集合体である。私が保存しようとした
違いを生むだろう

WITH test AS 
(SELECT 'A' as ACM, CAST('2017-01-01' AS DATE) as file_date, CAST('10:10:10' AS TIME) as file_time , 0.0 as value, 0.1 as seconds 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', 0, 0.2 #start 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', 2000, 0.3 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', 1000, 0.4 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', 0, 0.5 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', -1000, 0.6 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', -2000, 0.7 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', 0, 0.8 #stop 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', 0, 0.9 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 0, 1.0 #start 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 1000, 1.1 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 1000, 1.2 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 2000, 1.3 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 0, 1.4 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 0, 1.5 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', -1000, 1.6 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', -2000, 1.7 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 0, 1.8 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 1000, 1.9 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:12', 2000, 2.0 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:12', 1000, 2.1 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:12', 0, 2.2 #stop 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:12', 20, 2.3 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:12', 0, 2.4 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', 0, 0.1 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', 0, 0.2 #start 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', 2000, 0.3 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', 1000, 0.4 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', 0, 0.5 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', -1000, 0.6 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', -2000, 0.7 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', 0, 0.8 #stop 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', 0, 0.9 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 0, 1.0 #start 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 1000, 1.1 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 1000, 1.2 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 2000, 1.3 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 0, 1.4 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 0, 1.5 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', -1000, 1.6 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', -2000, 1.7 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 0, 1.8 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 1000, 1.9 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:12', 2000, 2.0 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:12', 1000, 2.1 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:12', 0, 2.2 #stop 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:12', 20, 2.3 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:12', 0, 2.4) 
SELECT 
    acm, 
    file_date, 
    start_file_time, 
    file_times, 
    agg_sec as start_stop 
FROM (
    SELECT 
    acm, 
    file_date, 
    start_file_time, 
    file_times, 
    ARRAY_AGG(kind) OVER w AS agg_kind, 
    ARRAY_AGG(seconds) OVER w AS agg_sec 
    FROM (
    SELECT 
     acm, 
     file_date, 
     start_file_time,  
     ARRAY(SELECT DISTINCT x FROM UNNEST(file_times) as x) AS file_times, 
     seconds, 
     CASE 
     WHEN (ABS(prev_val) < 50 and ABS(next_val) >= 50 and next_avg >= 50 and prev_avg < 50) THEN 'start' 
     WHEN (ABS(next_val) < 50 and ABS(prev_val) >= 50 and prev_avg >= 50 and next_avg < 50) THEN 'stop' 
     END as kind, 
     prev_val, next_val, prev_avg, next_avg 
    FROM (
     SELECT 
     s.acm as acm, 
     s.file_date as file_date, 
     s.start_file_time as start_file_time, 
     seconds, 
     value, 
     ARRAY_AGG(s.file_time) OVER (PARTITION BY s.acm, s.file_date, s.start_file_time) as file_times, 
     AVG(ABS(value)) OVER prev as prev_avg, 
     NTH_VALUE(value, 2) OVER prev as prev_val, 
     AVG(ABS(value)) OVER next as next_avg, 
     NTH_VALUE(value, 2) OVER next as next_val 
     FROM test v 
     JOIN (
     SELECT 
      acm, 
      file_date, 
      file_time, 
      TIME_SUB(file_time, INTERVAL CAST(FLOOR(MIN(seconds)) AS INT64) SECOND) as start_file_time 
     FROM test 
     GROUP BY acm, file_date, file_time 
    ) s ON s.acm = v.acm AND s.file_date = v.file_date AND s.file_time = v.file_time 
     WINDOW prev AS (PARTITION BY s.acm, s.file_date, s.start_file_time ORDER BY seconds ROWS 2 PRECEDING), next AS (PARTITION BY s.acm, s.file_date, s.start_file_time ORDER BY seconds ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) 
    ) 
    WHERE value = 0) 
    WHERE kind IN ('start', 'stop') 
    WINDOW w AS (PARTITION BY acm, file_date, start_file_time ORDER BY seconds ROWS 1 PRECEDING)) 
WHERE ARRAY_LENGTH(agg_kind) = 2 AND agg_kind[ORDINAL(1)] = 'start' AND agg_kind[ORDINAL(2)] = 'stop' 
; 
+1

あなたの投稿のサンプルコードは**正しい結果**を生成しますが、実際のデータに適用するとタイムアウトしますか? –

+0

SOの重要な点 - 投稿された回答の左側にある投票の下にあるチェックマークを使って、「受け入れられた回答をマークする」ことができます。重要な理由については、http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work#5235を参照してください。答えに投票することも重要です。役に立った答えを投票してください。 ...誰かがあなたの質問に答えるときに何をすべきかを調べることができます - http://stackoverflow.com/help/someone-answers。これらの単純なルールに従えば、あなた自身の評判スコアを上げると同時に、私たちはあなたの質問に答えるために動機づけることができます:o)考慮してください! –

答えて

1

チェックバージョン以下の場合

#standardSQL 
WITH test AS 
(SELECT 'A' AS ACM, CAST('2017-01-01' AS DATE) AS file_date, CAST('10:10:10' AS TIME) AS file_time , 0.0 AS value, 0.1 AS seconds 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', 0, 0.2 #start 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', 2000, 0.3 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', 1000, 0.4 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', 0, 0.5 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', -1000, 0.6 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', -2000, 0.7 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', 0, 0.8 #stop 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:10', 0, 0.9 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 0, 1.0 #start 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 1000, 1.1 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 1000, 1.2 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 2000, 1.3 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 0, 1.4 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 0, 1.5 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', -1000, 1.6 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', -2000, 1.7 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 0, 1.8 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:11', 1000, 1.9 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:12', 2000, 2.0 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:12', 1000, 2.1 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:12', 0, 2.2 #stop 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:12', 20, 2.3 
    UNION ALL SELECT 'A', '2017-01-01', '10:10:12', 0, 2.4 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', 0, 0.1 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', 0, 0.2 #start 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', 2000, 0.3 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', 1000, 0.4 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', 0, 0.5 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', -1000, 0.6 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', -2000, 0.7 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', 0, 0.8 #stop 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:10', 0, 0.9 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 0, 1.0 #start 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 1000, 1.1 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 1000, 1.2 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 2000, 1.3 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 0, 1.4 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 0, 1.5 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', -1000, 1.6 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', -2000, 1.7 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 0, 1.8 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:11', 1000, 1.9 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:12', 2000, 2.0 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:12', 1000, 2.1 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:12', 0, 2.2 #stop 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:12', 20, 2.3 
    UNION ALL SELECT 'B', '2017-01-01', '10:10:12', 0, 2.4 
), temp1 AS (
    SELECT acm, file_date, value, seconds, 
    TIME_SUB(file_time, INTERVAL CAST(FLOOR(seconds) AS INT64) SECOND) AS start_file_time 
    FROM test 
), temp2 AS (
    SELECT 
    acm, file_date, start_file_time, seconds, 
    AVG(ABS(value)) OVER prev AS prev_avg, 
    NTH_VALUE(value, 2) OVER prev AS prev_val, 
    AVG(ABS(value)) OVER next AS next_avg, 
    NTH_VALUE(value, 2) OVER next AS next_val 
    FROM temp1 WINDOW 
    prev AS (PARTITION BY acm, file_date, start_file_time ORDER BY seconds ROWS 2 PRECEDING), 
    next AS (PARTITION BY acm, file_date, start_file_time ORDER BY seconds ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) 
), temp3 AS (
    SELECT 
    acm, file_date, start_file_time, seconds, 
    CASE 
     WHEN (ABS(prev_val) < 50 AND ABS(next_val) >= 50 AND next_avg >= 50 AND prev_avg < 50) THEN 'start' 
     WHEN (ABS(next_val) < 50 AND ABS(prev_val) >= 50 AND prev_avg >= 50 AND next_avg < 50) THEN 'stop' 
    END AS kind 
    FROM temp2 
), temp4 AS (
    SELECT *, 
    COUNTIF(kind = 'start') OVER (PARTITION BY acm, file_date, start_file_time ORDER BY seconds) + 
    COUNTIF(kind = 'stop') OVER (PARTITION BY acm, file_date, start_file_time ORDER BY seconds ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp 
    FROM temp3 
) 
SELECT 
    acm, file_date, start_file_time, 
    MIN(seconds) AS start_seconds, 
    MAX(seconds) AS stop_seconds 
FROM temp4 
GROUP BY acm, file_date, start_file_time, grp 
HAVING MIN(kind) != MAX(kind) 
-- ORDER BY 1, 2, 3, 4 
1

可能な限り元のコードがうまくいけば、このクエリはあなたが捜している結果を与え、あなたのデータを処理することが可能です正常に設定:

SELECT 
    * EXCEPT(file_data), 
    ARRAY(SELECT STRUCT(seconds, kind) FROM UNNEST(file_data) WHERE kind IS NOT NULL) file_data 
FROM(
    SELECT 
    ACM, 
    file_date, 
    start_file_time, 
    ARRAY(SELECT DISTINCT file_time FROM UNNEST(file_data)) file_times, 
    ARRAY(SELECT STRUCT(seconds, IF(value = 0, (CASE WHEN ABS(NTH_VALUE(value, 2) OVER(prev)) < 50 AND ABS(NTH_VALUE(value, 2) OVER(next)) >= 50 AND AVG(ABS(value)) OVER(next) >= 50 and AVG(ABS(value)) OVER(prev) < 50 THEN 'start' 
                WHEN ABS(NTH_VALUE(value, 2) OVER(next)) < 50 AND ABS(NTH_VALUE(value, 2) OVER(prev)) >= 50 AND AVG(ABS(value)) OVER(prev) >= 50 and AVG(ABS(value)) OVER(next) < 50 THEN 'stop' END), NULL) as kind) 
      FROM UNNEST(file_data) WINDOW prev AS (ORDER BY seconds ROWS 2 PRECEDING), next as(ORDER BY seconds ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)) file_data 
    FROM(
    SELECT 
     ACM, 
     file_date, 
     TIME_SUB(file_time, INTERVAL CAST(FLOOR(seconds) AS INT64) SECOND) AS start_file_time, 
     ARRAY_AGG(STRUCT(file_time, value, seconds)) file_data 
    FROM test 
    GROUP BY ACM, file_date, start_file_time 
    ) 
) 

その結果は、あなたのtestデータに「開始」と「停止」と説明した、まさにです。

作るためにいくつかの注意事項:

  • 私は高価なJOIN操作を避けます。
  • 可能な限りARRAYとSTRUCTを使用することで、格納効率が向上するだけでなく、必要なデータのみの処理、つまり重複データを処理する必要がないためにクエリパフォーマンスが向上します。
  • ちょうど2 WINDOWそれぞれがパフォーマンスを向上させる対応するARRAY構造の内部で使用されます。これは可能です.STRUCTのARRAYにすべてを集約したので、データがすでにソートされているので、より複雑なウィンドウング句は必要ありません。
  • このクエリではデータの重複はありません。
  • 結果がわずかに異なる構造になっていることに注意してください。この新しいデータストアを使用することをお勧めします。

この機能が動作するかどうかお知らせください。

関連する問題