2016-10-14

What is the difference between rowsBetween and rangeBetween?

From the PySpark documentation:

rangeBetween(start, end) 
Defines the frame boundaries, from start (inclusive) to end (inclusive). 

Both start and end are relative from the current row. For example, “0” means “current row”, while “-1” means one off before the current row, and “5” means the five off after the current row. 

Parameters: 
start – boundary start, inclusive. The frame is unbounded if this is -sys.maxsize (or lower). 
end – boundary end, inclusive. The frame is unbounded if this is sys.maxsize (or higher). 
New in version 1.4. 

rowsBetween(start, end) 
Defines the frame boundaries, from start (inclusive) to end (inclusive). 

Both start and end are relative positions from the current row. For example, “0” means “current row”, while “-1” means the row before the current row, and “5” means the fifth row after the current row. 

Parameters: 
start – boundary start, inclusive. The frame is unbounded if this is -sys.maxsize (or lower). 
end – boundary end, inclusive. The frame is unbounded if this is sys.maxsize (or higher). 
New in version 1.4. 

For rangeBetween, how is "1 off" different from "1 row", for example?

Answer


It is simple:

  • ROWS BETWEEN does not care about the exact values. It only cares about the order of the rows when computing the frame.
  • RANGE BETWEEN takes the values into account when computing the frame.

Let's work through an example, using the following data:

+---+
|  x|
+---+
| 10|
| 20|
| 30|
| 31|
+---+

and these two window definitions:

  • ORDER BY x ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  • ORDER BY x RANGE BETWEEN 2 PRECEDING AND CURRENT ROW

Assuming the current row is the one with value 31, the first window includes the following rows (the current one and the two preceding it):

+---+---------------------------------------------------+
|  x|ORDER BY x ROWS BETWEEN 2 PRECEDING AND CURRENT ROW|
+---+---------------------------------------------------+
| 10|                                              false|
| 20|                                               true|
| 30|                                               true|
| 31|                                               true|
+---+---------------------------------------------------+

and the second window includes the following rows (the current one and all preceding rows where x >= 31 - 2):

+---+----------------------------------------------------+
|  x|ORDER BY x RANGE BETWEEN 2 PRECEDING AND CURRENT ROW|
+---+----------------------------------------------------+
| 10|                                               false|
| 20|                                               false|
| 30|                                                true|
| 31|                                                true|
+---+----------------------------------------------------+
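
To make the two rules concrete, here is a minimal pure-Python sketch that emulates both frame types on an already sorted list. It is not how Spark implements windows, and `rows_frame` and `range_frame` are hypothetical helper names used only for illustration. In PySpark itself the two window definitions above would be written as `Window.orderBy("x").rowsBetween(-2, 0)` and `Window.orderBy("x").rangeBetween(-2, 0)`.

```python
from bisect import bisect_left, bisect_right


def rows_frame(values, i, start, end):
    """Emulate ROWS BETWEEN: offsets count physical rows around index i."""
    lo = max(0, i + start)
    hi = min(len(values) - 1, i + end)
    if lo > hi:
        return []
    return values[lo:hi + 1]


def range_frame(values, i, start, end):
    """Emulate RANGE BETWEEN: offsets are added to the ordering value at index i.

    Assumes `values` is sorted ascending (the ORDER BY column).
    """
    current = values[i]
    lo = bisect_left(values, current + start)   # first value >= current + start
    hi = bisect_right(values, current + end)    # one past last value <= current + end
    return values[lo:hi]


xs = sorted([10, 20, 30, 31])  # ORDER BY x
i = xs.index(31)               # the current row is the one with value 31

rows_result = rows_frame(xs, i, -2, 0)    # 2 PRECEDING AND CURRENT ROW, by position
range_result = range_frame(xs, i, -2, 0)  # 2 PRECEDING AND CURRENT ROW, by value

print(rows_result)   # [20, 30, 31]
print(range_result)  # [30, 31]
```

The row-based frame always reaches back two positions, whatever the values are, while the range-based frame keeps only rows whose x lies in [31 - 2, 31]. With tied values, for example `range_frame([1, 2, 2, 10], 1, -1, 0)`, the range-based frame also picks up the other row with value 2, because it compares values rather than positions.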