2017-06-13

I have an RDD that looks roughly like this:

1.0,2.0,0.0019,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1.0,3.0,0.0,3.0E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1.0,5.0,-0.0019,-2.0E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.4294
1.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1.0,7.0,0.0,1.0E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1.0,8.0,0.0,3.0E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9040.8,0.0,0.0,0.0,0.0,0.0,0.0
1.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1.0,10.0,-0.0033,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,47.03,0.0,0.0,0.0,0.0
1.0,11.0,0.0,-3.0E-4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,554.54,0.0,0.0,0.0,0.0,0.0,0.0,8140.58,0.0

I need to filter out the rows in which the number of zeros equals a certain count.

This definition of the filter method removes more rows than expected:

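def filterZeroRowsWReadings(row: Array[String]): Boolean = {
  // Count the fields that parse to 0.0
  var flag: Int = 0
  for (value <- row) {
    if (value.toDouble == 0.0)
      flag = flag + 1
  }
  // Drop (false) rows with exactly 15 zeros; keep (true) all others
  flag match {
    case 15 => false
    case _  => true
  }
}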
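For comparison, the same logic can be written with `count` on the array. This is a minimal sketch, assuming each line has already been split into an `Array[String]` (e.g. with `line.split(",")`); the names `zeroCount` and `keepRow` are hypothetical, and the default target of 15 simply mirrors the `case 15` above.

```scala
// Number of fields that parse to the Double 0.0 (same test as the loop above).
// "-0.0" is counted too, because -0.0 == 0.0 under IEEE 754.
def zeroCount(row: Array[String]): Int =
  row.count(_.toDouble == 0.0)

// true = keep the row; false = drop it because exactly `target` fields are zero.
def keepRow(row: Array[String], target: Int = 15): Boolean =
  zeroCount(row) != target

val sample = "1.0,4.0,0.0,0.0,0.0".split(",")
println(zeroCount(sample)) // 3
println(keepRow(sample))   // true (3 zeros, not 15)
```

On an RDD of text lines this would plug in as `lines.map(_.split(",")).filter(row => keepRow(row))`. It behaves exactly like `filterZeroRowsWReadings`, so on its own it will not change the number of rows removed.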

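However, I manually counted 3,834 rows with that number of zeros in a subset of my RDD, while the filter method above removes 3,960 rows. I have no idea where the extra 126 rows come from. Is there a way to find out what is happening? On a small RDD I get the expected result, but on a large RDD the result is unexpected.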
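To see where the extra rows come from, it can help to look at the whole distribution of zero counts rather than a single total. A minimal sketch on a local collection, using a few shortened, made-up rows in the question's format; the helper names are hypothetical. On an RDD the equivalent would be along the lines of `rdd.map(zeroCount).countByValue()`.

```scala
// Count zeros numerically, as the question's filter does.
def zeroCount(row: Array[String]): Int =
  row.count(_.toDouble == 0.0)

// How many rows have 0, 1, 2, ... zero-valued fields?
def zeroCountHistogram(rows: Seq[Array[String]]): Map[Int, Int] =
  rows.groupBy(zeroCount).map { case (zeros, group) => zeros -> group.size }

// Hypothetical, shortened rows in the same comma-separated format.
val rows = Seq(
  "1.0,4.0,0.0,0.0,0.0",
  "1.0,5.0,-0.0019,0.0,8.4294",
  "1.0,6.0,0.0,0.0,0.0"
).map(_.split(","))

zeroCountHistogram(rows).toSeq.sorted.foreach { case (zeros, n) =>
  println(s"$zeros zero(s): $n row(s)")
}
```

Comparing the bucket for the target count with the manual count, and checking the neighbouring buckets, shows whether rows are being counted into the wrong bucket.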

Thank you.

Perhaps it's a precision issue? You could compare each value as a string with "0.0" and see whether that changes anything.
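The suggested string comparison could look like this. It is a sketch, not tested against the original data; the name `zeroCountAsString` is hypothetical, and trimming whitespace is an added assumption (the numeric `toDouble` check already ignores surrounding whitespace).

```scala
// Count fields whose text is exactly "0.0".
// Unlike `toDouble == 0.0`, this does not treat "-0.0", "0", "0.00"
// or an underflowing literal such as "1.0E-400" as zero.
def zeroCountAsString(row: Array[String]): Int =
  row.count(_.trim == "0.0")

// Same keep/drop rule as filterZeroRowsWReadings, but string-based.
def keepRowByString(row: Array[String], target: Int = 15): Boolean =
  zeroCountAsString(row) != target

val row = Array("1.0", "0.0", "-0.0", "0", " 0.0 ")
println(zeroCountAsString(row)) // 2
```

If swapping this predicate into the filter changes the row count, the difference comes from fields that parse to 0.0 without being the literal text "0.0".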

Spot on. I did that and it works as expected. But this shouldn't happen: 0.00003 != 0.0 – atalpha
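As a Double, 0.00003 is indeed non-zero, so the numeric test is not fooled by it. The two checks can only disagree on strings that parse to exactly 0.0 without being the literal text "0.0". This sketch prints a few such cases next to ordinary values; whether the real data contains any of them is not known from the question.

```scala
// Compare the numeric test with the literal string test for a few inputs.
val samples = Seq("0.0", "0.00003", "3.0E-60", "-0.0", "0", "0.00", "1.0E-400")

for (s <- samples) {
  val numericZero = s.toDouble == 0.0 // what filterZeroRowsWReadings checks
  val stringZero  = s == "0.0"        // the string comparison
  println(s"$s: numeric=$numericZero, string=$stringZero")
}
```

"0.00003" and "3.0E-60" are non-zero under both tests and "0.0" is zero under both. The remaining inputs are zero numerically but not as strings: "-0.0" because negative zero compares equal to 0.0, "0" and "0.00" because they are other spellings of zero, and "1.0E-400" because it is below the smallest positive Double (about 4.9E-324) and underflows to 0.0.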

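That depends on the values. 0.00003 shouldn't be a problem, but values far below the smallest positive Double (about 4.9E-324) are parsed as 0.0. You may want to collect() and print the inconsistent rows and compare them against your manual method; the manual method could also be the one that is wrong. Please mark the answer below as correct for future reference.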
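Following that suggestion, the rows on which the numeric and string counts disagree can be isolated and printed together with the offending fields. A minimal sketch on a local collection; the sample rows and helper names are made up for illustration. On an RDD the same predicate could be used as `rdd.filter(row => zeroCount(row) != zeroCountAsString(row)).take(20)`, or with `collect()` if the result is small.

```scala
// Numeric and literal-string zero counts for one row.
def zeroCount(row: Array[String]): Int = row.count(_.toDouble == 0.0)
def zeroCountAsString(row: Array[String]): Int = row.count(_ == "0.0")

// Rows where the two counts differ, with the fields responsible.
def disagreements(rows: Seq[Array[String]]): Seq[(String, Seq[String])] =
  rows.collect {
    case row if zeroCount(row) != zeroCountAsString(row) =>
      // fields that parse to 0.0 but are not literally "0.0"
      val culprits = row.filter(v => v.toDouble == 0.0 && v != "0.0").toSeq
      (row.mkString(","), culprits)
  }

val rows = Seq(
  "1.0,4.0,0.0,0.0",
  "1.0,5.0,-0.0,0.0",    // hypothetical: negative zero
  "1.0,6.0,1.0E-400,0.0" // hypothetical: underflows to 0.0
).map(_.split(","))

disagreements(rows).foreach { case (line, bad) =>
  println(s"$line -> ${bad.mkString(" ")}")
}
```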

Answer

Maybe it's a precision issue? Try comparing each value as a string with "0.0" and see whether that changes anything.