I am trying out a PySpark machine learning tutorial and ran into a problem when trying to print the dataset table.
I am following this tutorial here.
The problem appeared when I reached the "Correlations and Data Preparation" section,
where I was trying to run this code:
from pyspark.sql.types import DoubleType
from pyspark.sql.functions import UserDefinedFunction

# Map the categorical string values to doubles
binary_map = {'Yes': 1.0, 'No': 0.0, 'True': 1.0, 'False': 0.0}
toNum = UserDefinedFunction(lambda k: binary_map[k], DoubleType())

# Drop the columns that won't be used as features and convert the
# remaining categorical columns to numeric ones
CV_data = CV_data.drop('State').drop('Area code') \
    .drop('Total day charge').drop('Total eve charge') \
    .drop('Total night charge').drop('Total intl charge') \
    .withColumn('Churn', toNum(CV_data['Churn'])) \
    .withColumn('International plan', toNum(CV_data['International plan'])) \
    .withColumn('Voice mail plan', toNum(CV_data['Voice mail plan'])).cache()

final_test_data = final_test_data.drop('State').drop('Area code') \
    .drop('Total day charge').drop('Total eve charge') \
    .drop('Total night charge').drop('Total intl charge') \
    .withColumn('Churn', toNum(final_test_data['Churn'])) \
    .withColumn('International plan', toNum(final_test_data['International plan'])) \
    .withColumn('Voice mail plan', toNum(final_test_data['Voice mail plan'])).cache()
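For context, toNum is just a Python dict lookup wrapped in a UDF, so the conversion only works when every value in a column appears as a key of binary_map. A minimal standalone sketch of the same idea (the toy DataFrame below is mine, not from the tutorial):

from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType
from pyspark.sql.functions import UserDefinedFunction

# Toy example: one categorical column holding the expected string values
spark = SparkSession.builder.master('local[1]').appName('toy').getOrCreate()
df = spark.createDataFrame([('Yes',), ('No',)], ['International plan'])

binary_map = {'Yes': 1.0, 'No': 0.0, 'True': 1.0, 'False': 0.0}
toNum = UserDefinedFunction(lambda k: binary_map[k], DoubleType())

# Succeeds because 'Yes' and 'No' are both keys of binary_map
df.withColumn('International plan', toNum(df['International plan'])).show()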
On my real data, however, this is the error message printed to the terminal (partial):
17/06/20 17:58:53 WARN BlockManager: Putting block rdd_38_0 failed due to an exception
17/06/20 17:58:53 WARN BlockManager: Block rdd_38_0 could not be removed as it was not found on disk or in memory
17/06/20 17:58:53 WARN BlockManager: Putting block rdd_53_0 failed due to an exception
17/06/20 17:58:53 WARN BlockManager: Block rdd_53_0 could not be removed as it was not found on disk or in memory
17/06/20 17:58:53 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID 16)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/main/spark-2.1.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 174, in main
    process()
  File "/home/main/spark-2.1.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 169, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/main/spark-2.1.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 106, in <lambda>
    func = lambda _, it: map(mapper, it)
  File "<string>", line 1, in <lambda>
  File "/home/main/spark-2.1.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 70, in <lambda>
    return lambda *a: f(*a)
  File "<stdin>", line 1, in <lambda>
KeyError: False
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
....
The rest of the error message can be viewed in this document here.
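For what it's worth, the last Python frame in the traceback shows the lambda inside toNum raising KeyError: False, i.e. it was handed the Python boolean False rather than the string 'False'. That would suggest the Churn column actually holds booleans, while binary_map only has string keys. If that is the cause, extending the map might work (an untested sketch on my part, not from the tutorial):

# Untested sketch: accept boolean values as well as the string forms
binary_map = {'Yes': 1.0, 'No': 0.0, 'True': 1.0, 'False': 0.0,
              True: 1.0, False: 0.0}
toNum = UserDefinedFunction(lambda k: binary_map[k], DoubleType())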
Does anyone know what the problem is?
Thank you. [Solved]