Apache Spark - Error initializing SparkContext, java.io.FileNotFoundException

I can run a simple Hello World program with Spark on a standalone machine. But when I run a word-count program that creates a SparkContext and execute it with pyspark, I get the following error:

ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: Added file file:/Users/tanyagupta/Documents/Internship/Zyudly%20Labs/Tanya-Programs/word_count.py does not exist.

I am on Mac OS X, and I installed Spark with Homebrew using brew install apache-spark. Any idea what is going wrong here?
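For reference, here is a minimal sketch of the kind of word-count script being run (the actual word_count.py is not shown in the question; the input path and the counting logic are assumptions, and only the SparkContext line is taken from the traceback below):

```python
from pyspark import SparkContext

# Line 7 of word_count.py according to the traceback below.
sc = SparkContext(appName="WordCount_Tanya")

# Hypothetical word count: read a text file, split each line into
# words, and sum a 1 for every occurrence of each word.
counts = (sc.textFile("input.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

for word, count in counts.collect():
    print(word, count)

sc.stop()
```

The full console output: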
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/19 23:18:45 INFO SparkContext: Running Spark version 1.6.2
16/07/19 23:18:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/19 23:18:45 INFO SecurityManager: Changing view acls to: tanyagupta
16/07/19 23:18:45 INFO SecurityManager: Changing modify acls to: tanyagupta
16/07/19 23:18:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tanyagupta); users with modify permissions: Set(tanyagupta)
16/07/19 23:18:46 INFO Utils: Successfully started service 'sparkDriver' on port 59226.
16/07/19 23:18:46 INFO Slf4jLogger: Slf4jLogger started
16/07/19 23:18:46 INFO Remoting: Starting remoting
16/07/19 23:18:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:59227]
16/07/19 23:18:46 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 59227.
16/07/19 23:18:46 INFO SparkEnv: Registering MapOutputTracker
16/07/19 23:18:46 INFO SparkEnv: Registering BlockManagerMaster
16/07/19 23:18:46 INFO DiskBlockManager: Created local directory at /private/var/folders/2f/fltslxd54f5961xsc2wg1w680000gn/T/blockmgr-812de6f9-3e3d-4885-a7de-fc9c2e181c64
16/07/19 23:18:46 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/07/19 23:18:46 INFO SparkEnv: Registering OutputCommitCoordinator
16/07/19 23:18:46 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/07/19 23:18:46 INFO SparkUI: Started SparkUI at http://192.168.0.5:4040
16/07/19 23:18:46 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: Added file file:/Users/tanyagupta/Documents/Internship/Zyudly%20Labs/Tanya-Programs/word_count.py does not exist.
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
16/07/19 23:18:47 INFO SparkUI: Stopped Spark web UI at http://192.168.0.5:4040
16/07/19 23:18:47 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/07/19 23:18:47 INFO MemoryStore: MemoryStore cleared
16/07/19 23:18:47 INFO BlockManager: BlockManager stopped
16/07/19 23:18:47 INFO BlockManagerMaster: BlockManagerMaster stopped
16/07/19 23:18:47 WARN MetricsSystem: Stopping a MetricsSystem that is not running
16/07/19 23:18:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/07/19 23:18:47 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/07/19 23:18:47 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/07/19 23:18:47 INFO SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
File "/Users/tanyagupta/Documents/Internship/Zyudly Labs/Tanya-Programs/word_count.py", line 7, in <module>
sc=SparkContext(appName="WordCount_Tanya")
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/pyspark.zip/pyspark/context.py", line 115, in __init__
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/pyspark.zip/pyspark/context.py", line 172, in _do_init
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/pyspark.zip/pyspark/context.py", line 235, in _initialize_context
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
File "/usr/local/Cellar/apache-spark/1.6.2/libexec/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: Added file file:/Users/tanyagupta/Documents/Internship/Zyudly%20Labs/Tanya-Programs/word_count.py does not exist.
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1364)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:1340)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:491)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:491)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
16/07/19 23:18:47 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/07/19 23:18:47 INFO ShutdownHookManager: Shutdown hook called
16/07/19 23:18:47 INFO ShutdownHookManager: Deleting directory /private/var/folders/2f/fltslxd54f5961xsc2wg1w680000gn/T/spark-f69e5dfc-6561-4677-9ec0-03594eabc991
See this [question](http://stackoverflow.com/questions/32402094/spark-submit-fails-to-import-sparkcontext), which has exactly the same error. You need to add the **Tanya_programs** directory to the **PYTHONPATH** variable. – ashwinids
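A minimal sketch of that suggestion (paths are copied from the traceback above; whether PYTHONPATH alone resolves this is an assumption). Note that the directory name Zyudly Labs contains a space, which appears URL-encoded as %20 in the error message, so the path needs quoting on the command line:

```sh
# Hypothetical shell session on Mac OS X; paths taken from the traceback.
# Add the script's directory to PYTHONPATH, as the commenter suggests.
export PYTHONPATH="/Users/tanyagupta/Documents/Internship/Zyudly Labs/Tanya-Programs:$PYTHONPATH"

# Quote the full path so the space in "Zyudly Labs" is passed through intact.
spark-submit "/Users/tanyagupta/Documents/Internship/Zyudly Labs/Tanya-Programs/word_count.py"
```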