PySpark exception when using IPython

I have installed PySpark and the IPython notebook on Ubuntu 12.04. When I run "ipython --profile=pyspark", it throws the exception below:
ubuntu_user@ubuntu_user-VirtualBox:~$ ipython --profile=pyspark
Python 2.7.3 (default, Jun 22 2015, 19:33:41)
Type "copyright", "credits" or "license" for more information.
IPython 0.12.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
IPython profile: pyspark
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
173 else:
174 filename = fname
--> 175 __builtin__.execfile(filename, *where)
/home/ubuntu_user/.config/ipython/profile_pyspark/startup/00-pyspark-setup.py in <module>()
6 sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))
7
----> 8 execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))
9
/home/ubuntu_user/spark/python/pyspark/shell.py in <module>()
41 SparkContext.setSystemProperty("spark.executor.uri", os.environ["SPARK_EXECUTOR_URI"])
42
---> 43 sc = SparkContext(pyFiles=add_files)
44 atexit.register(lambda: sc.stop())
45
/home/ubuntu_user/spark/python/pyspark/context.pyc in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
108 """
109 self._callsite = first_spark_call() or CallSite(None, None, None)
--> 110 SparkContext._ensure_initialized(self, gateway=gateway)
111 try:
112 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
/home/ubuntu_user/spark/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway)
232 with SparkContext._lock:
233 if not SparkContext._gateway:
--> 234 SparkContext._gateway = gateway or launch_gateway()
235 SparkContext._jvm = SparkContext._gateway.jvm
236
/home/ubuntu_user/spark/python/pyspark/java_gateway.pyc in launch_gateway()
92 callback_socket.close()
93 if gateway_port is None:
---> 94 raise Exception("Java gateway process exited before sending the driver its port number")
95
96 # In Windows, ensure the Java child processes do not linger after Python has exited.
Exception: Java gateway process exited before sending the driver its port number
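For what it's worth, the "Error: Must specify a primary resource (JAR or Python or R file)" line above appears to come from bin/spark-submit itself, which (as far as I understand) PySpark launches behind the scenes to start the Java gateway; if spark-submit exits at that point, the gateway never reports its port, which would explain the final exception. Running spark-submit by hand with the same arguments as my PYSPARK_SUBMIT_ARGS (see .bashrc below) reproduces the first error:

ubuntu_user@ubuntu_user-VirtualBox:~$ ~/spark/bin/spark-submit --master local[2]
Error: Must specify a primary resource (JAR or Python or R file)
Run with --help for usage help or --verbose for debug output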
Below are my setup and configuration files. The Spark installation directory:
ubuntu_user@ubuntu_user-VirtualBox:~$ ls /home/ubuntu_user/spark
bin ec2 licenses README.md
CHANGES.txt examples NOTICE RELEASE
conf lib python sbin
data LICENSE R spark-1.5.2-bin-hadoop2.6.tgz
The IPython profile configuration for Spark (PySpark):
ubuntu_user@ubuntu_user-VirtualBox:~$ vi .config/ipython/profile_pyspark/ipython_notebook_config.py
# Configuration file for ipython-notebook.
c = get_config()
# IPython PySpark
c.NotebookApp.ip = 'localhost'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 7770
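With this profile in place, I start the notebook as shown below (as far as I know this is the standard invocation for this IPython version; it serves on the port configured above):

ubuntu_user@ubuntu_user-VirtualBox:~$ ipython notebook --profile=pyspark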
ubuntu_user@ubuntu_user-VirtualBox:~$ vi .config/ipython/profile_pyspark/startup/00-pyspark-setup.py
import os
import sys
spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, spark_home + "/python")
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))
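Side note: since spark_home falls back to None when SPARK_HOME is unset, the line spark_home + "/python" would then fail with a TypeError rather than a clear message. A slightly more defensive sketch of this startup file (the ValueError check is my own addition, not something the docs require) would be:

import os
import sys

spark_home = os.environ.get('SPARK_HOME')
if not spark_home:
    # Fail fast with a clear message instead of a TypeError below
    raise ValueError("SPARK_HOME environment variable is not set")

# Make PySpark and the bundled Py4J importable, then start the PySpark shell
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))

In my case SPARK_HOME is set (see .bashrc below), so that is not the failure here.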
ubuntu_user@ubuntu_user-VirtualBox:~$ ls .config/ipython/profile_pyspark/
db ipython_config.py log security
history.sqlite ipython_notebook_config.py pid startup
The following environment variables are set in .bashrc (or .bash_profile):
ubuntu_user@ubuntu_user-VirtualBox:~$ vi .bashrc
export SPARK_HOME="/home/ubuntu_user/spark"
export PYSPARK_SUBMIT_ARGS="--master local[2]"
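One thing I am not sure about: from what I have read, since Spark 1.4 the contents of PYSPARK_SUBMIT_ARGS are passed to bin/spark-submit as-is and are expected to end with "pyspark-shell" as the primary resource, e.g.:

export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

I mention this because a missing primary resource would match the very first error in the traceback above.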
I am new to Apache Spark and IPython. How can I solve this issue?