
Cannot connect to Cassandra from PySpark

I am trying to connect to Cassandra from PySpark and run a query. Here are all the steps I am following.

First, I installed Spark:

wget http://www.apache.org/dyn/closer.lua/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz 

Then I extracted the archive and moved into the Spark directory:

cd spark-2.1.0-bin-hadoop2.7/ 

After that, I ran this command:

./bin/pyspark 

And I got this:

16:48 $ ./bin/pyspark 
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2 
Type "help", "copyright", "credits" or "license" for more information. 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
Setting default log level to "WARN". 
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 
17/05/02 16:50:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
17/05/02 16:50:33 WARN Utils: Your hostname, rleitao-H81M-HD3 resolves to a loopback address: 127.0.1.1; using 192.168.1.26 instead (on interface eth0) 
17/05/02 16:50:33 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address 
17/05/02 16:50:36 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException 
Welcome to 
      ____              __ 
     / __/__  ___ _____/ /__ 
    _\ \/ _ \/ _ `/ __/ '_/ 
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0 
      /_/ 

Using Python version 2.7.12 (default, Nov 19 2016 06:48:10) 
SparkSession available as 'spark'. 
>>> 

Then:

from pyspark.sql import SQLContext 
sql = SQLContext(sc) 

df = sql.read.format("org.apache.spark.sql.cassandra").\ 
option("spark.cassandra.connection.host", "ec2-IPV4-Adress.REGION.compute.amazonaws.com").\ 
option("spark.cassandra.auth.username", "user"). \ 
option("spark.cassandra.auth.password", "pass"). \ 
option(keyspace="mykeyspace", table="mytable").load() 
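As a side note, DataFrameReader.option() takes a key string and a value, so the keyword form option(keyspace=..., table=...) in the last line would raise a TypeError on its own. A minimal sketch of the conventional spelling, reusing the placeholder host and credentials from above:

df = sql.read.format("org.apache.spark.sql.cassandra").\
    option("spark.cassandra.connection.host", "ec2-IPV4-Adress.REGION.compute.amazonaws.com").\
    option("spark.cassandra.auth.username", "user").\
    option("spark.cassandra.auth.password", "pass").\
    load(keyspace="mykeyspace", table="mytable")

The traceback below, however, is raised by sql.read itself, before any of these options are evaluated.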

But then, oops, I got this huge error:

>>> df = sql.read.format("org.apache.spark.sql.cassandra").\ 
    ... option("spark.cassandra.connection.host", "ec2-IPV4-adress.REGION.compute.amazonaws.com").\ 
    ... option("spark.cassandra.auth.username", "user"). \ 
    ... option("spark.cassandra.auth.password", "pass"). \ 
    ... option(keyspace="mykeyspace", table="mytable").load() 
    17/05/02 16:47:43 ERROR Schema: Failed initialising database. 
    Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------ 
    java.sql.SQLException: Failed to start database 'metastore_db' with class loader [email protected]39daf, see the next exception for details. 
     at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) 
     at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) 
     at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source) 
     at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source) 
     at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source) 
     at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source) 
     at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at org.apache.derby.jdbc.InternalDriver.getNewEmbedConnection(Unknown Source) 
     at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source) 
     at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source) 
     at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source) 
     at java.sql.DriverManager.getConnection(DriverManager.java:664) 
     at java.sql.DriverManager.getConnection(DriverManager.java:208) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:498) 
     at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960) 
     at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166) 
     at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808) 
     at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701) 
     at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365) 
     at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394) 
     at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291) 
     at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258) 
     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) 
     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) 
     at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57) 
     at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461) 
     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66) 
     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) 
     at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) 
     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199) 
     at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) 
     at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) 
     at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) 
     at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234) 
     at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) 
     at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166) 
     at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) 
     at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:192) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
     ... 108 more 
    Traceback (most recent call last): 
     File "<stdin>", line 1, in <module> 
     File "/home/souadmabrouk/Bureau/Souad/project/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/context.py", line 464, in read 
     return DataFrameReader(self) 
     File "/home/souadmabrouk/Bureau/Souad/project/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/readwriter.py", line 70, in __init__ 
     self._jreader = spark._ssql_ctx.read() 
     File "/home/souadmabrouk/Bureau/Souad/project/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__ 
     File "/home/souadmabrouk/Bureau/Souad/project/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 79, in deco 
     raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace) 
    pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':" 
    >>> 

So, how can I use the Cassandra connector? I could not find any clear documentation about it. By the way, the Cassandra cluster is on AWS.

I would really appreciate any help.

Answer

  1. Run pyspark with the Cassandra connector package on the classpath:
     ./bin/pyspark --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.2
  2. In your code, create a dictionary with the connection settings:
     hosts = {"spark.cassandra.connection.host": 'host_dns_or_ip_1,host_dns_or_ip_2,host_dns_or_ip_3'}
  3. Use those settings to create the DataFrame (a combined sketch follows this list):
     data_frame = sqlContext.read.format("org.apache.spark.sql.cassandra").options(**hosts).load(keyspace="your_keyspace", table="your_table")
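Putting the three steps together, here is a minimal end-to-end sketch. Assumptions: the connector version matches your Spark/Scala build, and the host list, keyspace, and table names are placeholders to adapt; the auth options are carried over from the question and are only needed if your cluster has authentication enabled.

# Start the shell with the connector on the classpath first:
#   ./bin/pyspark --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.2
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # sc is the SparkContext the pyspark shell provides

# Connection settings as a dict, unpacked into options()
hosts = {
    "spark.cassandra.connection.host": "host_dns_or_ip_1,host_dns_or_ip_2",
    "spark.cassandra.auth.username": "user",  # placeholder; only if auth is enabled
    "spark.cassandra.auth.password": "pass",  # placeholder; only if auth is enabled
}

data_frame = sqlContext.read.format("org.apache.spark.sql.cassandra") \
    .options(**hosts) \
    .load(keyspace="your_keyspace", table="your_table")

data_frame.show(5)  # quick sanity check that the read actually works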