2017-01-09 7 views
0

私はハイブUDFをjavaで書いていますが、私はpyspark 2.0.0で使用しようとしています。以下EMR へのステップ 1.コピーしたjarファイル2. pyspark EMR 5.xでJavaで書かれたハイブUDFを実行したときにエラーが発生する

pyspark --jars ip-udf-0.0.1-SNAPSHOT-jar-with-dependencies-latest.jar 

以下

3のようなpysparkジョブを開始しているUDF

from pyspark.sql import SparkSession 
from pyspark.sql import HiveContext 
sc = spark.sparkContext 
sqlContext = HiveContext(sc) 
sqlContext.sql("create temporary function ip_map as 'com.mediaiq.hive.IPMappingUDF'") 

以下のコードへのアクセスを使用し、私は以下のエラーが出ます:

py4j.protocol.Py4JJavaError: An error occurred while calling o43.sql. : java.lang.NoSuchMethodError: org.apache.hadoop.hive.conf.HiveConf.getTimeVar(Lorg/apache/hadoop/hive/conf/HiveConf$ConfVars;Ljava/util/concurrent/TimeUnit;)J at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:76) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:98) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340) at org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:189) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263) at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39) at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38) at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46) at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45) at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50) at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48) at org.apache.spark.sql.hive.HiveSessionState$$anon$1.(HiveSessionState.scala:63) at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63) at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:280) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:214) at java.lang.Thread.run(Thread.java:745)

+0

私は以下のリンクを調べましたが、あまり助けにはなりませんでした。[http://stackoverflow.com/questions/38491483/how-to-call-a-hive-udf-written-in-java-using-pyspark-ハイブ・コンテキストから] – braj259

答えて

0

異なるバージョンのHiveでUDFをビルドした可能性があります。 pom.xmlに同じバージョンのHiveを指定して、UDFを含むjarファイルを作成してください。たとえば、this previous answerを参照してください。

関連する問題