2016-12-30

I am trying to run Spark code in Zeppelin and I get this:

java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found

I get the same problem with the Zeppelin-embedded Spark and with my own Spark install (1.6.3) configured in spark-shell.

My setup:

  • Docker container: Debian Jessie
  • Zeppelin version: 0.6.2 (installed from the tar, not built from source)
  • CDH version: 5.9.0
  • liblzo2-dev and hadoop-lzo are installed in the container
  • the SPARK_HOME and HADOOP_HOME env vars and the LZO lib path are set in conf/zeppelin-env.sh, including /usr/lib/hadoop/lib/hadoop-lzo-0.4.15-cdh5.9.0.jar (see the sketch after this list)
  • the compression.codecs property is present in core-site.xml and mapred-site.xml
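
For reference, the relevant conf/zeppelin-env.sh entries look roughly like this. This is a sketch only: the SPARK_HOME, HADOOP_HOME and native-lib paths are illustrative, while ZEPPELIN_SPARK_SUBMIT_OPTIONS is the documented variable for passing spark-submit flags to the interpreter:

# paths are illustrative; adjust to your container layout
export SPARK_HOME=/usr/lib/spark
export HADOOP_HOME=/usr/lib/hadoop
# put the hadoop-lzo jar and the native LZO libs on the Spark interpreter's paths
export ZEPPELIN_SPARK_SUBMIT_OPTIONS="--jars /usr/lib/hadoop/lib/hadoop-lzo-0.4.15-cdh5.9.0.jar --driver-library-path /usr/lib/hadoop/lib/native"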

Code:

%spark 
val bankText = sc.textFile("/tmp/bank/bank-full.csv") 

case class Bank(age:Integer, job:String, marital : String, education : String, balance : Integer) 

// split each line, filter out header (starts with "age"), and map it into Bank case class 
val bank = bankText.map(s=>s.split(";")).filter(s=>s(0)!="\"age\"").map(
    s=>Bank(s(0).toInt, 
      s(1).replaceAll("\"", ""), 
      s(2).replaceAll("\"", ""), 
      s(3).replaceAll("\"", ""), 
      s(5).replaceAll("\"", "").toInt 
     ) 
) 

// convert to a DataFrame and display the first rows 
bank.toDF().show() 

The error (the fix from the link mentioned below didn't seem to work for me, or I did something wrong):

bankText: org.apache.spark.rdd.RDD[String] = /tmp/bank/bank-full.csv MapPartitionsRDD[1] at textFile at <console>:29 
defined class Bank 
bank: org.apache.spark.rdd.RDD[Bank] = MapPartitionsRDD[4] at map at <console>:33 
java.lang.RuntimeException: Error in configuring object 
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) 
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) 
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
    at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:188) 
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
    at scala.Option.getOrElse(Option.scala:120) 
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
    at scala.Option.getOrElse(Option.scala:120) 
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
    at scala.Option.getOrElse(Option.scala:120) 
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
    at scala.Option.getOrElse(Option.scala:120) 
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
    at scala.Option.getOrElse(Option.scala:120) 
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
    at scala.Option.getOrElse(Option.scala:120) 
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
    at scala.Option.getOrElse(Option.scala:120) 
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:190) 
    at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165) 
    at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174) 
    at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1500) 
    at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1500) 
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56) 
    at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2087) 
    at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1499) 
    at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1506) 
    at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1376) 
    at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375) 
    at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2100) 
    at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1375) 
    at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1457) 
    at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:170) 
    at org.apache.spark.sql.DataFrame.show(DataFrame.scala:350) 
    at org.apache.spark.sql.DataFrame.show(DataFrame.scala:311) 
    at org.apache.spark.sql.DataFrame.show(DataFrame.scala:319) 
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36) 
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41) 
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43) 
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45) 
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:47) 
    at $iwC$$iwC$$iwC.<init>(<console>:49) 
    at $iwC$$iwC.<init>(<console>:51) 
    at $iwC.<init>(<console>:53) 
    at <init>(<console>:55) 
    at .<init>(<console>:59) 
    at .<clinit>(<console>) 
    at .<init>(<console>:7) 
    at .<clinit>(<console>) 
    at $print(<console>) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) 
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) 
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) 
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) 
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) 
    at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38) 
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:953) 
    at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:1168) 
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1111) 
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1104) 
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94) 
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341) 
    at org.apache.zeppelin.scheduler.Job.run(Job.java:176) 
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.reflect.InvocationTargetException 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
    ... 98 more 
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found. 
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135) 
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175) 
    at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45) 
    ... 103 more 
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found 
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980) 
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128) 
    ... 105 more 
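
The bottom of the trace shows what triggers this: TextInputFormat.configure builds a CompressionCodecFactory from the compression codecs listed in core-site.xml, so the LzoCodec class must be loadable even though the CSV itself is not LZO-compressed. A quick way to check whether the interpreter can see the class at all (a minimal sketch, using nothing beyond Class.forName) is:

%spark
// check whether the hadoop-lzo jar is visible on the interpreter's classpath
try {
  Class.forName("com.hadoop.compression.lzo.LzoCodec")
  println("LzoCodec is on the classpath")
} catch {
  case _: ClassNotFoundException => println("LzoCodec is NOT on the classpath")
}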

I found some information here.

Thanks

Answers

After many days of research, I finally built Zeppelin from source (from the v0.6.2 tag), and it all works with the same configuration!

I think the binary package targets specific versions of CDH and Hadoop (the release notes say nothing about this). If you run into this problem, I recommend building Zeppelin against your own versions instead of using the binary!
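
For reference, the build looks roughly like the following; the profile names and the CDH-flavored hadoop.version are the ones documented for Zeppelin 0.6.x (-Pvendor-repo pulls in the Cloudera repository), so treat this as a sketch and adjust for your release:

git clone https://github.com/apache/zeppelin.git
cd zeppelin
git checkout v0.6.2
mvn clean package -DskipTests -Pspark-1.6 -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.9.0 -Pvendor-repo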

Hope this helps!
