2017-02-09

Just looking for clarification on spark-submit with --proxy-user on Kerberized Hadoop: can the spark-submit --keytab and --principal parameters coexist with the --proxy-user parameter?

We need to submit the job as the actual business user, but that user has no principal in the Hadoop KDC.

Whenever I use a proxy user and a Kerberos principal together, I always get an exception.
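For context, here is a minimal sketch of the kind of invocation that fails; the keytab path, principal, and proxy-user name are placeholders I am assuming (based on the "atlas" user in the log below), not values from the original job:

    # hypothetical names: keytab path, realm, and business_user are assumptions
    spark-submit \
      --master yarn \
      --keytab /etc/security/keytabs/atlas.keytab \
      --principal atlas@EXAMPLE.COM \
      --proxy-user business_user \
      --class org.sandbox.Main \
      my-app.jar

The resulting stack trace: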

17/02/09 13:51:43 INFO DFSClient: Created HDFS_DELEGATION_TOKEN token 379 for atlas on 10.12.118.92:8020 
Exception in thread "main" java.io.IOException: java.lang.reflect.UndeclaredThrowableException 
     at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:888) 
     at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:8 
     at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2243) 
     at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) 
     at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) 
     at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) 
     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:206) 
     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315) 
     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199) 
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
     at scala.Option.getOrElse(Option.scala:120) 
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) 
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
     at scala.Option.getOrElse(Option.scala:120) 
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) 
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) 
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) 
     at scala.Option.getOrElse(Option.scala:120) 
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) 
     at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1293) 
     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) 
     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) 
     at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) 
     at org.apache.spark.rdd.RDD.take(RDD.scala:1288) 
     at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1328) 
     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) 
     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) 
     at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) 
     at org.apache.spark.rdd.RDD.first(RDD.scala:1327) 
     at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:269) 
     at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:265) 
     at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:242) 
     at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:74) 
     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:171) 
     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:44) 
     at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158) 
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119) 
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109) 
     at org.sandbox.Main$.main(Main.scala:39) 
     at org.sandbox.Main.main(Main.scala) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:497) 
     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) 
     at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:163) 
     at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:161) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:422) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) 
     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:161) 
     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) 
     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) 
     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.lang.reflect.UndeclaredThrowableException 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1672) 
     at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:870) 
     ... 57 more 
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, status: 403, message: Forbidden
     at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:274) 
     at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77) 
     at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
     at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:214) 
  1. If the proxy-user and principal parameters cannot coexist, is there any documentation about that?
  2. What is an actual use case for the proxy-user parameter in a Kerberized Hadoop environment?

Typical examples of Hadoop "proxy users" are "oozie" (the job scheduler) and "hue" (the gateway UI). They can launch jobs on your behalf without asking for your password, even when you are not connected, in the case of Oozie. –
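(For illustration only, not from the original thread: such service accounts are typically whitelisted in core-site.xml. "oozie" below stands for whichever account does the impersonating, and the host value is an assumed example.)

    <property>
      <name>hadoop.proxyuser.oozie.hosts</name>
      <value>oozie-host.example.com</value>
    </property>
    <property>
      <name>hadoop.proxyuser.oozie.groups</name>
      <value>*</value>
    </property>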

Answers


1) --proxy-user and --principal can't be passed to spark-submit at the same time. However, you can initialize as the Kerberos user and launch the Spark job under the proxy user:

    kinit -kt USER.keytab USER && spark-submit --proxy-user PROXY-USER ...

Note: this does not work when Spark is used with Hive. Also make sure hadoop.proxyuser.USER.{hosts,groups} is configured correctly.
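Expanded into a fuller sketch (user names, realm, and paths are placeholders, and the core-site.xml values are assumed examples):

    # 1. Authenticate as the principal that owns the keytab
    kinit -kt /etc/security/keytabs/super.keytab super@EXAMPLE.COM

    # 2. Submit as the business user; note that --principal/--keytab are omitted
    spark-submit \
      --master yarn \
      --proxy-user business_user \
      --class org.sandbox.Main \
      my-app.jar

    # Prerequisite in core-site.xml (example values):
    #   hadoop.proxyuser.super.hosts  = *
    #   hadoop.proxyuser.super.groups = *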

2) A superuser with username 'super' wants to submit a job and access HDFS on behalf of a user joe. The superuser has Kerberos credentials but user joe doesn't have any. The tasks are required to run as user joe, and any file accesses on the namenode are required to be done as user joe. It is required that user joe can connect to the namenode or job tracker on a connection authenticated with super's Kerberos credentials. In other words, super is impersonating the user joe.
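To make the super/joe example concrete, here is a sketch of impersonation from the command line. It relies on the HADOOP_PROXY_USER environment variable, which Hadoop clients honor in recent 2.x releases; whether your CLI tools support it is an assumption worth verifying:

    # Authenticate as the superuser
    kinit -kt /etc/security/keytabs/super.keytab super@EXAMPLE.COM

    # List joe's home directory as joe, over super's Kerberos connection
    HADOOP_PROXY_USER=joe hdfs dfs -ls /user/joe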


I still get the same exception even after running kinit and removing the principal and keytab from the spark-submit. Any ideas? – Adelave


I can use --proxy-user together with --principal and --keytab in spark-submit. The problem above was caused by the delegation-token permission request to the Ranger KMS (the 403 Forbidden from KMSClientProvider in the trace).

I got it working by adding the following entries under "Custom kms-site":

hadoop.kms.proxyuser.xxx.users=* 
hadoop.kms.proxyuser.xxx.hosts=* 
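For reference, the equivalent XML in kms-site.xml would look like the sketch below; "xxx" stands for the Kerberos service user doing the impersonating (presumably "atlas" in the trace above), and a hadoop.kms.proxyuser.xxx.groups property also exists if you want to restrict by group instead of allowing everything:

    <property>
      <name>hadoop.kms.proxyuser.xxx.users</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.kms.proxyuser.xxx.hosts</name>
      <value>*</value>
    </property>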