Passing a file to Hadoop using the -files argument

I have a MapReduce program that runs correctly locally.

It uses a file called new-positions.csv in the mapper class's setup() method to populate an in-memory hash table:

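// Needs java.io.File, java.io.IOException, java.util.Hashtable and java.util.Scanner;
// newPositions is a Hashtable<String, Integer> field of the mapper class.
@Override
public void setup(Context context) throws IOException, InterruptedException {
    newPositions = new Hashtable<String, Integer>();
    // A relative path is resolved against the task's current working directory.
    File file = new File("new-positions.csv");

    // try-with-resources closes the Scanner once, after every line has been read.
    try (Scanner inputStream = new Scanner(file)) {
        inputStream.nextLine(); // skip the header line
        while (inputStream.hasNextLine()) {
            String line = inputStream.nextLine();
            // The -1 limit keeps trailing empty fields, so all columns are present.
            String[] splitLine = line.split(",", -1);
            Integer id = Integer.valueOf(splitLine[0].trim());
            // String firstname = splitLine[1].trim();
            // String surname = splitLine[2].trim();
            // Columns 3 to 6 hold up to four email addresses; empty ones are skipped.
            for (int i = 3; i < 7; i++) {
                String email = splitLine[i].trim();
                if (!email.equals("")) newPositions.put(email, id);
            }
            // String position = splitLine[7].trim();
        }
    }
}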
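To test the CSV handling on its own, without a cluster or the -files mechanism, the loop body of setup() can be moved into a plain function over lines of text. This is a hypothetical refactoring: `PositionsParser` and `parse` are illustrative names, and the column layout (id, first name, surname, four email columns, position) is inferred from the comments in the code above. It uses HashMap rather than Hashtable, which is the usual choice for a map touched by a single thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PositionsParser {
    /**
     * Builds an email -> id map from CSV lines (hypothetical helper).
     * The first line is treated as a header and skipped; columns 3..6
     * are the (possibly empty) email fields.
     */
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> positions = new HashMap<>();
        // Skip the header line, if there is one.
        for (String line : lines.subList(Math.min(1, lines.size()), lines.size())) {
            String[] cols = line.split(",", -1); // -1 keeps trailing empty fields
            Integer id = Integer.valueOf(cols[0].trim());
            for (int i = 3; i < 7; i++) {
                String email = cols[i].trim();
                if (!email.isEmpty()) {
                    positions.put(email, id);
                }
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "id,firstname,surname,email1,email2,email3,email4,position",
                "1,Ann,Lee,ann@example.com,,,,Manager",
                "2,Bob,Roe,bob@example.com,b.roe@example.com,,,Analyst");
        System.out.println(parse(sample));
    }
}
```

Inside setup(), the same function could then be fed the lines read from new-positions.csv, keeping the file-opening code separate from the parsing logic.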

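The Java program has been exported as a runnable JAR. The JAR and full-positions.csv are both stored in the same directory on the local file system. From within that directory, we then run the following command in the terminal (we also tried it with the full path name for new-positions.csv):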

hadoop jar MR2.jar Reader2 -files new-positions.csv InputDataset OutputFolder

It runs fine, but then we get:

Error: java.io.FileNotFoundException: new-positions.csv (No such file or directory)

The file does exist locally, and we are running the command from within that directory.
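Because the mapper opens new-positions.csv by a relative path, it may help to log where that path actually resolves. In Java, a relative java.io.File is resolved against the JVM's current working directory (the user.dir system property), so in a map task it points into that task's working directory on whichever node runs it, not into the directory where `hadoop jar` was typed. The hypothetical `describe` helper below is a minimal stdlib sketch of such a check; calling it from setup() and printing the result to the task log is an assumed debugging step, not part of the original program.

```java
import java.io.File;

public class WorkingDirCheck {
    /** Reports where a relative file name resolves and whether it exists there. */
    static String describe(String name) {
        File f = new File(name); // relative: resolved against user.dir
        return f.getAbsolutePath() + " exists=" + f.exists()
                + " (user.dir=" + System.getProperty("user.dir") + ")";
    }

    public static void main(String[] args) {
        // Hypothetical usage: the same call could be logged from Mapper.setup().
        System.out.println(describe("new-positions.csv"));
    }
}
```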

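We are following the guidance in Hadoop: The Definitive Guide (4th Ed.), and we cannot see how our program and argument structure differ from it.

Could this be related to our Hadoop configuration? We know there are workarounds, such as copying the file to HDFS and running it from there, but we need to understand why this "-files" argument is not working as expected.

EDIT: Below is some code from the driver class, which could also be the source of the problem: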

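import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// EdgeWritable, MailReaderMapper, MailReaderReducer and printUsage are defined elsewhere (not shown).
public class Reader2 extends Configured implements Tool {

    @Override
    public int run(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        // Expects exactly five arguments; the input and output paths are args[3] and args[4].
        if (args.length != 5) {
            printUsage(this, "");
            return 1;
        }

        Configuration config = getConf();
        FileSystem fs = FileSystem.get(config);

        Job job = Job.getInstance(config);
        job.setJarByClass(this.getClass());
        FileInputFormat.addInputPath(job, new Path(args[3]));

        // Delete old output if necessary
        Path outPath = new Path(args[4]);
        if (fs.exists(outPath))
            fs.delete(outPath, true);

        FileOutputFormat.setOutputPath(job, new Path(args[4]));

        job.setInputFormatClass(SequenceFileInputFormat.class);

        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        job.setMapOutputKeyClass(EdgeWritable.class);
        job.setMapOutputValueClass(NullWritable.class);

        job.setMapperClass(MailReaderMapper.class);
        job.setReducerClass(MailReaderReducer.class);

        job.setJar("MR2.jar");

        boolean status = job.waitForCompletion(true);
        return status ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic options (such as -files) before calling run().
        int exitCode = ToolRunner.run(new Reader2(), args);
        System.exit(exitCode);
    }
}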
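One detail in this driver may be relevant: it checks for exactly five arguments and reads the input and output paths from args[3] and args[4], which suggests run() is receiving "Reader2", "-files" and the file name as ordinary arguments. The sketch below is a simplified, stdlib-only model, not Hadoop's code, of two behaviours I believe apply in Hadoop 2.x (assumptions worth checking against your version): when the JAR's manifest already names a Main-Class, as an exported runnable JAR usually does, `hadoop jar` passes the next word ("Reader2" here) through to main() as an application argument; and ToolRunner's GenericOptionsParser only consumes generic options such as -files until the first non-option argument. The class, the `remainingArgs` helper and its option list are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;

public class GenericOptionsOrderDemo {
    // Hypothetical subset of generic options that each take one value.
    private static final Set<String> OPTIONS_WITH_VALUE =
            Set.of("-files", "-libjars", "-archives", "-conf", "-fs", "-jt");

    /**
     * Simplified model: consumes option/value pairs from the front of args and
     * stops at the first argument that is not a known option. Returns what the
     * Tool's run() method would be left with.
     */
    static String[] remainingArgs(String[] args) {
        int i = 0;
        while (i + 1 < args.length && OPTIONS_WITH_VALUE.contains(args[i])) {
            i += 2; // skip the option and its value
        }
        return Arrays.copyOfRange(args, i, args.length);
    }

    public static void main(String[] unused) {
        // Assumption: Main-Class is set in the manifest, so "Reader2" is passed through.
        String[] classFirst = {"Reader2", "-files", "new-positions.csv", "InputDataset", "OutputFolder"};
        // Generic option placed before the application arguments.
        String[] optionFirst = {"-files", "new-positions.csv", "InputDataset", "OutputFolder"};

        System.out.println(Arrays.toString(remainingArgs(classFirst)));
        System.out.println(Arrays.toString(remainingArgs(optionFirst)));
    }
}
```

Under those assumptions, the first call leaves all five arguments untouched, so -files is never applied, while the second leaves only the two paths. Putting the generic options first, for example `hadoop jar MR2.jar -files new-positions.csv InputDataset OutputFolder`, and reading the paths from args[0] and args[1] would then be the change to try.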

Source

2016-04-18 ajrwhite

Let's assume your "new-positions.csv" is in the folder H:/HDP/. Then you need to pass this file as:

file:///H:/HDP/new-positions.csv

You need to qualify the path with file:/// to indicate that it is a local file system path. You also need to pass the fully qualified path.
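The fully qualified file:/// form described in this answer can be produced from a local path with the standard java.nio API, which avoids hand-typing the URI when building the -files argument. A minimal sketch; `LocalFileUri` and `toLocalFileUri` are hypothetical names, and the example path is the one from ajrwhite's follow-up comment.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalFileUri {
    /** Turns a (possibly relative) local path into an absolute file:/// URI string. */
    static String toLocalFileUri(String localPath) {
        Path p = Paths.get(localPath).toAbsolutePath().normalize();
        return p.toUri().toString();
    }

    public static void main(String[] args) {
        // On Linux this prints file:///home/local/xxx360/FinalProject/new-positions.csv
        System.out.println(toLocalFileUri("/home/local/xxx360/FinalProject/new-positions.csv"));
    }
}
```

Note that Path.toUri() appends a trailing slash when the path names an existing directory, so it should be given the file itself.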

This works perfectly for me. For example, on my working machine I pass a local file, myini.ini, as follows:

yarn jar hadoop-mapreduce-examples-2.4.0.2.1.5.0-2060.jar teragen -files "file:///H:/HDP/hadoop-2.4.0.2.1.5.0-2060/share/hadoop/common/myini.ini" -Dmapreduce.job.maps=10 10737418 /usr/teraout/

2016-04-18 15:54:58

The new command is: hadoop jar MR2.jar Reader2 -files file:///home/local/xxx360/FinalProject/new-positions.csv InputDataset OutputFolder ... "new-positions.csv" to the Java program. Could there be something in our Hadoop configuration? – ajrwhite

Enclose the whole path in double quotes. –

It still doesn't work. I wonder whether the problem is in my driver class; I'll edit the main question with more information. – ajrwhite

I think Manjunath Ballur gave you the right answer, but the URI you passed, file:///home/local/xxx360/FinalProject/new-positions.csv, may not be resolvable from Hadoop.

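That path looks like an absolute path on some machine, but which machine is that home directory on? Adding the server to the path might make it work.

Also, with the singular -file, Hadoop seems to copy the file rather than create a symbolic link to it as it does with -files.

See here.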
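The Hadoop MapReduce tutorial describes -files as making each listed file available in the current working directory of the task, with an optional #name suffix to choose the link name (e.g. -files dir1/dict.txt#dict1). The sketch below only models that naming rule with java.net.URI, so it is explicit which name the mapper should pass to new File(...); `CacheLinkName` and `linkName` are hypothetical, and the rule should be checked against your Hadoop version.

```java
import java.net.URI;

public class CacheLinkName {
    /**
     * Models the link name a -files entry gets in the task's working directory:
     * the URI fragment if present, otherwise the last path segment.
     */
    static String linkName(String filesEntry) {
        URI uri = URI.create(filesEntry); // throws IllegalArgumentException on a bad URI
        if (uri.getFragment() != null) {
            return uri.getFragment();
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // prints new-positions.csv
        System.out.println(linkName("file:///home/local/xxx360/FinalProject/new-positions.csv"));
        // prints positions
        System.out.println(linkName("file:///home/local/xxx360/FinalProject/new-positions.csv#positions"));
        // prints new-positions.csv
        System.out.println(linkName("new-positions.csv"));
    }
}
```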


2017-06-24 07:06:09
