What is included in "Total time spent by all map tasks" in Hadoop?

After a Hadoop job finishes successfully, a summary of various counters is printed (see the example below). My question is what the Total time spent by all map tasks counter includes. Specifically, if a mapper is not node-local, is the time spent copying its input data included?

17/01/25 09:06:12 INFO mapreduce.Job: Counters: 49 
     File System Counters 
       FILE: Number of bytes read=2941 
       FILE: Number of bytes written=241959 
       FILE: Number of read operations=0 
       FILE: Number of large read operations=0 
       FILE: Number of write operations=0 
       HDFS: Number of bytes read=3251 
       HDFS: Number of bytes written=2051 
       HDFS: Number of read operations=6 
       HDFS: Number of large read operations=0 
       HDFS: Number of write operations=2 
     Job Counters 
       Launched map tasks=1 
       Launched reduce tasks=1 
       Data-local map tasks=1 
       Total time spent by all maps in occupied slots (ms)=23168 
       Total time spent by all reduces in occupied slots (ms)=4957 
       Total time spent by all map tasks (ms)=5792 
       Total time spent by all reduce tasks (ms)=4957 
       Total vcore-milliseconds taken by all map tasks=5792 
       Total vcore-milliseconds taken by all reduce tasks=4957 
       Total megabyte-milliseconds taken by all map tasks=23724032 
       Total megabyte-milliseconds taken by all reduce tasks=5075968 
     Map-Reduce Framework 
       Map input records=9 
       Map output records=462 
       Map output bytes=4986 
       Map output materialized bytes=2941 
       Input split bytes=109 
       Combine input records=462 
       Combine output records=221 
       Reduce input groups=221 
       Reduce shuffle bytes=2941 
       Reduce input records=221 
       Reduce output records=221 
       Spilled Records=442 
       Shuffled Maps =1 
       Failed Shuffles=0 
       Merged Map outputs=1 
       GC time elapsed (ms)=84 
       CPU time spent (ms)=2090 
       Physical memory (bytes) snapshot=471179264 
       Virtual memory (bytes) snapshot=4508950528 
       Total committed heap usage (bytes)=326631424 
     Shuffle Errors 
       BAD_ID=0 
       CONNECTION=0 
       IO_ERROR=0 
       WRONG_LENGTH=0 
       WRONG_MAP=0 
       WRONG_REDUCE=0 
     File Input Format Counters 
       Bytes Read=3142 
     File Output Format Counters 
       Bytes Written=2051

2017-01-25 mcserep
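Before reading the answer, it is worth noting that in the output above Launched map tasks and Data-local map tasks are both 1, so this particular run appears to have had no non-local map at all. To check the same thing on other runs, the dump can be parsed. This is a minimal sketch, not a Hadoop API: the class and method names are hypothetical, and it assumes that a counter that never fired (for example "Rack-local map tasks") may simply be absent from the dump and can be treated as 0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Hadoop): parses the "Name=value" lines of a
 * job counter dump like the one printed by mapreduce.Job and checks map locality.
 */
public final class CounterDumpParser {

    /** Returns counter name -> value for every line of the form "Name=123". */
    static Map<String, Long> parse(String dump) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            int eq = line.lastIndexOf('=');
            if (eq < 0) {
                continue; // group headers such as "Job Counters" have no '='
            }
            String name = line.substring(0, eq).trim();
            try {
                counters.put(name, Long.parseLong(line.substring(eq + 1).trim()));
            } catch (NumberFormatException e) {
                // ignore lines whose right-hand side is not a number
            }
        }
        return counters;
    }

    /**
     * Launched map attempts that were not data-local. A missing counter is
     * treated as 0. Failed or speculative attempts also count as launched,
     * so this is only a rough check.
     */
    static long nonDataLocalMaps(Map<String, Long> c) {
        return c.getOrDefault("Launched map tasks", 0L)
                - c.getOrDefault("Data-local map tasks", 0L);
    }

    public static void main(String[] args) {
        String dump = String.join("\n",
                "     Job Counters ",
                "       Launched map tasks=1 ",
                "       Data-local map tasks=1 ",
                "       Total time spent by all map tasks (ms)=5792 ");
        Map<String, Long> c = parse(dump);
        System.out.println("map time (ms) = " + c.get("Total time spent by all map tasks (ms)"));
        System.out.println("non-data-local map attempts = " + nonDataLocalMaps(c));
    }
}
```

A run whose dump shows, say, Launched map tasks=3 and Data-local map tasks=1 would report 2, which is the kind of run where the question about remote copy time actually matters.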

I believe the data copy time is included in the Total time spent by all map tasks metric.

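First, if you check the server-side code (mostly the parts related to resource management), you can see that the counter constant corresponding to the metric you are referring to is updated inside the TaskAttemptImpl class, based on the task attempt's launchTime. The attempt's launchTime is set when the container has been launched and is about to start running (from what I know of the source code, no data is moved at this point; only the split metadata is passed).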
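The map-side Job Counters in the question are consistent with all of them being scaled from one per-attempt duration. The following is a simplified sketch, assuming (as the answer describes) that the duration is finishTime - launchTime. It also assumes container sizes of 4096 MB and 1 vcore for the map, 1024 MB and 1 vcore for the reduce, and a 1024 MB minimum slot size; these are inferred from the ratios in the output (23724032 / 5792 = 4096, 23168 / 5792 = 4), not stated in the question. The class and method names are hypothetical, not Hadoop's.

```java
/**
 * Hypothetical sketch: derives the map/reduce job counters from a single
 * per-attempt duration (finishTime - launchTime).
 */
public final class MapCounterSketch {

    /** Wall-clock time of one attempt, from container launch to finish. */
    static long durationMs(long launchTime, long finishTime) {
        return finishTime - launchTime;
    }

    /** Number of simulated "slots" a container occupies, rounded up. */
    static long slots(int containerMb, int minSlotMb) {
        return minSlotMb == 0 ? 0 : (long) Math.ceil((double) containerMb / minSlotMb);
    }

    /** Megabyte-milliseconds: container memory times duration. */
    static long mbMillis(int containerMb, long durationMs) {
        return (long) containerMb * durationMs;
    }

    /** Vcore-milliseconds: container vcores times duration. */
    static long vcoreMillis(int vcores, long durationMs) {
        return (long) vcores * durationMs;
    }

    public static void main(String[] args) {
        // Assumed sizes (inferred, see above): map 4096 MB / 1 vcore,
        // reduce 1024 MB / 1 vcore, minimum slot 1024 MB.
        long mapMs = durationMs(0L, 5792L);
        long reduceMs = durationMs(0L, 4957L);
        System.out.println(slots(4096, 1024) * mapMs);    // 23168
        System.out.println(mapMs);                        // 5792
        System.out.println(vcoreMillis(1, mapMs));        // 5792
        System.out.println(mbMillis(4096, mapMs));        // 23724032
        System.out.println(slots(1024, 1024) * reduceMs); // 4957
        System.out.println(mbMillis(1024, reduceMs));     // 5075968
    }
}
```

Because every map counter scales the same launch-to-finish duration, anything that happens inside that window, including reading the input, is reflected in all of them.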

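Next, the InputFormat opens an InputStream, which is responsible for fetching the data the Mapper needs to start processing (the file system behind the stream can vary, but for HDFS input it is DistributedFileSystem). You can see the steps executed in the MapTask.runNewMapper(...) method: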

input.initialize(split, mapperContext); 
mapper.run(mapperContext);
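The ordering above can be illustrated with a toy model. This is not Hadoop code: the class and method names are hypothetical and the step costs are arbitrary. It assumes, as the answer argues, that the attempt's time is finishTime - launchTime and that launchTime is taken before input.initialize(...). For a typical RecordReader such as LineRecordReader, the stream is opened in initialize and further data is read as records are pulled during mapper.run, so a slower (for example remote) read lengthens the measured duration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not Hadoop code) of one map attempt. A fake clock advances per
 * step; the recorded time is finishTime - launchTime, so time spent opening or
 * reading the input inside initialize/run falls inside the measured window.
 */
public final class MapAttemptTimeline {

    private long nowMs; // fake clock, starts at 0
    private final List<String> events = new ArrayList<>();

    private void step(String what, long costMs) {
        events.add(nowMs + " ms: " + what);
        nowMs += costMs;
    }

    /** Replays launch, initialize, run, finish and returns finish - launch. */
    long attempt(long initializeMs, long runMs) {
        long launchTime = nowMs; // container launched, only split metadata handed over
        step("input.initialize(split, mapperContext)", initializeMs); // opens the stream
        step("mapper.run(mapperContext)", runMs); // records are read here
        long finishTime = nowMs;
        events.add(finishTime + " ms: attempt finished");
        return finishTime - launchTime;
    }

    List<String> events() {
        return events;
    }

    public static void main(String[] args) {
        // Arbitrary costs: the second attempt only differs by slower input I/O.
        long fastRead = new MapAttemptTimeline().attempt(10, 1000);
        long slowRead = new MapAttemptTimeline().attempt(200, 1500);
        System.out.println("fast read: " + fastRead + " ms"); // 1010 ms
        System.out.println("slow read: " + slowRead + " ms"); // 1700 ms
    }
}
```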

(I am referring to Hadoop 2.6.)

2017-01-26 13:39:56 Serhiy

Thanks for the detailed answer. – mcserep
