2017-11-28

(This is a follow-up to an earlier discussion I had about this problem: "Hadoop cluster is not running map-reduce jobs - scheduler".)

I set up a small Hadoop cluster following these instructions, using Hadoop version 2.7.4. The cluster appears to be up and running, but I cannot run mapreduce jobs. In particular, the job prints the following:

17/11/27 16:35:21 INFO client.RMProxy: Connecting to ResourceManager at ec2-yyy.eu-central-1.compute.amazonaws.com/xxx:8032 
Running 0 maps. 

Job started: Mon Nov 27 16:35:22 UTC 2017 

17/11/27 16:35:22 INFO client.RMProxy: Connecting to ResourceManager at ec2-yyy.eu-central-1.compute.amazonaws.com/xxx:8032 
17/11/27 16:35:22 INFO mapreduce.JobSubmitter: number of splits:0 
17/11/27 16:35:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1511799491035_0006 
17/11/27 16:35:22 INFO impl.YarnClientImpl: Submitted application application_1511799491035_0006 
17/11/27 16:35:22 INFO mapreduce.Job: The url to track the job: http://ec2-yyy.eu-central-1.compute.amazonaws.com:8088/proxy/application_1511799491035_0006/ 
17/11/27 16:35:22 INFO mapreduce.Job: Running job: job_1511799491035_0006 

but it never gets past this state. The command I attempted was:

$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar randomwriter out 

Looking at the log files, I found the following, which suggests to me that there is a problem with the capacity scheduler:

2017-11-27 13:50:29,202 INFO org.apache.hadoop.conf.Configuration: found resource capacity-scheduler.xml at file:/usr/local/hadoop/etc/hadoop/capacity-scheduler.xml 
2017-11-27 13:50:29,252 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration: max alloc mb per queue for root is undefined 
2017-11-27 13:50:29,252 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration: max alloc vcore per queue for root is undefined 
2017-11-27 13:50:29,256 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: root, capacity=1.0, asboluteCapacity=1.0, maxCapacity=1.0, asboluteMaxCapacity=1.0, state=RUNNING, acls=ADMINISTER_QUEUE:*SUBMIT_APP:*, labels=*, reservationsContinueLooking=true 
2017-11-27 13:50:29,256 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Initialized parent-queue root name=root, fullname=root 
2017-11-27 13:50:29,265 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration: max alloc mb per queue for root.default is undefined 
2017-11-27 13:50:29,265 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration: max alloc vcore per queue for root.default is undefined 

In the job tracker, the job is stuck at:

ACCEPTED: waiting for AM container to be allocated, launched and 
register with RM. 

The file capacity-scheduler.xml looks as follows:

<configuration> 

    <property> 
    <name>yarn.scheduler.capacity.maximum-applications</name> 
    <value>10000</value> 
    <description> 
     Maximum number of applications that can be pending and running. 
    </description> 
    </property> 

    <property> 
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name> 
    <value>0.1</value> 
    <description> 
     Maximum percent of resources in the cluster which can be used to run 
     application masters i.e. controls number of concurrent running 
     applications. 
    </description> 
    </property> 

    <property> 
    <name>yarn.scheduler.capacity.resource-calculator</name> 
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value> 
    <description> 
     The ResourceCalculator implementation to be used to compare 
     Resources in the scheduler. 
     The default i.e. DefaultResourceCalculator only uses Memory while 
     DominantResourceCalculator uses dominant-resource to compare 
     multi-dimensional resources such as Memory, CPU etc. 
    </description> 
    </property> 

    <property> 
    <name>yarn.scheduler.capacity.root.queues</name> 
    <value>default</value> 
    <description> 
     The queues at this level (root is the root queue). 
    </description> 
    </property> 

    <property> 
    <name>yarn.scheduler.capacity.root.default.capacity</name> 
    <value>100</value> 
    <description>Default queue target capacity.</description> 
    </property> 

    <property> 
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name> 
    <value>1</value> 
    <description> 
     Default queue user limit a percentage from 0.0 to 1.0. 
    </description> 
    </property> 

    <property> 
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name> 
    <value>100</value> 
    <description> 
     The maximum capacity of the default queue. 
    </description> 
    </property> 

    <property> 
    <name>yarn.scheduler.capacity.root.default.state</name> 
    <value>RUNNING</value> 
    <description> 
     The state of the default queue. State can be one of RUNNING or STOPPED. 
    </description> 
    </property> 

    <property> 
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name> 
    <value>*</value> 
    <description> 
     The ACL of who can submit jobs to the default queue. 
    </description> 
    </property> 

    <property> 
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name> 
    <value>*</value> 
    <description> 
     The ACL of who can administer jobs on the default queue. 
    </description> 
    </property> 

    <property> 
    <name>yarn.scheduler.capacity.node-locality-delay</name> 
    <value>40</value> 
    <description> 
     Number of missed scheduling opportunities after which the CapacityScheduler 
     attempts to schedule rack-local containers. 
     Typically this should be set to the number of nodes in the cluster. By 
     default it is set to approximately the number of nodes in one rack, which is 40. 
    </description> 
    </property> 

    <property> 
    <name>yarn.scheduler.capacity.queue-mappings</name> 
    <value></value> 
    <description> 
     A list of mappings that will be used to assign jobs to queues 
     The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]* 
     Typically this list will be used to map users to queues, 
     for example, u:%user:%user maps all users to queues with the same name 
     as the user. 
    </description> 
    </property> 

    <property> 
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name> 
    <value>false</value> 
    <description> 
     If a queue mapping is present, will it override the value specified 
     by the user? This can be used by administrators to place jobs in queues 
     that are different than the one specified by the user. 
     The default is false. 
    </description> 
    </property> 

</configuration> 

I would appreciate any hints on how to approach this.

Thanks, C14


OK, thanks. I have added the contents of the file. – clog14

Answers


The cluster configuration and everything else is fine, but when it comes to running jobs, the RAM provided by a t2.micro instance is not sufficient for MapReduce jobs, so use larger instances when creating the cluster for better job execution.
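If memory is indeed the bottleneck, the YARN memory settings are worth checking alongside the instance size. A minimal sketch of the relevant properties in yarn-site.xml follows; the values are illustrative assumptions, not recommendations for this particular cluster:

```xml
<!-- yarn-site.xml: illustrative values, adjust to the node's actual RAM -->
<property>
  <!-- Total memory YARN may allocate for containers on each NodeManager -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>12288</value>
</property>
<property>
  <!-- Smallest container the scheduler will hand out -->
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <!-- Largest single container; the ApplicationMaster container must fit under this -->
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
```

Together with yarn.scheduler.capacity.maximum-am-resource-percent (0.1 in the capacity-scheduler.xml above), these settings bound how much memory is available for ApplicationMasters; if that budget is smaller than one AM container, submitted jobs can sit in the ACCEPTED state waiting for the AM container to be allocated.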


Hi, I'm not using t2.micro instances but m4.xlarge. I did choose only 20 GB of hard disk per worker, though; could that be the problem? – clog14
