2017-11-13

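Pig Latin: LIMIT operator applied to each group

I am trying to return at most five places per state, chosen by population. I also want the results sorted by state, with the places in each state in descending order of population. What I have now returns only the first five places of the first state, not the five largest places in each state. The code snippet is below.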

-- Groups places by state name. 
group_by_state_name_populated_place_name = 
    GROUP project_using_state_name 
    BY (state::name, place::name); 

-- Counts population for each place in every state. 
count_population_for_each_place_in_every_state = 
    FOREACH group_by_state_name_populated_place_name 
    GENERATE group.state::name AS state_name, 
      group.place::name AS name, 
      COUNT(project_using_state_name.population) AS population; 

-- Orders population in each group found above to enable the use of limit. 
order_groups_of_states_and_population = 
    ORDER count_population_for_each_place_in_every_state 
    BY state_name ASC, population DESC, name ASC; 

-- Limit the top 5 population for each state BUT currently returning just the first 5 tuples of the previous one and not 5 of each state. 
limit_population = 
    LIMIT order_groups_of_states_and_population 5; 

Could you add some sample input and the expected output? That might help.

Answer

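-- Load (state, place, population) records from a comma-separated file.
inp_data = LOAD 'input_data.csv' USING PigStorage(',') AS (state:chararray, place:chararray, population:long);

-- For each state, sort its bag of places by population and keep the top 5.
req_stats = FOREACH (GROUP inp_data BY state) {
    ordered = ORDER inp_data BY population DESC;
    required = LIMIT ordered 5;
    GENERATE FLATTEN(required);
};

-- Sort the flattened result by state, then by population within each state.
req_stats_ordered = ORDER req_stats BY state, population DESC;

DUMP req_stats_ordered;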
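
If you want to check the logic without a Pig installation, here is a minimal Python sketch of the difference between the two approaches. A top-level LIMIT runs over the whole sorted relation, so it keeps only the first five tuples overall. ORDER and LIMIT inside a nested FOREACH run on each group's bag instead. The function names and the sample data are made up for illustration and are not from the original post.

```python
from collections import defaultdict


def top_n_overall(rows, n=5):
    """Mimics ORDER ... BY state, population DESC, name followed by a top-level LIMIT n."""
    ordered = sorted(rows, key=lambda r: (r[0], -r[2], r[1]))
    return ordered[:n]


def top_n_per_state(rows, n=5):
    """Mimics ORDER and LIMIT applied inside a nested FOREACH over GROUP ... BY state."""
    by_state = defaultdict(list)
    for state, place, population in rows:
        by_state[state].append((state, place, population))

    result = []
    for state in sorted(by_state):
        # Sort this state's places by population, largest first, and keep n of them.
        ordered = sorted(by_state[state], key=lambda r: r[2], reverse=True)
        result.extend(ordered[:n])
    return result


# Hypothetical sample data: state "A" has 7 places, state "B" has 3.
sample = [("A", f"a{i}", 100 - i) for i in range(7)]
sample += [("B", f"b{i}", 50 - i) for i in range(3)]

overall = top_n_overall(sample)
per_state = top_n_per_state(sample)

print(overall)    # only tuples from state "A"
print(per_state)  # up to five tuples from each state
```

With this sample, the overall limit never reaches state "B", which is the behaviour described in the question. The per-state version keeps five places for "A" and all three for "B".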