Pig: Count only specific rows

2016-11-03

I have data with the fields location, sentiment, and brand. For each brand and location, I want to count the number of positive, negative, and neutral entries. I tried the following:

a1 = GROUP x BY (location, brand); 
a2 = FOREACH a1 GENERATE FLATTEN(group) AS (location, brand), COUNT(x.sentiment=="positive"?1:0) AS positive_count, COUNT(x.sentiment=="negative"?1:0) AS negative_count, COUNT(x.sentiment=="neutral:?1:0) as neutral_count; 

But I am getting a syntax error saying: Unexpected character '"'

Note: assume x holds the data.

I also tried grouping by all three fields (location, sentiment, and brand), but that only gives me one overall count per combination, like this:

{location: "newyork", brand: "pampers", sentiment = "positive", count = 10} 
{location: "newyork", brand: "pampers", sentiment = "negative", count = 2} 
{location: "newyork", brand: "pampers", sentiment = "neutral", count = 20} 

I need separate fields for positive_count, negative_count, and neutral_count. Something like this:

{location: "newyork", brand: "pampers", positive_count = 10, negative_count = 2, neutral_count = 20} 
{location: "london", brand: "pampers", positive_count = 12, negative_count = 0, neutral_count = 35} 
{location: "newyork", brand: "huggies", positive_count = 40, negative_count = 6, neutral_count = 10} 

Can someone help me?

Answers

0

Use single quotes:

a1 = GROUP x BY (location, brand); 
a2 = FOREACH a1 GENERATE FLATTEN(group) AS (location, brand), 
        COUNT(x.sentiment=='positive'?1:0) AS positive_count, 
        COUNT(x.sentiment=='negative'?1:0) AS negative_count, 
        COUNT(x.sentiment=='neutral'?1:0) as neutral_count; 
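
Note that COUNT counts every non-null element of a bag, so even where a conditional like this is accepted, the 1s and the 0s would both be counted. One possible workaround, sketched here, is to compute a 0/1 flag per row first and then SUM the flags per group. The aliases flags, grouped, and counts and the is_* field names are invented for illustration, and x is assumed to have chararray fields location, brand, and sentiment.

```pig
-- Sketch only: aliases and is_* field names are hypothetical.
-- Step 1: turn each row's sentiment into three 0/1 flags.
flags = FOREACH x GENERATE location, brand,
        (sentiment == 'positive' ? 1 : 0) AS is_positive,
        (sentiment == 'negative' ? 1 : 0) AS is_negative,
        (sentiment == 'neutral'  ? 1 : 0) AS is_neutral;

-- Step 2: group by location and brand, then add up the flags.
grouped = GROUP flags BY (location, brand);
counts = FOREACH grouped GENERATE FLATTEN(group) AS (location, brand),
         SUM(flags.is_positive) AS positive_count,
         SUM(flags.is_negative) AS negative_count,
         SUM(flags.is_neutral)  AS neutral_count;
```

Because every row contributes a 0 or a 1 to each flag, a group with no rows of a given sentiment should end up with 0 for that count rather than being dropped.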

EDIT

newyork pampers positive 
newyork pampers positive 
newyork pampers negative 
newyork pampers positive 
newyork pampers positive 
newyork pampers neutral 
newyork pampers positive 
newyork pampers negative 
newyork pampers neutral 
newyork pampers positive 
newyork pampers positive 
newyork pampers neutral 

Script

B = GROUP A BY (location,brand); 
C = FOREACH B { 
        A1 = FILTER A BY sentiment matches 'positive'; 
        A2 = FILTER A BY sentiment matches 'negative'; 
        A3 = FILTER A BY sentiment matches 'neutral'; 
        GENERATE FLATTEN(group) as (location,brand),COUNT(A1),COUNT(A2),COUNT(A3); 
       }; 

Output

[Image: screenshot of the script output]
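
A self-contained variant of the script above might look like the following sketch. The input path sentiments.txt, the space delimiter, the nested aliases pos, neg, and neu, and the output field names are assumptions rather than part of the original answer.

```pig
-- Hypothetical input path and delimiter; adjust both to the real data.
A = LOAD 'sentiments.txt' USING PigStorage(' ')
    AS (location:chararray, brand:chararray, sentiment:chararray);

B = GROUP A BY (location, brand);

-- Inside each group, filter the bag once per sentiment and count the result.
C = FOREACH B {
        pos = FILTER A BY sentiment == 'positive';
        neg = FILTER A BY sentiment == 'negative';
        neu = FILTER A BY sentiment == 'neutral';
        GENERATE FLATTEN(group) AS (location, brand),
                 COUNT(pos) AS positive_count,
                 COUNT(neg) AS negative_count,
                 COUNT(neu) AS neutral_count;
};

DUMP C;
```

For the twelve sample rows listed earlier, this should report 7 positive, 2 negative, and 3 neutral entries for (newyork, pampers).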

It looks like you replaced the double quotes with single quotes. I tried that, but it did not work for me; I got a syntax error. Anyway, I figured it out. Thanks! – kskp

@kskp What was the syntax error?

mismatched input '==' expecting RIGHT_PAREN – kskp

0

I filtered the alias containing the original data by sentiment, counted the entries in each subset, and then joined the results.

p = FILTER y BY (sentiment == 'positive'); 
p1 = GROUP p BY (location, brand, avl_author_type); 
p2 = FOREACH p1 GENERATE FLATTEN(group) AS (location, brand, avl_author_type), COUNT(p) AS positive_counts; 

n = FILTER y BY (sentiment == 'negative'); 
n1 = GROUP n BY (location, brand, avl_author_type); 
n2 = FOREACH n1 GENERATE FLATTEN(group) AS (location, brand, avl_author_type), COUNT(n) AS negative_counts; 

ne = FILTER y BY (sentiment == 'neutral'); 
ne1 = GROUP ne BY (location, brand, avl_author_type); 
ne2 = FOREACH ne1 GENERATE FLATTEN(group) AS (location, brand, avl_author_type), COUNT(ne) AS neutral_counts; 

j1 = JOIN p2 BY (location, brand, avl_author_type) LEFT OUTER, n2 BY (location, brand, avl_author_type); 
j2 = FOREACH j1 GENERATE p2::location as location, p2::brand as brand, p2::avl_author_type as avl_author_type, p2::positive_counts as positive_counts, n2::negative_counts as negative_counts; 

j3 = JOIN j2 BY (location, brand, avl_author_type) LEFT OUTER, ne2 BY (location, brand, avl_author_type); 
j4 = FOREACH j3 GENERATE j2::location as location, j2::brand as brand, j2::avl_author_type as avl_author_type, j2::positive_counts as positive, j2::negative_counts as negative, ne2::neutral_counts as neutral; 

It is kind of long, but it works.
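
One caveat with this approach: because both joins are LEFT OUTER and start from p2, a group with no positive rows would not appear at all, and a group with no negative or neutral rows gets a null instead of 0. A minimal sketch of replacing those nulls, assuming the j4 relation above and a hypothetical alias j5:

```pig
-- j5 is a hypothetical alias. COUNT yields a long, so the default is 0L.
j5 = FOREACH j4 GENERATE location, brand, avl_author_type,
     positive,
     (negative IS NULL ? 0L : negative) AS negative,
     (neutral  IS NULL ? 0L : neutral)  AS neutral;
```

This only fills in missing negative and neutral counts. Recovering groups that have no positive rows would need a different join strategy, such as a FULL OUTER join.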
