SAS - Remove duplicated words in a string across one whole variable

A string variable named cat_d contains duplicated words in each observation. How can I remove the duplicated words within each observation? The program I have tried is shown below the sample data and expected output.

Sample data, one line per observation:

MPSJ, Hulu Langat, Hulu Langat, MPAJ, MPSJ, MPAJ, Gombak, MPSJ, MPSJ, MPSJ, MPKJ, MPAJ, MPAJ, Gombak, MPAJ, MPSJ, Hulu Langat, Gombak

Cheras, Cheras, Cheras, Setapak, Setapak, Setapak, Setapak, Pusat Bandar, Pusat Bandar, Klang Lama

Kuantan

MPJBT, MBJB, MBJB, MPPG, MBJB, MBJB, MBJB

Expected output:

MPSJ, Hulu Langat, MPAJ, Gombak, MPKJ

Cheras, Setapak, Pusat Bandar, Klang Lama

Kuantan

MPJBT, MBJB, MPPG

data keep;
    i = 2;
    length word $500;
    do until (last.cat_d);
        set want;
        by cat_d notsorted;
        string = cat_d;
        do while (scan(string, i, ',') ^= '');
            word = scan(string, i, ',');
            do j = 1 to i - 1;
                if word = scan(string, j, ',') then do;
                    start = findw(string, word, ',', findw(string, word, ',', 't') + 1, 't');
                    string = cat(substr(string, 1, start - 2), substr(string, start + length(word)));
                    leave;
                end;
            end;
            i = i + 1;
        end;
    end;
    keep cat_d string;
run;

Asked by JWW, 2017-11-12

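Please review how to ask a good question. You need to provide sample data, the expected output, and what you have tried so far. We are not here to do your work for you, but to help you figure out how to do it. – Reeza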

https://stackoverflow.com/help/how-to-ask – Reeza

Sorry, is that better? @Reeza – JWW

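For your approach to work, you would need to use TRANWRD to remove the word, but you would also have to deal with the commas and remove them as necessary. What about the last entry, which has no comma after it?

Here is an entirely different approach, but in my opinion a more flexible one:

1. Count the words in each value and separate them so that each entry is on its own row. Generally, this structure is easier to work with overall.
2. Sort the dataset so that it has no duplicates.
3. Transpose it back to a wide dataset and recreate the sentence.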

* Create sample data;

data have;
    length x $200;
    x = "MPSJ,Hulu Langat,Hulu Langat, MPAJ, MPSJ, MPAJ, Gombak, MPSJ, MPSJ, MPSJ, MPKJ, MPAJ,MPAJ,Gombak,MPAJ,MPSJ,Hulu Langat,Gombak";
    output;
    x = "Cheras,Cheras,Cheras,Setapak,Setapak,Setapak,Setapak,Pusat Bandar,Pusat Bandar,Klang Lama";
    output;
    x = "Kuantan";
    output;
    x = "MPJBT,MBJB,MBJB,MPPG,MBJB,MBJB,MBJB";
    output;
run;

* Make it into a long dataset with one row per entry;
* Only the comma is used as a delimiter so that multi-word entries such as Hulu Langat stay intact;

data long;
    set have;
    length words $200;
    nwords = countw(x, ',');
    ID = _n_;

    do i = 1 to nwords;
        * STRIP removes the blank that follows some of the commas;
        words = strip(scan(x, i, ','));
        output;
    end;
run;

* Sort and remove duplicate values (entries end up in alphabetical order within each ID);

proc sort data=long nodupkey out=long_unique;
    by ID words;
run;

* Transpose to a wide format;

proc transpose data=long_unique out=wide_unique prefix=word;
    by ID;
    var words;
run;

* Make it back into one variable;

data want;
    set wide_unique;
    by ID;
    sentence = catx(", ", of word:);
run;
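
Because the sort-based steps above return the entries of each observation in alphabetical order, while the expected output in the question keeps them in order of first appearance, a single DATA step may be worth considering as an alternative. The following is only a minimal sketch, not part of the original answer. It assumes the HAVE data set and the variable x created above; the names want_ordered, sentence and word are made up for illustration, and the result is built with a plain comma (no following blank) so that FINDW can compare whole entries.

```sas
/* Sketch: de-duplicate a comma-separated list in one DATA step,   */
/* keeping the first occurrence of each entry in its original order */
data want_ordered;
    set have;                          /* assumes HAVE with variable x */
    length sentence $200 word $200;
    sentence = '';

    do i = 1 to countw(x, ',');
        /* take the i-th entry and drop leading and trailing blanks */
        word = strip(scan(x, i, ','));

        /* FINDW with ',' as the only delimiter matches whole entries; */
        /* the 't' modifier trims trailing blanks before comparing     */
        if (not missing(word)) and (findw(sentence, word, ',', 't') = 0) then
            sentence = catx(',', sentence, word);
    end;

    drop i word;
run;
```

For the first sample observation this should give MPSJ,Hulu Langat,MPAJ,Gombak,MPKJ, which is the order shown in the expected output. The trade-off is that the long, one-row-per-entry structure from the answer is not produced, so any further per-entry processing would still need the earlier steps.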

Answered by Reeza, 2017-11-12 03:55:52

Thank you for your help, I had been stuck for days! – JWW
