2017-03-03 4 views
2

Python (sklearn) - Why do I get the same prediction for every test tuple in SVR?

Answers to similar questions suggest changing the parameter values of the SVR() instance, but I don't understand how to work with them.

Here is the code I am using:

import json 
import numpy as np 
from sklearn.svm import SVR 

f = open('training_data.txt','r') 
data = json.loads(f.read()) 
f.close() 

f = open('predict_py.txt','r') 
data1 = json.loads(f.read()) 
f.close() 

features = [] 
response = [] 
predict = [] 


for row in data: 
    a = [] 
    a.append(row['star_power']) 
    a.append(row['view_count']) 
    a.append(row['like_count']) 
    a.append(row['dislike_count']) 
    a.append(row['sentiment_score']) 
    a.append(row['holidays']) 
    a.append(row['clashes']) 

    features.append(a) 

    response.append(row['collection']) 


for row in data1: 
    a = [] 
    a.append(row['star_power']) 
    a.append(row['view_count']) 
    a.append(row['like_count']) 
    a.append(row['dislike_count']) 
    a.append(row['sentiment_score']) 
    a.append(row['holidays']) 
    a.append(row['clashes']) 

    predict.append(a) 



X = np.array(features) 


Y = np.array(response) 


predict = np.array(predict) 
predict = predict.astype(float) 
X = X.astype(float) 
Y = Y.astype(float) 

svm = SVR() 
svm.fit(X,Y) 
print("svm prediction") 
svm_pred = svm.predict(predict) 
print(svm_pred) 

Here are links to the two text files I am using in the code:

training_data.txt

predict_py.txt

Output: [image not preserved]

As requested, I am adding samples of the two text files:

1) training_data.txt:

[{"star_power":"1300","view_count":"50602729","like_count":"348059","dislike_count":"31748","holidays":"1","clashes":"0","sentiment_score":"0.32938596491228","collection":"383"},{"star_power":"1700","view_count":"36012808","like_count":"205694","dislike_count":"20130","holidays":"0","clashes":"0","sentiment_score":"0.1130303030303","collection":"300.68"},{"star_power":"0","view_count":"23892902","like_count":"86380","dislike_count":"4426","holidays":"0","clashes":"0","sentiment_score":"0.16004079254079","collection":"188.72"},{"star_power":"0","view_count":"27177685","like_count":"374671","dislike_count":"10372","holidays":"0","clashes":"0","sentiment_score":"0.16032407407407","collection":"132.85"},{"star_power":"500","view_count":"7481738","like_count":"42734","dislike_count":"1885","holidays":"0","clashes":"0","sentiment_score":"0.38622493734336","collection":"128.45"},{"star_power":"400","view_count":"16895259","like_count":"99158","dislike_count":"4188","holidays":"0","clashes":"0","sentiment_score":"0.22791203703704","collection":"127.48"},{"star_power":"200","view_count":"16646480","like_count":"63472","dislike_count":"13652","holidays":"1","clashes":"1","sentiment_score":"0.16873480902778","collection":"112.14"},{"star_power":"400","view_count":"18717042","like_count":"67497","dislike_count":"14165","holidays":"0","clashes":"0","sentiment_score":"0.30881006493506","collection":"109.14"}] 

2) predict_py.txt:

[{"star_power":"0","view_count":"3717403","like_count":"13399","dislike_count":"909","sentiment_score":"0.154167","holidays":"0","clashes":"0"},{"star_power":"0","view_count":"1640896","like_count":"2923","dislike_count":"328","sentiment_score":"0.109112","holidays":"0","clashes":"0"},{"star_power":"100","view_count":"14723084","like_count":"95088","dislike_count":"9816","sentiment_score":"0.352344","holidays":"0","clashes":"0"},{"star_power":"0","view_count":"584922","like_count":"4032","dislike_count":"212","sentiment_score":"0.3495","holidays":"0","clashes":"0"},{"star_power":"0","view_count":"14826843","like_count":"94788","dislike_count":"4169","sentiment_score":"0.208472","holidays":"0","clashes":"0"},{"star_power":"0","view_count":"1866184","like_count":"2750","dislike_count":"904","sentiment_score":"0.1275","holidays":"0","clashes":"0"},{"star_power":"200","view_count":"22006916","like_count":"184780","dislike_count":"13796","sentiment_score":"0.183611","holidays":"0","clashes":"0"},{"star_power":"0","view_count":"2645992","like_count":"4698","dislike_count":"1874","sentiment_score":"0.185487","holidays":"0","clashes":"0"},{"star_power":"0","view_count":"13886030","like_count":"116879","dislike_count":"6608","sentiment_score":"0.243479","holidays":"0","clashes":"0"},{"star_power":"0","view_count":"3102123","like_count":"36790","dislike_count":"769","sentiment_score":"0.065651","holidays":"0","clashes":"0"},{"star_power":"300","view_count":"16469439","like_count":"110054","dislike_count":"17892","sentiment_score":"0.178432","holidays":"0","clashes":"0"},{"star_power":"0","view_count":"6353017","like_count":"81236","dislike_count":"2154","sentiment_score":"0.0480556","holidays":"0","clashes":"0"},{"star_power":"0","view_count":"8679597","like_count":"89531","dislike_count":"6923","sentiment_score":"0.152083","holidays":"0","clashes":"0"}]

Any suggestions? Thank you.

+0

Try printing 'X', 'Y', and 'predict'. Also, I can't load your links. –

+0

I printed them. They are all correct. The links are working fine too. – jatin

+0

Please post the printed values here. At least a few samples. –

Answers

3

Change your code to standardize the data.

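from sklearn.preprocessing import RobustScaler

# Scale the features using statistics that are robust to outliers
rbX = RobustScaler()
X = rbX.fit_transform(X)

# Scalers expect a 2D array, so reshape the 1D target before scaling it
rbY = RobustScaler()
Y = rbY.fit_transform(Y.reshape(-1, 1)).ravel()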

Then call fit():

svm = SVR()
svm.fit(X,Y)

At prediction time, transform predict only with rbX (do not fit a new scaler on it):

svm_pred = svm.predict(rbX.transform(predict)) 

At this point svm_pred is in standardized form. Since you want the predicted Y in its original units, inverse-transform svm_pred with rbY.

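# inverse_transform also expects a 2D array, so reshape and then flatten again
svm_pred = rbY.inverse_transform(svm_pred.reshape(-1, 1)).ravel()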

Then print svm_pred. It should give satisfactory results.
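For reference, the same steps can probably be bundled into a single estimator so that the scaling and the inverse transform cannot be forgotten. This is only a sketch and not part of the approach described above: it assumes a reasonably recent scikit-learn that provides TransformedTargetRegressor, and it uses just the first few rows of the posted samples as stand-in data.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVR

# Rows copied from the training_data.txt sample, in the order:
# star_power, view_count, like_count, dislike_count,
# sentiment_score, holidays, clashes
X = np.array([
    [1300, 50602729, 348059, 31748, 0.32938596491228, 1, 0],
    [1700, 36012808, 205694, 20130, 0.1130303030303, 0, 0],
    [0, 23892902, 86380, 4426, 0.16004079254079, 0, 0],
    [0, 27177685, 374671, 10372, 0.16032407407407, 0, 0],
    [500, 7481738, 42734, 1885, 0.38622493734336, 0, 0],
    [400, 16895259, 99158, 4188, 0.22791203703704, 0, 0],
    [200, 16646480, 63472, 13652, 0.16873480902778, 1, 1],
    [400, 18717042, 67497, 14165, 0.30881006493506, 0, 0],
], dtype=float)
Y = np.array([383, 300.68, 188.72, 132.85, 128.45, 127.48, 112.14, 109.14])

# Three rows copied from the predict_py.txt sample
predict = np.array([
    [0, 3717403, 13399, 909, 0.154167, 0, 0],
    [100, 14723084, 95088, 9816, 0.352344, 0, 0],
    [200, 22006916, 184780, 13796, 0.183611, 0, 0],
], dtype=float)

# The inner pipeline scales the features before SVR sees them;
# the outer wrapper scales Y for fitting and inverse-transforms predictions.
model = TransformedTargetRegressor(
    regressor=make_pipeline(RobustScaler(), SVR()),
    transformer=RobustScaler(),
)
model.fit(X, Y)

svm_pred = model.predict(predict)  # already back in the units of 'collection'
print(svm_pred)
```

With this wrapper there is no separate rbX or rbY to keep track of: predict is scaled with the statistics learned from X, and the output comes back in the original units.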

+0

That worked. But I would like to understand it better. I have used several other algorithms, such as linear regression, logistic regression, and KNN. Why do I need to do this for SVR? – jatin

+1

See here for details: http://scikit-learn.org/stable/modules/svm.html#tips-on-practical-use –

+0

scikit-learn uses 'libsvm' to implement SVC and SVR. These are not scale invariant, and they work properly only when the data is normalized to a given range. Other algorithms may not be affected by this. –
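To make the scale sensitivity concrete, here is a rough pure-Python sketch of what likely happens inside the default RBF kernel with the posted data. The rbf_kernel and scale helpers are hypothetical names written for this illustration, gamma = 1 / n_features is an assumption (it is what gamma='auto' means in scikit-learn), and the divide-by-column-maximum scaling is a crude stand-in, not what RobustScaler computes.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel value: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

# Two training rows and one test row copied from the posted samples, in the order:
# star_power, view_count, like_count, dislike_count,
# sentiment_score, holidays, clashes
train = [
    [1300.0, 50602729.0, 348059.0, 31748.0, 0.32938596491228, 1.0, 0.0],
    [0.0, 23892902.0, 86380.0, 4426.0, 0.16004079254079, 0.0, 0.0],
]
test_row = [0.0, 3717403.0, 13399.0, 909.0, 0.154167, 0.0, 0.0]

# Assumption: gamma = 1 / n_features
gamma = 1.0 / len(test_row)

# Raw features: view_count differs by tens of millions, so the squared
# distance is enormous and the kernel underflows to exactly 0.0.
raw_kernels = [rbf_kernel(row, test_row, gamma) for row in train]
print(raw_kernels)  # prints [0.0, 0.0]

# Crude illustration-only scaling: divide each column by its largest
# absolute training value (falling back to 1.0 for an all-zero column).
scales = [max(abs(row[j]) for row in train) or 1.0 for j in range(len(test_row))]

def scale(row):
    return [value / s for value, s in zip(row, scales)]

# After scaling, the kernel values are non-zero and differ per training row.
scaled_kernels = [rbf_kernel(scale(row), scale(test_row), gamma) for row in train]
print(scaled_kernels)
```

When every kernel value is zero, the support vectors contribute nothing to the decision function and only the constant intercept is left, which would explain getting the same prediction for every test tuple.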
