GridSearchCV scoring and grid_scores_

I am trying to understand how to obtain the scorer values from GridSearchCV. The sample code below sets up a small pipeline on text data.

Then it sets up a grid search over different n-gram ranges.

Scoring is done with the F1 measure:

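from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.grid_search import GridSearchCV  # old module; newer releases use sklearn.model_selection
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# set up the pipeline
tfidf_vec = TfidfVectorizer(analyzer='word', min_df=0.05, max_df=0.95)
linearsvc = LinearSVC()
clf = Pipeline([('tfidf_vec', tfidf_vec), ('linearsvc', linearsvc)])

# set up the grid search (docs_train, y_train: training texts and labels, not shown)
parameters = {'tfidf_vec__ngram_range': [(1, 1), (1, 2)]}
gs_clf = GridSearchCV(clf, parameters, n_jobs=-1, scoring='f1')
gs_clf = gs_clf.fit(docs_train, y_train)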

Now I can print the scores:

gs_clf.grid_scores_

[mean: 0.81548, std: 0.01324, params: {'tfidf_vec__ngram_range': (1, 1)}, 
mean: 0.82143, std: 0.00538, params: {'tfidf_vec__ngram_range': (1, 2)}]

Printing the cv_validation_scores of the first entry

print(gs_clf.grid_scores_[0].cv_validation_scores)

gives:

array([ 0.83234714, 0.8  , 0.81409002])

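This is not clear to me from the documentation.

Is gs_clf.grid_scores_[0].cv_validation_scores an array holding, for each fold, the score defined by the scoring parameter (in this case, the F1 measure per fold)? If not, what is it?
If I instead choose another metric, such as scoring='f1_micro', will each array gs_clf.grid_scores_[i].cv_validation_scores contain the f1_micro metric for each fold, for that particular grid search parameter setting?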
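To see why the per-fold numbers would change with the scoring string: as far as I know, scoring='f1' on binary labels scores the positive class only, while 'f1_micro' pools true positives, false positives and false negatives over all classes before computing F1 (for single-label data this equals accuracy). Below is a small pure-Python sketch of the two definitions; binary_f1, micro_f1 and the toy labels are hypothetical illustrations, not scikit-learn code.

```python
def binary_f1(y_true, y_pred, pos_label=1):
    """F1 of the positive class only (what scoring='f1' measures for binary labels)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true, y_pred):
    """F1 from TP/FP/FN counts summed over every class (what 'f1_micro' measures)."""
    pairs = list(zip(y_true, y_pred))
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        tp += sum(t == label and p == label for t, p in pairs)
        fp += sum(t != label and p == label for t, p in pairs)
        fn += sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy labels for one held-out fold.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0]
print(binary_f1(y_true, y_pred))           # 0.5
print(round(micro_f1(y_true, y_pred), 4))  # 0.6667
```

In a grid search, each fold's held-out predictions would be scored with one of these, so with scoring='f1_micro' each cv_validation_scores array would hold one micro-F1 value per fold.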


2016-05-03 tkja

Yes, you understand it correctly. – maxymoo
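A quick consistency check supports this. In the old sklearn.grid_search module each grid_scores_ entry is, as far as I know, a namedtuple with the fields parameters, mean_validation_score and cv_validation_scores, and the std in its printed form is the standard deviation of cv_validation_scores. The sketch below rebuilds the first entry from the three printed fold scores using a hypothetical stand-in class (CVScoreTuple is not imported from sklearn, and statistics.pstdev stands in for np.std, which also computes the population standard deviation by default).

```python
from collections import namedtuple
import statistics


# Hypothetical stand-in mirroring the three fields of a grid_scores_ entry.
class CVScoreTuple(namedtuple("CVScoreTuple",
                              ("parameters", "mean_validation_score",
                               "cv_validation_scores"))):
    __slots__ = ()

    def __repr__(self):
        # Same layout as the printed grid_scores_ output; std is taken over the folds.
        return "mean: {0:.5f}, std: {1:.5f}, params: {2}".format(
            self.mean_validation_score,
            statistics.pstdev(self.cv_validation_scores),
            self.parameters)


# Per-fold F1 scores copied from the question's output for ngram_range=(1, 1).
fold_scores = [0.83234714, 0.8, 0.81409002]
entry = CVScoreTuple(
    parameters={"tfidf_vec__ngram_range": (1, 1)},
    mean_validation_score=statistics.mean(fold_scores),  # plain average of the folds
    cv_validation_scores=fold_scores,
)
print(entry)
# mean: 0.81548, std: 0.01324, params: {'tfidf_vec__ngram_range': (1, 1)}
```

This reproduces the first line of the grid_scores_ output, which is what you would expect if cv_validation_scores holds one F1 score per fold. With iid=True (the old default) the real mean_validation_score weights each fold by its test-set size, so it can differ slightly from the plain average when folds differ in size; here the plain average already matches the printed value to five decimals.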

I wrote the following function to convert the grid_scores_ object into a pandas.DataFrame. Hopefully the DataFrame view, being a more intuitive format, will help clear up your confusion:

import pandas as pd


def grid_scores_to_df(grid_scores):
    """
    Convert a sklearn.grid_search.GridSearchCV.grid_scores_ attribute to a tidy
    pandas DataFrame where each row is a hyperparameter-fold combination.
    """
    rows = list()
    for grid_score in grid_scores:
        # one row per fold: copy the parameter dict, then add the fold index and its score
        for fold, score in enumerate(grid_score.cv_validation_scores):
            row = grid_score.parameters.copy()
            row['fold'] = fold
            row['score'] = score
            rows.append(row)
    df = pd.DataFrame(rows)
    return df

For this to work, you need the following import: import pandas as pd.
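On newer scikit-learn versions grid_scores_ no longer exists (to my knowledge it was deprecated in 0.18 and removed in 0.20) in favour of cv_results_, a dict in which "params" lists the parameter settings and "split0_test_score", "split1_test_score", ... hold the per-fold test scores. Below is a sketch of the same tidy conversion against that layout, assuming single-metric scoring (with several metrics the keys are named split0_test_<metric> instead). cv_results_to_rows is a hypothetical helper; it returns plain dicts so the result can be passed to pd.DataFrame.

```python
import re


def cv_results_to_rows(cv_results):
    """Turn a cv_results_-style dict into one row per (parameter setting, fold)."""
    # Per-fold test scores live under split<k>_test_score; sort numerically so split10 follows split9.
    split_keys = sorted(
        (k for k in cv_results if re.fullmatch(r"split\d+_test_score", k)),
        key=lambda k: int(re.search(r"\d+", k).group()),
    )
    rows = []
    for i, params in enumerate(cv_results["params"]):
        for key in split_keys:
            row = dict(params)  # copy so the original params dict is not modified
            row["fold"] = int(re.search(r"\d+", key).group())
            row["score"] = float(cv_results[key][i])
            rows.append(row)
    return rows


# Hand-built dict with the same key layout, using the question's fold scores
# for ngram_range=(1, 1); a real one would come from gs_clf.cv_results_.
example = {
    "params": [{"tfidf_vec__ngram_range": (1, 1)}],
    "split0_test_score": [0.83234714],
    "split1_test_score": [0.8],
    "split2_test_score": [0.81409002],
}
for row in cv_results_to_rows(example):
    print(row)
```

With a fitted search, pd.DataFrame(cv_results_to_rows(gs_clf.cv_results_)) should give the same one-row-per-fold layout as grid_scores_to_df.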


2016-09-19 19:13:22
