Error when running GridSearchCV on an XGBoost model

I built an XGBoost classifier in Python. I'm using a fairly large dataset, and running GridSearchCV on this XGBoost model fails with the following error:

[Errno 28] No space left on device

I get this error while the search is running. I tried to run the grid search to find the optimal parameters like this:

from sklearn.model_selection import GridSearchCV

# model, param_grid, kfold, X and Y are defined elsewhere in the original script
# n_jobs=-1 runs as many fits in parallel as there are CPU cores
grid_search = GridSearchCV(model, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold)
grid_result = grid_search.fit(X, Y)

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']

for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Here, X.shape = (38932, 1002) and Y.shape = (38932,).
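For a sense of scale, here is a rough estimate of how much memory X occupies. It assumes X is a dense NumPy array of float64 values; the question does not state the dtype. With n_jobs=-1, parallel workers also need access to this data, so the total memory and temporary storage used during the search can exceed this figure.

```python
import numpy as np

# Shape reported in the question: X.shape = (38932, 1002)
n_rows, n_cols = 38932, 1002


def dense_nbytes(rows, cols, dtype):
    """Bytes needed for a dense 2-D array of the given shape and dtype."""
    return rows * cols * np.dtype(dtype).itemsize


# Assumption: X is float64; float32 is shown for comparison
for dtype in (np.float64, np.float32):
    nbytes = dense_nbytes(n_rows, n_cols, dtype)
    print(f"{np.dtype(dtype).name}: {nbytes:,} bytes (~{nbytes / 1024**2:.0f} MiB)")
```

Converting X to float32 (for example with X.astype(np.float32)) halves this footprint, if the loss of precision is acceptable for the features.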

What is the problem, and how can I fix it?

Is the dataset too large for my machine? If so, what can I do to perform the grid search on this dataset?


2017-10-15 Sreeram TP

I … – sgDysregulation

Please include a description of the dataset, either by giving a data sample and its shape or by providing a link. –

This is a similar problem: https://stackoverflow.com/a/6999259/1577947 – Jarad

The error indicates that you are running out of shared memory. Increasing the number of folds and/or adjusting the number of threads used, i.e. n_jobs, may solve this problem. Here is an example using xgboost:

import xgboost as xgb 
from sklearn.model_selection import GridSearchCV 
from sklearn import datasets 

clf = xgb.XGBClassifier() 
parameters = { 
    'n_estimators': [100, 250, 500], 
    'max_depth': [6, 9, 12], 
    'subsample': [0.9, 1.0], 
    'colsample_bytree': [0.9, 1.0], 
} 
bsn = datasets.load_iris() 
X, Y = bsn.data, bsn.target 
grid = GridSearchCV(clf, 
        parameters, n_jobs=4, 
        scoring="neg_log_loss", 
        cv=3) 

grid.fit(X, Y) 
print("Best: %f using %s" % (grid.best_score_, grid.best_params_)) 

means = grid.cv_results_['mean_test_score'] 
stds = grid.cv_results_['std_test_score'] 
params = grid.cv_results_['params'] 

for mean, stdev, param in zip(means, stds, params): 
    print("%f (%f) with: %r" % (mean, stdev, param))

Output

Best: -0.121569 using {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 100, 'subsample': 1.0} 
-0.126334 (0.080193) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 100, 'subsample': 0.9} 
-0.121569 (0.081561) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 100, 'subsample': 1.0} 
-0.139359 (0.075462) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 250, 'subsample': 0.9} 
-0.131887 (0.076174) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 250, 'subsample': 1.0} 
-0.148302 (0.074890) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 500, 'subsample': 0.9} 
-0.135973 (0.076167) with: {'colsample_bytree': 0.9, 'max_depth': 6, 'n_estimators': 500, 'subsample': 1.0} 
-0.126334 (0.080193) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 100, 'subsample': 0.9} 
-0.121569 (0.081561) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 100, 'subsample': 1.0} 
-0.139359 (0.075462) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 250, 'subsample': 0.9} 
-0.131887 (0.076174) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 250, 'subsample': 1.0} 
-0.148302 (0.074890) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 500, 'subsample': 0.9} 
-0.135973 (0.076167) with: {'colsample_bytree': 0.9, 'max_depth': 9, 'n_estimators': 500, 'subsample': 1.0} 
-0.126334 (0.080193) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 100, 'subsample': 0.9} 
-0.121569 (0.081561) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 100, 'subsample': 1.0} 
-0.139359 (0.075462) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 250, 'subsample': 0.9} 
-0.131887 (0.076174) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 250, 'subsample': 1.0} 
-0.148302 (0.074890) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 500, 'subsample': 0.9} 
-0.135973 (0.076167) with: {'colsample_bytree': 0.9, 'max_depth': 12, 'n_estimators': 500, 'subsample': 1.0} 
-0.132745 (0.080433) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 100, 'subsample': 0.9} 
-0.127030 (0.077692) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 100, 'subsample': 1.0} 
-0.146143 (0.077623) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 250, 'subsample': 0.9} 
-0.140400 (0.074645) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 250, 'subsample': 1.0} 
-0.153624 (0.077594) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 500, 'subsample': 0.9} 
-0.143833 (0.073645) with: {'colsample_bytree': 1.0, 'max_depth': 6, 'n_estimators': 500, 'subsample': 1.0} 
-0.132745 (0.080433) with: {'colsample_bytree': 1.0, 'max_depth': 9, ...


2017-10-15 17:36:28 sgDysregulation

I have run CVSearch successfully on my machine. It is only with this dataset that I'm running into the problem. –

Try it without using 'kFold' and let me know how it goes. –

You could also enable verbosity in the grid search to check whether particular parameter values are causing the problem. – sgDysregulation


