Selecting random indices of samples in a dataset

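I have loaded a dataset of 31 samples into Python. I want to randomly split the dataset 30 times into 30 training samples and 1 test sample. How can I do that?

Right now I simply split it using the first 30 samples for training and the last one for testing, like this: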

training_this_round = training[0:30]
testing_this_round = training[30:31]

How can I select rows of the matrix at random? training is the variable that contains my whole initial dataset.

Source

2016-12-19 konstantin

I like random.shuffle.

Let's create a dummy dataset with 31 samples (we'll say they're integers):

# list(...) so that the dataset can be copied and shuffled on Python 3 as well
training = list(range(31))

Now we can use shuffle to split this set into two random subgroups:

import random
# copy training to preserve the order of the original dataset
this_round = training[:]
# permute the elements in place
random.shuffle(this_round)
# separate into training and test
training_this_round = this_round[:30]
testing_this_round = this_round[30:31]

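This puts the samples in a random order (just like shuffling a deck of cards), then takes one card off the deck for testing and uses the rest for training. What I like about it is that it extends to other kinds of splits (for example, putting the top 3 cards in the test set, the next 5 in a validation set, and using the rest for training).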
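The three-way split mentioned above could look roughly like the following. This is only a sketch under the same assumptions as the answer (a dummy dataset of 31 integers); the names n_test, n_validation and validation_this_round are introduced here for illustration and are not part of the original answer.

```python
import random

# Dummy dataset of 31 integer samples, as in the answer.
training = list(range(31))

# Illustrative split sizes taken from the example in the text.
n_test, n_validation = 3, 5

# Copy first so the original order of the dataset is preserved.
this_round = training[:]
random.shuffle(this_round)

# Top 3 "cards" for testing, next 5 for validation, the rest for training.
testing_this_round = this_round[:n_test]
validation_this_round = this_round[n_test:n_test + n_validation]
training_this_round = this_round[n_test + n_validation:]
```

Because the three slices are taken from one shuffled copy, no sample can end up in more than one subset.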

Since you are only using a single sample for testing, it is also easy to go the other way around: pick one card (sample) at random and remove it from the deck:

# pick an index into training at random
select = random.randint(0, len(training) - 1)
# test set is a single sample (not a list)
testing_this_round = training[select]
# training set is all elements except the one chosen for testing
training_this_round = [x for (i, x) in enumerate(training) if i != select]

Source

2016-12-19 17:11:15 wildwilhelm

Thanks, but how can I keep the index of testing_round? – konstantin

Also, training is the name of my initial dataset, so using training as an index variable is misleading. – konstantin

Does the last edit help with "keeping the index" of the training round? In that case you can just remember the value of 'select'. Also, I'm not using 'training' as an index variable; I intend it to be the dataset (in the answer, it's a dummy dataset made up of a bunch of integer values). – wildwilhelm
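One way to keep the index of the test sample across the 30 rounds asked for in the question is to store the value of select for each round. The following is a minimal sketch, assuming a dummy dataset of 31 integers; the list rounds is a name introduced here for illustration.

```python
import random

# Dummy dataset: 31 samples whose values differ from their indices,
# so that index and value are easy to tell apart.
training = list(range(100, 131))

rounds = []  # one (select, test sample, training samples) tuple per round
for _ in range(30):
    # pick an index into training at random and remember it
    select = random.randrange(len(training))
    testing_this_round = training[select]
    # every sample except the one chosen for testing
    training_this_round = training[:select] + training[select + 1:]
    rounds.append((select, testing_this_round, training_this_round))
```

Note that each round draws its index independently, so the same sample may be chosen for testing in more than one round.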

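Third-party array toolkits such as numpy make this kind of thing easier to manage without errors, and third-party machine-learning packages such as scikit-learn already provide higher-level solutions to cross-validation problems. But assuming we take your question at face value and do everything by hand, here is an approach that should work: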

import random 

indices = list(range(len(dataset))) 
random.shuffle(indices) # shuffle just once before folding: this ensures we don't re-use any test fold indices 

validation_results = [] 
leave_n_out = 1 
for test_start in range(0, len(indices), leave_n_out): # work through the different folds of the cross-validation 
    test_stop = test_start + leave_n_out 

    testing_this_round = [dataset[i] for i in indices[test_start:test_stop]] 
    training_this_round = [dataset[i] for i in indices[:test_start] + indices[test_stop:]] 

    model = train(training_this_round) # whatever that involves 
    validation_results.append(test(model, testing_this_round)) # whatever that involves
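To try the fold logic above without a real model, the index handling can be pulled out into a small generator. This is a sketch only: leave_n_out_folds and its seed parameter are hypothetical names introduced here, and the dataset is a dummy list of 31 integers.

```python
import random

def leave_n_out_folds(n_samples, leave_n_out=1, seed=None):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)  # shuffle once so that no test index is reused
    for test_start in range(0, n_samples, leave_n_out):
        test_stop = test_start + leave_n_out
        test_indices = indices[test_start:test_stop]
        train_indices = indices[:test_start] + indices[test_stop:]
        yield train_indices, test_indices

dataset = list(range(100, 131))  # dummy dataset: 31 samples
folds = list(leave_n_out_folds(len(dataset), leave_n_out=1, seed=0))

# Look up the actual samples for the first few folds.
for train_indices, test_indices in folds[:3]:
    testing_this_round = [dataset[i] for i in test_indices]
    training_this_round = [dataset[i] for i in train_indices]
    print(test_indices, testing_this_round, len(training_this_round))
```

With 31 samples and leave_n_out = 1 this produces 31 folds, each sample being the test sample exactly once; taking folds[:30] would give the 30 rounds asked for in the question.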

Source

2016-12-19 17:38:21 jez
