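How do I manipulate my data so that I can run a random forest on it?

I want to train a random forest on a bunch of matrices (the first link is an example). I want to classify them as "g" or "b" (good or bad, a or b, 1 or 0; it doesn't matter). How do I manipulate my data so that I can run a random forest on it?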

I call the script randfore.py. I'm currently using 10 examples, but once I get this up and running I'll use a much larger data set.

Here is the code:

# -*- coding: utf-8 -*- 
import numpy as np 
import pandas as pd 
import os 

import sklearn 
from sklearn.tree import DecisionTreeClassifier 
from sklearn.ensemble import RandomForestClassifier 

working_dir = os.getcwd() # Grabs the working directory 

directory = working_dir+"/fakesourcestuff/" ## The actual directory where the files are located 

sources = list() # Just sets up a list here which is going to become the input for the random forest 

for i in range(10): 
    cutoutfile = pd.read_csv(directory+ "image2_with_fake_geotran_subtracted_corrected_cutout_" + str(i) +".dat", dtype=object) ## Where we get the input data for the random forest from 
    sources.append(cutoutfile) # add it to our sources list 

targets = pd.read_csv(directory + "faketargets.dat",sep='\n',header=None, dtype=object) # Reads in our target data... either "g" or "b" (Good or bad) 


sources = pd.DataFrame(sources) ## I convert the list to a dataframe to avoid the "ValueError: cannot copy sequence with size 99 to array axis with dimension 1" error. Necessary? 

# Training sets 
X_train = sources[:8] # Inputs 
y_train = targets[:8] # Targets 

# Random Forest 
rf = RandomForestClassifier(n_estimators=10) 
rf_fit = rf.fit(X_train, y_train)

Here is the current error output:

Traceback (most recent call last): 
    File "randfore.py", line 31, in <module> 
    rf_fit = rf.fit(X_train, y_train) 
    File "/home/ithil/anaconda2/envs/iraf27/lib/python2.7/site-packages/sklearn/ensemble/forest.py", line 247, in fit 
    X = check_array(X, accept_sparse="csc", dtype=DTYPE) 
    File "/home/ithil/anaconda2/envs/iraf27/lib/python2.7/site-packages/sklearn/utils/validation.py", line 382, in check_array 
    array = np.array(array, dtype=dtype, order=order, copy=copy) 
ValueError: setting an array element with a sequence.

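I tried setting dtype=object, but that didn't help. I don't know what kind of manipulation I need to do to make this work.

I think the problem is that the files I append to sources aren't just numbers but a mix of numbers, commas, and various square brackets (each file is basically one big matrix). Is there a natural way to import this? The square brackets in particular are probably a problem.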

Before I converted sources to a DataFrame, I got the following error:

ValueError: cannot copy sequence with size 99 to array axis with dimension 1

This is due to the dimensions of my input (100 lines long) and my target, which has 10 rows and 1 column.

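Here are the contents of the first file that is read into cutoutfile (they are all in exactly the same style) to use as the input: https://pastebin.com/632RBqWc

And here are the contents of faketargets.dat, the targets: https://pastebin.com/tkysqmVu

Any ideas? Help is greatly appreciated. I'm sure there is a lot of fundamental confusion here.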

Source

2017-06-15 Edmond Dantès

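According to the [docs](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html), the input is expected to be 2D, but yours is a list of 2D objects, so it is 3D. You need to either flatten the 2D arrays (if that makes sense for your data) or look into feature generation. – ncfirth

@ncfirth Ah, thank you. Is there an easy way to convert this list (or the DataFrame it becomes) into a 1D array? Or can I just flatten the 2D arrays (using .flatten)? –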
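One way to do the flattening discussed in these comments, as a minimal sketch: stack the per-file matrices into a single 3D array and reshape it so that each matrix becomes one row of features. The random 10x10 matrices and the alternating "g"/"b" labels below are made-up stand-ins for the real cutout files and faketargets.dat, and the sketch assumes every matrix has the same shape and has already been parsed into numbers.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data: 10 numeric matrices of shape (10, 10).
rng = np.random.RandomState(0)
matrices = [rng.rand(10, 10) for _ in range(10)]
labels = np.array(["g", "b"] * 5)  # made-up targets, one per matrix

# Stack into shape (10, 10, 10), then flatten each matrix into one row,
# giving the 2D (n_samples, n_features) layout that fit() expects.
X = np.stack(matrices).reshape(len(matrices), -1)  # shape (10, 100)

# Same 8/2 split as in the question.
X_train, y_train = X[:8], labels[:8]
X_test = X[8:]

rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf.fit(X_train, y_train)
predictions = rf.predict(X_test)  # one "g" or "b" per held-out matrix
```

Flattening discards the row/column layout of each matrix, so whether it makes sense depends on the data; the feature-generation route mentioned above is the alternative.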

Try writing:

X_train = sources.values[:8] # Inputs 
y_train = targets.values[:8] # Targets

I hope this solves your problem!
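To illustrate what `.values` does on the target side, here is a small sketch. `.values` returns the NumPy array underlying a DataFrame, but for a one-column frame that array is still 2D, and scikit-learn generally expects labels as a 1D array, so a `ravel()` may also be needed. The ten labels below are a made-up stand-in for faketargets.dat.

```python
import pandas as pd

# Hypothetical stand-in for faketargets.dat: one label per row, no header.
targets = pd.DataFrame(["g", "b", "g", "g", "b", "b", "g", "b", "g", "b"])

y_2d = targets.values[:8]             # shape (8, 1): still a column
y_train = targets.values[:8].ravel()  # shape (8,): flat label vector
```

This only addresses the targets; the inputs would likely still need to be arranged as one row of numeric features per file.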

Source

2017-07-20 10:34:15 Blessy
