Pandas: creating a second column from the first column

The goal is to create a second column from the first column:

column1, column2 
Hello World, #HelloWorld 
US Election, #USElection

I have a simple file with one column:

columnOne 
Hello World 
US Election 
Movie Night

I wrote the following function to create the other column:

>>> def newColumn(row): 
...  r = "#" + "".join(row.split(" ")) 
...  return r

Then I created the second column with pandas as follows:

df['column2'] = df.apply (lambda row: newColumn(row),axis=1)

But I end up with the following error:

Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/Users/anuradha_uduwage/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 3972, in apply 
    return self._apply_standard(f, axis, reduce=reduce) 
    File "/Users/anuradha_uduwage/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 4064, in _apply_standard 
    results[i] = func(v) 
    File "<stdin>", line 1, in <lambda> 
    File "<stdin>", line 2, in newColumn 
    File "/Users/anuradha_uduwage/anaconda2/lib/python2.7/site-packages/pandas/core/generic.py", line 2360, in __getattr__ 
    (type(self).__name__, name)) 
AttributeError: ("'Series' object has no attribute 'split'", u'occurred at index 0')

So I changed the split to the following:

r = "".join(row.str.split(" "))

But that didn't help.

Source

2016-10-18 Null-Hypothesis

Do you mean something like this: `df['column2'] = '#' + df.columnOne.str.replace('\s+', '')`? – MaxU

Try a list comprehension:

import pandas

df = pandas.DataFrame({'columnOne': ['Hello World', 'US Election', 'Movie Night']})

df['column2'] = ['#' + item.replace(' ', '') for item in df.columnOne] 

In [2]: df
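For reference, here is a self-contained sketch of the same list-comprehension approach with a check of the resulting values. The variable name `result` is only for illustration, and the sketch assumes every cell in `columnOne` is a string (a missing value would not have a `replace` method).

```python
import pandas as pd

df = pd.DataFrame({'columnOne': ['Hello World', 'US Election', 'Movie Night']})

# Build the hashtag for each string in the first column
df['column2'] = ['#' + item.replace(' ', '') for item in df['columnOne']]

# Collect the new column as a plain list for inspection
result = df['column2'].tolist()
print(result)  # ['#HelloWorld', '#USElection', '#MovieNight']
```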

Source

2016-10-18 16:45:25 estebanpdl

This should do the trick:

df['new_column'] = df['old_column'].apply(lambda x: "#"+x.replace(' ', ''))

Example:

>>> import pandas as pd
>>> names = ['Hello World', 'US Election', 'Movie Night']
>>> df = pd.DataFrame(data=names, columns=['Names'])
>>> df
         Names
0  Hello World
1  US Election
2  Movie Night

>>> df['Names2'] = df['Names'].apply(lambda x: "#" + x.replace(' ', ''))
>>> df
         Names       Names2
0  Hello World  #HelloWorld
1  US Election  #USElection
2  Movie Night  #MovieNight

Source

2016-10-18 16:45:02 mk2

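Your general approach is totally fine; you just have a couple of problems. When you use apply on an entire dataframe, it passes either a row or a column to the function being applied. In your case, you don't want a row or a column. You want the string inside each cell of the first column. So instead of running df.apply, you want df['columnOne'].apply.

Here is what I would do: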

import pandas as pd 

df = pd.DataFrame(['First test here', 'Second test'], columns=['A']) 

# Note that this function expects a string, and returns a string 
def new_string(s): 
    # Get rid of the spaces 
    s = s.replace(' ','') 
    # Add the hash 
    s = '#' + s 
    return s 

# Then, apply it to the first column, and save it in the second, new column 
df['B'] = df['A'].apply(new_string)

Or, if you really want to do it in a one-liner:

df['B'] = df['A'].apply(lambda x: '#' + x.replace(' ',''))
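To make the difference between the two kinds of apply concrete, here is a small sketch that records what each one passes to the function. The names `row_types` and `cell_types` are made up for this illustration, and the sample data is shortened from the question.

```python
import pandas as pd

df = pd.DataFrame({'columnOne': ['Hello World', 'US Election']})

# DataFrame.apply with axis=1 hands each row to the function as a Series
row_types = df.apply(lambda row: type(row).__name__, axis=1).tolist()

# Series.apply hands each individual cell value to the function
cell_types = df['columnOne'].apply(lambda cell: type(cell).__name__).tolist()

print(row_types)   # ['Series', 'Series']
print(cell_types)  # ['str', 'str']

# If a row-wise apply is kept, the cell has to be picked out of the row first
df['column2'] = df.apply(lambda row: '#' + row['columnOne'].replace(' ', ''), axis=1)
```

This is why the original function failed: it called split on a whole row (a Series) rather than on the string inside it.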

Source

2016-10-18 17:04:34 Jeremy

As MaxU commented, you can use str.replace, or Series.replace with the parameter regex=True, to replace all whitespace with an empty string.

# regex=True is passed explicitly so '\s+' is treated as a pattern
df['column2'] = '#' + df.column1.str.replace(r'\s+', '', regex=True)
df['column3'] = '#' + df.column1.replace(r'\s+', '', regex=True)

print(df)
       column1      column2      column3
0  Hello World  #HelloWorld  #HelloWorld
1  US Election  #USElection  #USElection
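One caveat worth checking on current installations: since pandas 2.0, Series.str.replace treats the pattern as a literal string by default, so a pattern such as '\s+' only works as a regular expression when regex=True is passed. The sketch below compares the two modes on a small made-up Series (`s`, `with_regex` and `literal` are illustrative names, not from the thread).

```python
import pandas as pd

s = pd.Series(['Hello World', 'US  Election'])

# regex=True: '\s+' is a pattern matching any run of whitespace
with_regex = ('#' + s.str.replace(r'\s+', '', regex=True)).tolist()

# regex=False: '\s+' is searched for as literal text, so nothing is replaced
literal = ('#' + s.str.replace(r'\s+', '', regex=False)).tolist()

print(with_regex)  # ['#HelloWorld', '#USElection']
print(literal)     # ['#Hello World', '#US  Election']
```

Passing regex explicitly keeps the behaviour the same regardless of which default the installed version uses.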

Source

2016-10-18 17:32:39 jezrael
