How to efficiently replace substrings in pandas?

Goal: reformat the contents of a pandas DataFrame based on what was supplied to me.

I have the following DataFrame, and I am looking to change each column to the following style:

I am using the following code to produce the style I need, but it is not efficient:

lt = []
for i in patterns['Components'][0]:
    for x in i.split('__'):
        lt.append(x)
lt[1].replace('(','').replace(', ',' < '+str(lt[0])+' ≤ ').replace(']','')

What I have tried so far has been to no avail. No error is raised; it just seems to ignore what I am trying to do.
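The loop in the question builds the target string for a single cell, but the final expression is never stored: `str.replace` returns a new string and leaves `lt[1]` as it was. A minimal sketch that wraps the same split-and-replace steps in a function so the result can be reused; `reformat_interval` and `sample` are hypothetical names, and the sample value is taken from the data quoted in this thread:

```python
def reformat_interval(cell: str) -> str:
    """Turn 'Quantity__(0.0, 16199.0]' into '0.0 < Quantity ≤ 16199.0'."""
    # Split the label from the interval at the double underscore.
    name, interval = cell.split("__", 1)
    # Each str.replace call returns a new string; the input is never modified.
    return (
        interval.replace("(", "")
        .replace(", ", f" < {name} ≤ ")
        .replace("]", "")
    )


sample = "Quantity__(0.0, 16199.0]"  # hypothetical cell value from the thread
print(reformat_interval(sample))
```

A function like this could then be applied to every string in a column instead of indexing one cell at a time.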

Source

2017-09-10 Student

Are all of the columns of string type? What do you get from 'type(df.Components.iloc[0])'? – Psidom

non-null object – Student

Source DF:

In [37]: df 
Out[37]: 
          Components        Outcome 
0   (Quantity__(0.0, 16199.0]) (UnitPrice__(-1055.648, 3947.558]) 
1 (UnitPrice__(-1055.648, 3947.558])   (Quantity__(0.0, 16199.0])

Solution:

In [38]: cols = ['Components','Outcome'] 
    ...: df[cols] = df[cols].replace(r'\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\).*', 
    ...:        r'\2 < \1 <= \3', 
    ...:        regex=True)

Result:

In [39]: df 
Out[39]: 
          Components       Outcome 
0   0.0 < Quantity <= 16199.0 -1055.648 < UnitPrice <= 3947.558 
1 -1055.648 < UnitPrice <= 3947.558   0.0 < Quantity <= 16199.0
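To make the three capture groups in the solution's pattern easier to follow, here is a sketch that spells out the same regular expression in verbose mode and applies it to one sample string with the standard `re` module. `PATTERN` and `cell` are hypothetical names, and the sketch assumes `DataFrame.replace(..., regex=True)` substitutes string cells the way `re.sub` does:

```python
import re

# The same pattern as in the solution, split up so each part can be commented.
PATTERN = re.compile(
    r"""
    \(([^_]*)__      # group 1: the label between '(' and the double underscore
    \(([^,\s]+),\s*  # group 2: the lower bound, up to the comma
    ([^\]]+)\]\)     # group 3: the upper bound, up to the closing '])'
    .*               # anything after the closing parenthesis is dropped
    """,
    re.VERBOSE,
)

cell = "(UnitPrice__(-1055.648, 3947.558])"  # sample value from the source DF
match = PATTERN.match(cell)
print(match.groups())

# The replacement reorders the groups: lower bound, label, upper bound.
formatted = PATTERN.sub(r"\2 < \1 <= \3", cell)
print(formatted)
```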

UPDATE:

In [113]: df 
Out[113]: 
           Components        Outcome 
0    (Quantity__(0.0, 16199.0])  (UnitPrice__(-1055.648, 3947.558]) 
1 (UnitPrice__(-1055.648, 3947.558])    (Quantity__(0.0, 16199.0]) 

In [114]: cols = ['Components','Outcome'] 

In [115]: pat = r'\s*\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\)\s*' 

In [116]: df[cols] = df[cols].replace(pat, r'\2 < \1 <= \3', regex=True) 

In [117]: df 
Out[117]: 
          Components       Outcome 
0   0.0 < Quantity <= 16199.0 -1055.648 < UnitPrice <= 3947.558 
1 -1055.648 < UnitPrice <= 3947.558   0.0 < Quantity <= 16199.0
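The updated pattern differs from the first one in how it treats whitespace around the value. A small sketch of that difference on a padded string, using `re.sub` as a stand-in for the pandas call; `padded`, `FIRST`, and `UPDATED` are hypothetical names, and the padded value is an assumption, not data from the question:

```python
import re

REPL = r"\2 < \1 <= \3"
FIRST = r"\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\).*"
UPDATED = r"\s*\(([^_]*)__\(([^,\s]+),\s*([^\]]+)\]\)\s*"

padded = "  (Quantity__(0.0, 16199.0])  "  # hypothetical cell with stray spaces

# The first pattern starts matching at "(", so leading spaces are left in place.
first_result = re.sub(FIRST, REPL, padded)
# The updated pattern consumes whitespace on both sides of the value.
updated_result = re.sub(UPDATED, REPL, padded)

print(repr(first_result))
print(repr(updated_result))
```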

Or without the parentheses:

In [119]: df 
Out[119]: 
         Components       Outcome 
0   Quantity__(0.0, 16199.0]) UnitPrice__(-1055.648, 3947.558] 
1 UnitPrice__(-1055.648, 3947.558]   Quantity__(0.0, 16199.0] 

In [120]: pat = r'([^_]*)__\(([^,\s]+),\s*([^\]]+)\]' 

In [121]: df[cols] = df[cols].replace(pat, r'\2 < \1 <= \3', regex=True) 

In [122]: df 
Out[122]: 
          Components       Outcome 
0   0.0 < Quantity <= 16199.0) -1055.648 < UnitPrice <= 3947.558 
1 -1055.648 < UnitPrice <= 3947.558   0.0 < Quantity <= 16199.0
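If the data mixes values with and without the outer parentheses, one option is a single pattern in which both are optional. This is a sketch only: `FLEXIBLE` is a hypothetical variant that does not appear in the answer, shown here with `re.sub` on plain strings:

```python
import re

# Hypothetical combined pattern: the outer "(" and ")" are both optional.
FLEXIBLE = r"\s*\(?([^_(]*)__\(([^,\s]+),\s*([^\]]+)\]\)?\s*"
REPL = r"\2 < \1 <= \3"

cells = [
    "(Quantity__(0.0, 16199.0])",  # with outer parentheses
    "Quantity__(0.0, 16199.0]",    # without outer parentheses
    "Quantity__(0.0, 16199.0])",   # stray closing parenthesis only
]
results = [re.sub(FLEXIBLE, REPL, cell) for cell in cells]
for cell, result in zip(cells, results):
    print(f"{cell!r} -> {result!r}")
```

With this variant a stray closing parenthesis is consumed by the match instead of being carried into the result.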

Source

2017-09-10 16:10:30 MaxU

Your solution looks great, but it only returns the original contents of the DataFrame (nothing new). In case it matters, both columns of the original pandas DataFrame (['Components', 'Outcome']) are non-null objects. – Student

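@Student, that means your real data (the strings) is slightly different and your sample data set is not reproducible, because the RegEx works on the sample DF. Please provide a __reproducible__ sample data set (in __text__ format, so that it can be copied and pasted). – MaxU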

patterns['Components'][0], df['Components'][0] gives (frozenset({'Quantity__(0.0, 16199.0]'}), '(Quantity__(0.0, 16199.0])'). I am not sure whether this helps, but that is all of the output from the original DataFrame (patterns). Since the two DataFrames may not be identical (patterns == df gives False), I tried cleaning up with patterns.replace('(^\s+|\s+$)', '', regex=True, inplace=True). This made no difference to the output. Any ideas? – Student
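The output quoted in this comment suggests that the cells of `patterns` hold `frozenset` objects rather than strings, which may be why a regex replacement leaves them unchanged: regex replacement acts on string values. A minimal sketch of one way to handle that, assuming each cell is a frozenset of strings; the frame below is a hypothetical reconstruction, and the pattern is a slightly tightened variant of the one in the answer so that the label cannot contain a comma or whitespace:

```python
import pandas as pd

# Hypothetical frame mirroring the comment: each cell is a frozenset of strings.
patterns = pd.DataFrame(
    {
        "Components": [frozenset({"Quantity__(0.0, 16199.0]"})],
        "Outcome": [frozenset({"UnitPrice__(-1055.648, 3947.558]"})],
    }
)

# Inspect the actual Python type of the cells; an object dtype does not imply str.
cell_types = patterns["Components"].map(type)
print(cell_types.unique())

pat = r"\(?([^_(,\s]*)__\(([^,\s]+),\s*([^\]]+)\]\)?"
for col in ["Components", "Outcome"]:
    # Join the members of each frozenset into one string, then apply the regex.
    as_text = patterns[col].map(lambda cell: ", ".join(sorted(cell)))
    patterns[col] = as_text.str.replace(pat, r"\2 < \1 <= \3", regex=True)

print(patterns)
```

Sorting the members before joining keeps the output stable, since a frozenset has no defined order.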

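import re

import pandas as pd

data = pd.DataFrame({
    'components': ['(quantity__(0.0,16199.0])', '(unitprice__(-1055.648,8494.557])'],
    'outcome': ['(unitprice__(-1055.648,8494.557])', 'quantity__(0.0,16199.0])'],
})


def func(x):
    # Split the label from the interval at the double underscore.
    name, interval = str(x).split('__')
    name = name.replace('(', '')
    # Raw string; the optional '-' keeps negative bounds such as -1055.648.
    low, high = re.findall(r'-?\d+\.\d+', interval)
    # Put the bounds around the label: low<name<=high.
    return '{}<{}<={}'.format(low, name, high)


# applymap calls func on every cell (pandas 2.1+ also offers DataFrame.map for this).
df = data.applymap(func)
print(df)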

Source

2017-09-11 17:54:46 ajay
