I would use str.extract.
Note: if you know that a row has at most N items, you can replace range(1, 2) with range(1, N).
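As a minimal sketch of that note, the pattern can be assembled for any N and checked with plain `re`, without pandas. The `build_pattern` helper and the two sample lines are assumptions made for illustration (they are not part of the original answer). The optional chunks are also joined a little differently from the `"?".join(...)` form used in the session; for N >= 2 it should match the same way.

```python
import re


def chunk(i):
    # One "NAME(dev=VALUE mm)" item; group names carry the item position i.
    return (
        r"(?P<junk_{0}>\s(?P<number_{0}>.*?)"
        r"\(dev=(?P<size_{0}>-?[\.0-9]+) mm\))"
    ).format(i)


def build_pattern(n):
    # Hypothetical helper: chunk 0 is required, chunks 1..n-1 are optional,
    # which mirrors replacing range(1, 2) with range(1, n).
    optional = "".join(chunk(i) + "?" for i in range(1, n))
    return r"(?P<part>.*?)\s-.*?:" + chunk(0) + optional


# Made-up lines in the same shape as the data (the displayed Series is truncated).
three_items = ("PART A TO PART B - 3 features out of tolerance: "
               "A12C(dev=-3.7 mm) A14D(dev=-4.1 mm) B02A(dev=0.5 mm)")
one_item = "PART C TO PART B - 1 feature out of tolerance: A14C(dev=-1.8 mm)"

for line in (three_items, one_item):
    match = re.search(build_pattern(3), line)
    # Groups of optional chunks that did not match come back as None.
    print({k: v for k, v in match.groupdict().items() if not k.startswith("junk")})
```

With `s.str.extract`, those `None` groups are what show up as NaN in the extra columns.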
In [11]: s
Out[11]:
0    PART A TO PART B - 2 features out of tolerance...
1    PART C TO PART B - 1 feature out of tolerance:...
2    PART Z-X TO PART C - 1 feature out of toleranc...
dtype: object
In [12]: def chunk(i):
    ...:     return r'(?P<junk_{}>\s(?P<number_{}>.*?)\(dev=(?P<size_{}>-?[\.0-9]+) mm\))'.format(i, i, i)
    ...:
In [13]: df = s.str.extract(r"(?P<part>.*?)\s-.*?:{}?.*?".format(chunk(0) + "?".join((chunk(i) for i in range(1, 2)))), expand=True)
In [14]: df
Out[14]:
                 part             junk_0  number_0  size_0             junk_1  number_1  size_1
0    PART A TO PART B  A12C(dev=-3.7 mm)      A12C    -3.7  A14D(dev=-4.1 mm)      A14D    -4.1
1    PART C TO PART B  A14C(dev=-1.8 mm)      A14C    -1.8                NaN       NaN     NaN
2  PART Z-X TO PART C  A25C(dev=-6.2 mm)      A25C    -6.2                NaN       NaN     NaN
In [15]: df = s.str.extract(r"(?P<part_0>.*?)\s-.*?:{}?.*?".format(chunk(0) + "?".join((chunk(i) for i in range(1, 2)))), expand=True)
In [16]: df
Out[16]:
               part_0             junk_0  number_0  size_0             junk_1  number_1  size_1
0    PART A TO PART B  A12C(dev=-3.7 mm)      A12C    -3.7  A14D(dev=-4.1 mm)      A14D    -4.1
1    PART C TO PART B  A14C(dev=-1.8 mm)      A14C    -1.8                NaN       NaN     NaN
2  PART Z-X TO PART C  A25C(dev=-6.2 mm)      A25C    -6.2                NaN       NaN     NaN
In [17]: df.columns = pd.MultiIndex.from_tuples(df.columns.map(lambda x: tuple(x.split("_"))))
In [18]: df
Out[18]:
                 part               junk  number  size               junk  number  size
                    0                  0       0     0                  1       1     1
0    PART A TO PART B  A12C(dev=-3.7 mm)    A12C  -3.7  A14D(dev=-4.1 mm)    A14D  -4.1
1    PART C TO PART B  A14C(dev=-1.8 mm)    A14C  -1.8                NaN     NaN   NaN
2  PART Z-X TO PART C  A25C(dev=-6.2 mm)    A25C  -6.2                NaN     NaN   NaN
In [19]: df1 = df.stack(level=1)
In [20]: df1
Out[20]:
                  junk  number                part  size
0 0  A12C(dev=-3.7 mm)    A12C    PART A TO PART B  -3.7
  1  A14D(dev=-4.1 mm)    A14D                 NaN  -4.1
1 0  A14C(dev=-1.8 mm)    A14C    PART C TO PART B  -1.8
2 0  A25C(dev=-6.2 mm)    A25C  PART Z-X TO PART C  -6.2
Finally, fill in the part for the rows whose second index level is greater than 0:
In [21]: df1.part = df1.part.ffill()
In [22]: df1
Out[22]:
                  junk  number                part  size
0 0  A12C(dev=-3.7 mm)    A12C    PART A TO PART B  -3.7
  1  A14D(dev=-4.1 mm)    A14D    PART A TO PART B  -4.1
1 0  A14C(dev=-1.8 mm)    A14C    PART C TO PART B  -1.8
2 0  A25C(dev=-6.2 mm)    A25C  PART Z-X TO PART C  -6.2
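For anyone who prefers a script to a session, the steps above can be sketched roughly as follows. The input strings are assumptions: the Series is only shown truncated, so the text after "tolerance" is guessed from the extracted output. The explicit `dropna` and `sort_index` calls are also additions, meant to keep the result stable in case `stack` behaves differently across pandas versions.

```python
import pandas as pd


def chunk(i):
    # Same building block as in the session, written with a positional index.
    return r"(?P<junk_{0}>\s(?P<number_{0}>.*?)\(dev=(?P<size_{0}>-?[\.0-9]+) mm\))".format(i)


# Assumed full-length inputs (the tails are a guess, see the note above).
s = pd.Series([
    "PART A TO PART B - 2 features out of tolerance: A12C(dev=-3.7 mm) A14D(dev=-4.1 mm)",
    "PART C TO PART B - 1 feature out of tolerance: A14C(dev=-1.8 mm)",
    "PART Z-X TO PART C - 1 feature out of tolerance: A25C(dev=-6.2 mm)",
])

# One required item followed by one optional item.
pattern = r"(?P<part_0>.*?)\s-.*?:" + chunk(0) + chunk(1) + "?"
wide = s.str.extract(pattern, expand=True)

# "junk_0" -> ("junk", "0"): the item position becomes column level 1.
wide.columns = pd.MultiIndex.from_tuples([tuple(c.split("_")) for c in wide.columns])

# Move the item position into the row index; drop rows that are entirely NaN
# in case this pandas version keeps them, then restore row order for ffill.
tall = wide.stack(level=1).dropna(how="all").sort_index()
tall["part"] = tall["part"].ffill()

print(tall[["part", "number", "size"]])
```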
There's quite a lot going on here. The important part is the regular expression, so brush up on your regex if you need to.
You'll probably want to drop the "junk" column!
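A small sketch of that cleanup, with the final frame rebuilt by hand from the output above so that the block runs on its own. Converting size to float is an extra step that is not in the original answer; it is included on the assumption that the deviations will be used as numbers.

```python
import pandas as pd

# The stacked result from the session, typed in by hand.
index = pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0), (2, 0)])
df1 = pd.DataFrame(
    {
        "junk": ["A12C(dev=-3.7 mm)", "A14D(dev=-4.1 mm)",
                 "A14C(dev=-1.8 mm)", "A25C(dev=-6.2 mm)"],
        "number": ["A12C", "A14D", "A14C", "A25C"],
        "part": ["PART A TO PART B", "PART A TO PART B",
                 "PART C TO PART B", "PART Z-X TO PART C"],
        "size": ["-3.7", "-4.1", "-1.8", "-6.2"],
    },
    index=index,
)

# The junk group only exists to make each item optional as a unit.
tidy = df1.drop(columns="junk")

# str.extract returns strings. Use bracket access here, because
# tidy.size is the DataFrame attribute holding the element count.
tidy["size"] = tidy["size"].astype(float)

print(tidy)
```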
Yes, you can. What have you tried?