Convert strings enclosed in %% to lowercase in Python

I have a pyspark dataframe in which one of the fields contains strings enclosed in %% .. %%. The enclosed content is case-insensitive, and I want to convert it to lowercase.

Below is a snapshot of the dataframe. The column text looks like this:

https://www.xxxxxxxx.co.nz/Activities|http://www.xxxxxxxx.co.nz/things-to-do/search?location=%%t.Trip_Intrip_1_dest_City_1%%

https://images.trvl-media.com/media/content/expus/email/2016/us/banner/images/image_stor-34461_09_600x250.jpg|%%mis_lx_Offers_mod_Images.LargeImageURL%%

I want to convert the text above to the following format, in which only the strings enclosed in %% are converted to lowercase:

https://www.xxxxxxxx.co.nz/Activities|http://www.xxxxxxxx.co.nz/things-to-do/search?location=%%t.trip_intrip_1_dest_city_1%%

https://images.trvl-media.com/media/content/expus/email/2016/us/banner/images/image_stor-34461_09_600x250.jpg|%%mis_lx_offers_mod_images.largeimageurl%%

Source

2017-11-29 Yuvaraj

The text 'LargeImageUrl' is still not all lowercase in your question. – theBrainyGeek

Can't you map over it, do '.split("%%")' and then '.lower()'? – 16num

@theBrainyGeek Sorry, that was a typo. I have made the change. Thanks. – Yuvaraj
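The split-based idea raised in the comments can be sketched roughly as follows. This is only an illustration: the function name `lower_placeholders_by_split` and the sample string are made up here, and the sketch assumes the %% delimiters are balanced (with an unmatched %%, everything after it would be lowercased).

```python
def lower_placeholders_by_split(text, delimiter="%%"):
    """Lowercase the segments that sit between pairs of `delimiter`."""
    parts = text.split(delimiter)
    # After splitting, the odd-indexed parts are the ones enclosed by the
    # delimiter, provided the delimiters are balanced (an assumption here).
    for i in range(1, len(parts), 2):
        parts[i] = parts[i].lower()
    return delimiter.join(parts)


# Hypothetical sample value, not taken from the question's data.
sample = "http://Example.COM/Search?location=%%T.Dest_City_1%%|%%Img.LargeURL%%"
print(lower_placeholders_by_split(sample))
```

Text outside the %% pairs keeps its original case, so only the placeholders change.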

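Strings are immutable in Python, so you have to build a new value and reassign it. I therefore think it is better to iterate over the whole string (since you say in the comments that you want to avoid split). You can use something like this: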

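textstr = 'search?location=%%t.Trip_Intrip_1_dest_City_1%%'  # example input
new = ''
f = 0  # number of '%' characters seen so far
for i in textstr:
    if i == '%':
        f += 1
    # f // 2 is odd only while we are inside a %%...%% pair
    if (f // 2) % 2 == 1:
        new += i.lower()
    else:
        new += i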

Or use a regex.
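The answer does not spell out the regex variant. One possible sketch, assuming the placeholders never contain a literal %% themselves, passes a function to `re.sub` so that each match is lowercased on its own. The names `PLACEHOLDER` and `lower_placeholders` and the sample link are illustrative only.

```python
import re

# Non-greedy match of one %%...%% placeholder, delimiters included.
PLACEHOLDER = re.compile(r"%%.*?%%")


def lower_placeholders(text):
    # re.sub calls the function once per match and substitutes its return
    # value, so each placeholder is lowercased independently of the others.
    return PLACEHOLDER.sub(lambda match: match.group(0).lower(), text)


# Hypothetical sample value with two placeholders.
link = "%%Mis_Offers.LargeImageURL%%|https://Example.COM/?location=%%T.Dest_City_1%%"
print(lower_placeholders(link))
```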


2017-11-29 19:35:49 theBrainyGeek

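Using a regex, I was thinking of going with something like this:

Find all the sequences that need to be replaced, then replace each sequence with its lowercase equivalent.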

import re 

link1 = 'https://images.trvl-media.com/media/content/expus/email/2016/us/banner/images/image_stor-34461_09_600x250.jpg|%%mis_lx_Offers_mod_Images.LargeImageURL%%' 
link2 = 'https://www.xxxxxxxx.co.nz/Activities|http://www.xxxxxxxx.co.nz/things-to-do/search?location=%%t.Trip_Intrip_1_dest_City_1%%' 
links = [link1, link2] 

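for idx, link in enumerate(links):
    # Collect every %%...%% sequence in the link.
    lowers = re.findall(r'%%.*?%%', link)
    for x in lowers:
        # Note: re.sub replaces every match with x.lower(), so this only
        # works when the link contains a single %%...%% sequence.
        links[idx] = re.sub(r'%%.*?%%', x.lower(), link)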

for link in links: 
    print(link)

Output:

https://images.trvl-media.com/media/content/expus/email/2016/us/banner/images/image_stor-34461_09_600x250.jpg|%%mis_lx_offers_mod_images.largeimageurl%% 
https://www.xxxxxxxx.co.nz/Activities|http://www.xxxxxxxx.co.nz/things-to-do/search?location=%%t.trip_intrip_1_dest_city_1%%


2017-11-29 19:44:16 mentalita

Thanks. I tried your approach and it worked when there is only one %%-enclosed string. But when there are two, as in '%%mis_lx_Offers_mod_Images.LargeImageURL%%|https://www.xxxxxxxx.co.nz/Activities|http://www.xxxxxxxx.co.nz/things-to-do/search?location=%%t.Trip_Intrip_1_dest_City_1%%', it repeats the last value of the loop. The output looks like this: '|%%t.trip_intrip_1_dest_city_1%%|https://www.xxxxxxxx.co.nz/Activities|http://www.xxxxxxxx.co.nz/things-to-do/search?location=%%t.trip_intrip_1_dest_city_1%%' – Yuvaraj
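The behaviour described in this comment can be reproduced with a reduced example. The string below is a made-up stand-in for the real data; the point is only to show why the last placeholder ends up everywhere.

```python
import re

# Hypothetical short input with two placeholders.
link = "%%FOO%%|Some_Text|%%BAR%%"
result = link
for x in re.findall(r"%%.*?%%", link):
    # The pattern matches EVERY placeholder, so each pass overwrites all of
    # them with the current x.lower(); the last pass is the one that sticks.
    result = re.sub(r"%%.*?%%", x.lower(), link)

print(result)  # %%bar%%|Some_Text|%%bar%%
```

Because the substitution pattern is the generic one rather than the specific match, both placeholders are rewritten with the same text on every pass.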

As suggested by @mentalita:

input_df：

>>> df.show(truncate=False) 
+----+---------------------------------+
|col1|col2                             |
+----+---------------------------------+
|1   |http://%%FOO%%|some_string%%BAR%%|
|2   |http://%%FOO%%|some_string       |
+----+---------------------------------+

コード：

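import re
from pyspark.sql import functions as F

def convert_to_lower(link):
    # Find every %%...%% sequence, then lowercase each one in turn.
    target_strings = re.findall(r'%%.*?%%', link)
    for x in target_strings:
        # Literal replacement, so characters such as '.' in x are not
        # interpreted as regex syntax.
        link = link.replace(x, x.lower())
    return link

# Wrap the function as a UDF (the return type defaults to string).
convert_to_lower_udf = F.udf(convert_to_lower)
df = df.withColumn('converted_strings', convert_to_lower_udf('col2'))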

output_df：

>>> df.show(truncate=False) 
+----+---------------------------------+---------------------------------+
|col1|col2                             |converted_strings                |
+----+---------------------------------+---------------------------------+
|1   |http://%%FOO%%|some_string%%BAR%%|http://%%foo%%|some_string%%bar%%|
|2   |http://%%FOO%%|some_string       |http://%%foo%%|some_string       |
+----+---------------------------------+---------------------------------+
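One thing the example above does not cover is null values in col2. Spark passes a SQL NULL to a Python UDF as None, and `re.findall` raises a TypeError on None. A null-tolerant variant might look like the sketch below; the name `convert_to_lower_safe` is hypothetical, and the UDF registration is shown only as a comment so that the snippet runs without a Spark session.

```python
import re


def convert_to_lower_safe(link):
    # A SQL NULL arrives in a Python UDF as None; pass it through unchanged.
    if link is None:
        return None
    for x in re.findall(r"%%.*?%%", link):
        # Literal replacement of each placeholder with its lowercase form.
        link = link.replace(x, x.lower())
    return link


# Hypothetical registration, assuming `from pyspark.sql import functions as F`:
#   convert_to_lower_safe_udf = F.udf(convert_to_lower_safe)
for value in ["http://%%FOO%%|some_string%%BAR%%", None]:
    print(repr(convert_to_lower_safe(value)))
```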


2017-11-30 08:36:41
