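Backslashes and escape characters in Python vs. Perl regular expressions

The goal is to handle an NLP tokenization task by porting a script from Perl to Python. My first attempt at the Python port: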
>>> import re
>>> from six import text_type
>>> sent = text_type("this ain't funny")
>>> escape_singquote = r"\'", r"\&apos;" # escape the left quote for XML
>>> contraction = r"n't", r" n't" # pad a space on the left when "n't" pattern is seen
>>> text = sent
>>> for regexp, substitution in [contraction, escape_singquote]:
...     text = re.sub(regexp, substitution, text)
...     print(text)
...
this ai n't funny
this ai n\&apos;t funny
The original Perl regexes being ported:

my($text) = @_; # Take the text from the subroutine arguments
$text =~ s=n't = n't =g; # Put a space before the "n't" substring to tokenize English contractions, e.g. "don't" -> "do n't".
$text =~ s/\'/\&apos;/g; # Escape the single quote so that it suits XML.
The backslash in the replacement ends up in the output as a literal backslash, so I have to drop it:

>>> escape_singquote = r"\'", r"&apos;" # escape the left quote for XML
>>> text = sent
>>> for regexp, substitution in [contraction, escape_singquote]:
...     text = re.sub(regexp, substitution, text)
...     print(text)
...
this ai n't funny
this ai n&apos;t funny
But leaving the single quote unescaped in the pattern as well gives the same result:

>>> import re
>>> from six import text_type
>>> sent = text_type("this ain't funny")
>>> escape_singquote = r"\'", r"\&apos;" # escape the left quote for XML
>>> contraction = r"n't", r" n't" # pad a space on the left when "n't" pattern is seen
>>> escape_singquote = r"'", r"&apos;" # escape the left quote for XML
>>> text = sent
>>> for regexp, substitution in [contraction, escape_singquote]:
...     text = re.sub(regexp, substitution, text)
...     print(text)
...
this ai n't funny
this ai n&apos;t funny
That is puzzling. So the question is: which characters need to be escaped in Python, and which in Perl? Regular expressions in Perl and Python are not quite equivalent, are they?
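
To make the asymmetry concrete, here is a minimal sketch of how Python's `re` module treats the backslash in the pattern versus in the replacement template. It assumes the intended XML replacement is `&apos;`; the variable names are only illustrative.

```python
import re

sent = "this ain't funny"

# In a pattern, a backslash before a punctuation character simply matches
# that character, so r"\'" and r"'" find the same thing.
assert re.findall(r"\'", sent) == re.findall(r"'", sent) == ["'"]

# In a replacement template, an unknown escape such as \& is left alone,
# so the backslash is copied into the output.
with_backslash = re.sub(r"'", r"\&apos;", sent)
without_backslash = re.sub(r"'", r"&apos;", sent)

print(with_backslash)     # this ain\&apos;t funny
print(without_backslash)  # this ain&apos;t funny
```

In other words, the backslash is harmless in the pattern but not in the replacement, which would explain the three outputs above.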
You're using raw strings everywhere, so the backslashes are literal. – TigerhawkT3
Check this: – MYGz
The backslashes aren't needed in the Perl version either. – Borodin
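
Following the comments, one possible way to write the port without any backslashes is sketched below. The function name `tokenize_and_escape` and the `SUBSTITUTIONS` list are hypothetical, and `&apos;` is assumed to be the intended XML entity.

```python
import re

# (pattern, replacement) pairs, applied in order; none needs a backslash.
SUBSTITUTIONS = [
    (r"n't", r" n't"),   # pad a space on the left of "n't"
    (r"'", r"&apos;"),   # escape the single quote for XML
]


def tokenize_and_escape(text):
    """Apply each substitution in turn and return the rewritten text."""
    for pattern, replacement in SUBSTITUTIONS:
        text = re.sub(pattern, replacement, text)
    return text


result = tokenize_and_escape("this ain't funny")
print(result)  # this ai n&apos;t funny
```

Since both patterns are fixed strings, `str.replace` would do the same job here; `re.sub` is kept only to stay close to the original Perl substitutions.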