2010-11-21 4 views



This looks like the non-standard format produced by escape in JavaScript. If you can influence the code that is sending this data, you should probably have it use encodeURI instead (which uses the "normal" percent-encoding of UTF-8-encoded characters).
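For comparison, if the sender can be switched to encodeURI, the data can be decoded with Ruby's standard library alone. The following is a minimal sketch, assuming the payload is the UTF-8 percent-encoding of "Надоел"; the sample string and the variable names are illustrative only. Note that URI.decode_www_form_component also turns "+" into a space, and that it rejects the %u form outright, which is why that form needs custom handling.

```ruby
require 'uri'

# Standard percent-encoding of the UTF-8 bytes of "Надоел",
# i.e. what JavaScript's encodeURI would send (illustrative sample).
encoded = '%D0%9D%D0%B0%D0%B4%D0%BE%D0%B5%D0%BB'

# Decode the bytes and tag the result as UTF-8.
decoded = URI.decode_www_form_component(encoded, Encoding::UTF_8)
puts decoded

# The non-standard %u form is not valid percent-encoding,
# so the standard decoder raises ArgumentError for it.
rejected = begin
  URI.decode_www_form_component('%u041D')
  false
rescue ArgumentError
  true
end
puts rejected
```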

# Unescape percent encoding. 
# The normal byte-oriented format ("%41") and the non-standard <em>%u</em> 
# format ("%u0410") are both supported. The single-byte variant is decoded 
# as if it represents bytes encoded with the same encoding as +str+. The 
# two-byte <em>%u</em> variant is decoded as UTF-16BE and then re-encoded 
# with the same encoding as +str+; surrogate pairs are supported. 
# Since the resulting string will have the same encoding as +str+, all byte 
# sequences resulting from the byte-oriented decoding must be valid sequences 
# in the encoding of +str+. Correspondingly, the encoding of +str+ must 
# be compatible with any extended characters that are decoded from the 
# UTF-16BE <em>%u</em> encodings. 

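def unescape(str)
    hh = /[0-9a-f]{2}/i
    hhhh = /[0-9a-f]{4}/i
    str.gsub(/((?:%#{hh})+)|((?:%u#{hhhh})+)/) do
        # Branch bodies follow the description above: raw bytes for %hh, UTF-16BE for %u.
        if $1
            [$1.delete('%')].pack('H*').force_encoding(str.encoding)
        elsif $2
            [$2.delete('%u')].pack('H*').force_encoding(Encoding::UTF_16BE).encode(str.encoding)
        else
            raise 'unhandled match'
        end
    end
end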

def all_same?(e)
    first = e.first
    e.drop(1).all? { |o| o.eql?(first) }
end

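ss = [
    # %-encoded-UTF-16BE -> SJIS (just for something fun... UTF-8 works fine)
    '%u041D%u0430%u0434%u043E%u0435%u043B'.encode('Shift_JIS'),
    # %-encoded-ISO-8859-5 -> ISO-8859-5
    '%BD%D0%D4%DE%D5%DB'.encode('ISO-8859-5'),
    # %-encoded-UTF-8 -> UTF-8
    '%D0%9D%D0%B0%D0%B4%D0%BE%D0%B5%D0%BB'.encode('UTF-8'),
]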

ss2 = [ # demonstrate non-decoded content and UTF-16BE surrogate pair decoding 
    # %-encoded-UTF-16BE -> UTF-8 
    # %-encoded-UTF-8 -> UTF-8
]

ss = ss.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } } 
all_same? ss.map { |s| s.encode(Encoding::UTF_8) } 

ss2 = ss2.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } } 
all_same? ss2.map { |s| s.encode(Encoding::UTF_8) } 


ruby-1.9.2-head > ss = ss.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } } 
[#<Encoding:Shift_JIS>, #<Encoding:ISO-8859-5>, #<Encoding:UTF-8>] 
=> ["\x{844E}\x{8470}\x{8474}\x{8480}\x{8475}\x{847C}", "\xBD\xD0\xD4\xDE\xD5\xDB", "Надоел"] 
ruby-1.9.2-head > all_same? ss.map { |s| s.encode(Encoding::UTF_8) } 
=> true 
ruby-1.9.2-head > 
ruby-1.9.2-head > ss2 = ss2.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } } 
[#<Encoding:UTF-8>, #<Encoding:UTF-8>] 
=> ["AА", "AА"] 
ruby-1.9.2-head > all_same? ss2.map { |s| s.encode(Encoding::UTF_8) } 
=> true 
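
The surrogate pair handling mentioned in the comment on unescape can be seen in isolation. This is a small sketch, assuming an input of "%uD83D%uDE00", which is how JavaScript's escape writes the single character U+1F600 (two UTF-16 code units); the sample value and variable names are illustrative and not taken from the test data above.

```ruby
# Two %u escapes that together form one UTF-16 surrogate pair (U+1F600).
escaped = '%uD83D%uDE00'

# Strip the "%u" markers, leaving the hex digits of the UTF-16BE code units.
hex = escaped.delete('%u')

# Turn the hex digits into raw bytes, label them as UTF-16BE,
# then transcode; the pair collapses into a single character.
bytes = [hex].pack('H*')
char = bytes.force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)

puts hex
puts char.codepoints.map { |cp| format('U+%04X', cp) }.join(' ')
```

Because the whole run of %u escapes is decoded as one UTF-16BE string, the high and low surrogates never have to be paired up by hand.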

Thank you so much, Chris! I appreciate your help! – Daniel
