Ruby: invalid byte sequence in UTF-8

I have the following code, which raises an "invalid byte sequence" error pointing at the scan call in initialize. Any ideas on how to fix this? For what it's worth, the error does not occur if I remove the (.*) between the h1 tag and the >.

#!/usr/bin/env ruby 

class NewsParser 

    def initialize
        Dir.glob("./**/index.htm") do |file|
            @file = IO.read file
            # scan raises ArgumentError (invalid byte sequence in UTF-8) if the file is not valid UTF-8
            parsed = @file.scan(/<h1(.*)>(.*?)<\/h1>(.*)<!-- InstanceEndEditable -->/im)
            self.write(parsed)
        end
    end

    def write output
        @contents = output
        open('output.txt', 'a') do |f|
            f << @contents[0][0] + "\n\n" + @contents[0][1] + "\n\n\n\n"
        end
    end

end 

p = NewsParser.new

Edit: here is the error message:

news_parser.rb:10:in 'scan': invalid byte sequence in UTF-8 (ArgumentError)
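
A minimal way to reproduce this error, sketched under two assumptions: the pages were saved as ISO-8859-1 (Latin-1), so "é" is stored as the single byte 0xE9, and the script runs on Ruby 2.0 or later, where string literals default to UTF-8. IO.read tags the file's bytes with the default external encoding (typically UTF-8), so the resulting string is not valid UTF-8, and Ruby raises ArgumentError when asked to match a regexp against it. The literal below stands in for the file content.

```ruby
# Reproduction sketch: "\xE9" is "é" in Latin-1 (assumed page encoding) but is an
# incomplete sequence in UTF-8. The literal is tagged UTF-8, like IO.read's result.
html = "<h1>Caf\xE9</h1>"

puts html.encoding        # => UTF-8
puts html.valid_encoding? # => false

begin
  html.scan(/<h1(.*)>(.*?)<\/h1>/im)
rescue ArgumentError => e
  puts e.message # => invalid byte sequence in UTF-8
end
```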

Solved: the combination of @file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) and the # encoding: UTF-8 magic comment fixed the problem.

Thanks!

2012-03-07 redgem
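
The fix from the question can be sketched on a single string. This is a minimal sketch assuming the pages are ISO-8859-1, which is what the force_encoding("ISO-8859-1") call implies; a literal with the byte 0xE9 stands in for the file content. force_encoding only relabels the bytes, and encode then transcodes them to UTF-8. Because every byte from 0x00 to 0xFF is a defined Latin-1 character, that step cannot fail, so the sketch leaves out replace: nil, which has no effect in this conversion. The # encoding: UTF-8 magic comment only sets the encoding of string literals in the script itself, so the sketch does not depend on it.

```ruby
# The question's fix applied to one string (assumption: the page is ISO-8859-1).
raw = "<h1>Caf\xE9</h1>" # tagged UTF-8, as IO.read returned it, but invalid

# force_encoding relabels the same bytes as Latin-1 (no bytes change);
# encode then transcodes Latin-1 -> UTF-8, turning 0xE9 into 0xC3 0xA9.
fixed = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts fixed.encoding        # => UTF-8
puts fixed.valid_encoding? # => true

# The regexp scan that used to raise now succeeds.
heading = fixed.scan(/<h1(.*?)>(.*?)<\/h1>/im)[0][1]
puts heading == "Caf\u00E9" # => true
```

If the files are actually Windows-1252 rather than true Latin-1, bytes 0x80 to 0x9F (curly quotes, for example) come out as control characters with this approach; passing "Windows-1252" to force_encoding would be the better guess in that case.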

'@file = IO.read(file).encode("utf-8", replace: nil)' – fl00r

No, I get the same error message. – redgem
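
A sketch of why the encode suggestion leaves the error in place, assuming a page containing the Latin-1 byte 0xE9: without the invalid: option, String#encode performs no conversion when the source and target encodings are the same, so a UTF-8-tagged string with a bad byte is copied unchanged and scan still raises. The second half shows String#scrub (Ruby 2.1 and later), a different technique that was not suggested in this thread: it replaces invalid sequences instead of raising, at the cost of losing the original character.

```ruby
raw = "Caf\xE9" # tagged UTF-8 but invalid (0xE9 is Latin-1 "é")

# UTF-8 -> UTF-8 with no invalid: option: no conversion happens, the bad
# byte is copied as-is, and the result is still invalid.
same = raw.encode("UTF-8")
puts same.valid_encoding? # => false

# Alternative (Ruby 2.1+), not from the thread: scrub replaces each invalid
# sequence with the given string. The "é" is lost, so relabelling as
# Latin-1 is preferable when the real encoding is known.
cleaned = raw.scrub("?")
puts cleaned                 # => Caf?
puts cleaned.valid_encoding? # => true
```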

What encoding is the file in? – fl00r
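
Ruby cannot reliably detect a file's encoding from its bytes. One rough heuristic, sketched here as a hypothetical helper named to_utf8 that does not appear in the thread, is to keep the bytes as UTF-8 when they form valid UTF-8 and otherwise assume ISO-8859-1, where every byte is a valid character. Latin-1 text that happens to form valid UTF-8 would be misread, so this is a fallback rather than real detection.

```ruby
# Hypothetical helper (not from the thread): choose between UTF-8 and ISO-8859-1.
# Heuristic only: valid UTF-8 bytes are kept as UTF-8; anything else is assumed
# to be Latin-1 and transcoded to UTF-8.
def to_utf8(bytes)
  utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

latin1 = [0x63, 0x61, 0x66, 0xE9].pack("C*")       # "café" stored as Latin-1
utf8   = [0x63, 0x61, 0x66, 0xC3, 0xA9].pack("C*") # "café" stored as UTF-8

puts to_utf8(latin1) == "caf\u00E9" # => true
puts to_utf8(utf8) == "caf\u00E9"   # => true
```

With real files, File.binread(path) returns the raw bytes (tagged ASCII-8BIT), which can be passed straight to a helper like this.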

The combination of @file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) and the # encoding: UTF-8 magic comment solved the problem.

2012-03-08 02:18:12 redgem
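
The same conversion can also be folded into the read itself. This sketch reworks the question's loop under the assumption that every index.htm is ISO-8859-1: the mode string "r:ISO-8859-1:UTF-8" gives File.open the file's external encoding and asks Ruby to transcode to UTF-8 while reading, which has the same effect as force_encoding followed by encode. The helper name extract_headings and the temporary directory are hypothetical, for illustration only; the regexp is the one from the question.

```ruby
require "tmpdir"

# Regexp from the question; capture index 1 is the h1 text.
PATTERN = /<h1(.*)>(.*?)<\/h1>(.*)<!-- InstanceEndEditable -->/im

# Hypothetical helper: returns the h1 text of every index.htm under root.
# Assumes ISO-8859-1 files; "r:ISO-8859-1:UTF-8" transcodes to UTF-8 on read.
def extract_headings(root)
  Dir.glob(File.join(root, "**", "index.htm")).sort.map do |path|
    html = File.open(path, "r:ISO-8859-1:UTF-8", &:read)
    match = html.scan(PATTERN).first # nil when the page has no match
    match && match[1]
  end.compact
end

# Usage with a throwaway directory holding one Latin-1 page.
headings = Dir.mktmpdir do |dir|
  Dir.mkdir(File.join(dir, "news"))
  page = "<h1 class=\"t\">Caf\xE9</h1>body<!-- InstanceEndEditable -->".b
  File.binwrite(File.join(dir, "news", "index.htm"), page)
  extract_headings(dir)
end

puts headings == ["Caf\u00E9"] # => true
```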

'string contains null byte' working... – Matrix