Find all indices of a substring within a string

Is it possible, using Ruby, to find all indices of a substring within a larger string? For example, all indices of "in" in "Einstein":

str = "Einstein" 
str.index("in") #returns only 1 
str.scan("in") #returns ["in","in"] 
#desired output would be [1, 6]

2017-04-10 Mokhtar

The standard hack is:

"Einstein".enum_for(:scan, /(?=in)/).map { Regexp.last_match.offset(0).first } 
#=> [1, 6]
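The one-liner works because String#scan, when given a block, sets Regexp.last_match for each match before yielding. enum_for(:scan, ...) only turns that block form into an Enumerator that map can drive, and the zero-width lookahead (?=in) lets a match start at every position. A minimal equivalent sketch without enum_for (the method name scan_indices is hypothetical, not part of Ruby):

```ruby
# Hypothetical helper: collects the start offset of every match of `pattern` in `str`.
def scan_indices(str, pattern)
  result = []
  # In block form, String#scan sets Regexp.last_match ($~) for each match before yielding.
  str.scan(pattern) { result << Regexp.last_match.begin(0) }
  result
end

p scan_indices("Einstein", /(?=in)/) # => [1, 6]
```

MatchData#begin(0) returns the same value as offset(0).first.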

2017-04-10 17:40:02 tokland

Nice, tokland. Note that '"nnnn".enum_for(:scan, /nn/).map { Regexp.last_match.offset(0).first } #=> [0, 2]'. If '[0, 1, 2]' is the desired return value, change the regex '/nn/' to '/(?=nn)/'. –

Good point, @Cary. I'd say in most cases you want the second one; updated. – tokland
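Building on this exchange: if the thing being searched for is a literal substring rather than a regex, it should go through Regexp.escape before being interpolated into the pattern, and the lookahead can be made optional. A sketch under those assumptions (substring_indices and its overlap: keyword are hypothetical names):

```ruby
# Hypothetical helper: indices of a literal substring, overlapping matches by default.
def substring_indices(str, sub, overlap: true)
  escaped = Regexp.escape(sub) # treat "." and friends literally, not as regex syntax
  # A zero-width lookahead consumes nothing, so every start position is tried;
  # a plain pattern consumes the match, so scanning resumes after it.
  pattern = overlap ? /(?=#{escaped})/ : /#{escaped}/
  str.enum_for(:scan, pattern).map { Regexp.last_match.begin(0) }
end

p substring_indices("nnnn", "nn")                 # => [0, 1, 2]
p substring_indices("nnnn", "nn", overlap: false) # => [0, 2]
p substring_indices("a.b.c", ".")                 # => [1, 3]
```

Without the escaping, "." would match every character of "a.b.c".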

def indices_of_matches(str, target) 
    sz = target.size 
    (0..str.size-sz).select { |i| str[i,sz] == target } 
end 

indices_of_matches('Einstein', 'in') 
    #=> [1, 6] 
indices_of_matches('nnnn', 'nn') 
    #=> [0, 1, 2]

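The second example reflects the assumption I've made about how overlapping matches should be treated. If overlapping matches are not to be counted (i.e., [0, 2] is the desired return value in the second example), this answer is clearly not appropriate.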
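If non-overlapping matches are what you want, one way to keep this answer's plain-string (no regex) approach is String#index with a start offset, jumping past each hit. A sketch; the method name non_overlapping_indices is hypothetical:

```ruby
# Hypothetical helper: start indices of non-overlapping occurrences of `target`.
def non_overlapping_indices(str, target)
  step = [target.size, 1].max # jump past each hit; at least 1 so "" cannot loop forever
  result = []
  i = str.index(target)
  while i
    result << i
    i = str.index(target, i + step) # next search starts after the current hit
  end
  result
end

p non_overlapping_indices("nnnn", "nn")     # => [0, 2]
p non_overlapping_indices("Einstein", "in") # => [1, 6]
```

Using a step of 1 instead would report overlapping occurrences, like indices_of_matches.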

2017-04-10 20:38:29

Simple and clean; I'd probably go with this one. – tokland

Here is a more verbose solution, which has the advantage of not relying on global values such as Regexp.last_match:

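def indices(string, regex)
  Enumerator.new do |yielder|
    position = 0 # reset each time the enumerator is iterated, so it can be reused
    while (match = regex.match(string, position))
      yielder << match.begin(0)
      # resume after this match, moving at least one char so an empty match can't loop forever
      position = [match.end(0), match.begin(0) + 1].max
    end
  end
end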

p indices("Einstein", /in/).to_a 
# [1, 6]

Since it returns an Enumerator, you can also use it lazily or just take the first n indices.
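For example, with the indices method above (copied here so the snippet runs on its own, with position initialized inside the Enumerator block), the while loop stops as soon as the caller has the indices it asked for:

```ruby
# Copy of the enumerator-based `indices` helper from this answer.
def indices(string, regex)
  Enumerator.new do |yielder|
    position = 0
    while (match = regex.match(string, position))
      yielder << match.begin(0)
      position = [match.end(0), match.begin(0) + 1].max
    end
  end
end

text = "Einstein" * 3 # "EinsteinEinsteinEinstein"

p indices(text, /in/).first(2)                      # => [1, 6]
p indices(text, /in/).lazy.select(&:even?).first(2) # => [6, 14]

e = indices(text, /in/) # external iteration: one index per call
p e.next # => 1
p e.next # => 6
```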

If you might need more information than just the indices, you could instead return an Enumerator of MatchData objects and extract the indices from it:

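def matches(string, regex)
  Enumerator.new do |yielder|
    position = 0 # reset each time the enumerator is iterated, so it can be reused
    while (match = regex.match(string, position))
      yielder << match
      # resume after this match, moving at least one char so an empty match can't loop forever
      position = [match.end(0), match.begin(0) + 1].max
    end
  end
end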

p matches("Einstein", /in/).map{ |match| match.begin(0) } 
# [1, 6]
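As an illustration of that extra information: each MatchData also carries the end offset and the matched text, which matters when the pattern can match strings of different lengths. A sketch using a copy of the matches method above; the pattern /in\w?/ is only an example:

```ruby
# Copy of the MatchData-yielding `matches` helper from this answer.
def matches(string, regex)
  Enumerator.new do |yielder|
    position = 0
    while (match = regex.match(string, position))
      yielder << match
      position = [match.end(0), match.begin(0) + 1].max
    end
  end
end

# Start offset, end offset and matched text of each match.
p matches("Einstein", /in\w?/).map { |m| [m.begin(0), m.end(0), m[0]] }
# => [[1, 4, "ins"], [6, 8, "in"]]
```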

To get the behavior @Cary described, you could replace the last line inside the while loop with position = match.begin(0) + 1.
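With that change the search restarts one character after the start of each match instead of after its end, so overlapping occurrences are found. A sketch of the modified method, returning indices directly (overlapping_indices is a hypothetical name):

```ruby
# Hypothetical variant of `indices`: restarts one char after each match's start,
# so overlapping matches are reported.
def overlapping_indices(string, regex)
  Enumerator.new do |yielder|
    position = 0
    while (match = regex.match(string, position))
      yielder << match.begin(0)
      position = match.begin(0) + 1 # the one-line change suggested above
    end
  end
end

p overlapping_indices("nnnn", /nn/).to_a     # => [0, 1, 2]
p overlapping_indices("Einstein", /in/).to_a # => [1, 6]
```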

2017-04-10 20:46:45
