How to get the percentage match for a list of strings in Apache Lucene

Let's say I have a list of documents in a Lucene index. I have configured keywords/phrases like this:

Title: FaceBook 
Content: Associated list of rules, notification, facebook.

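Here, Title is a field and Content is a field. Now my input is:

This is a notification message from Facebook

So the result for this message should be:

Hits: 2

Percentage match: 100% (because the configured keywords are matched completely)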

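Now, for another input, the message contains only part of the configured keyword "notification". The keyword has 12 characters and only 6 of them are matched, so the match percentage should be (6/12 * 100) = 50%.

So for a partial match like this I want output where the match percentage is 50%.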
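One way to read the 6/12 arithmetic is as "longest shared prefix divided by keyword length". The question does not define the rule precisely, so the sketch below is only an illustration under that assumption. The class name `PrefixKeywordMatch`, its methods, and the truncated word "notifi" in the second message are all made up for the example, and no Lucene API is involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: scores a keyword against a message
// by the longest prefix that the keyword shares with any word of the message.
final class PrefixKeywordMatch {

    // Number of leading characters that a and b have in common.
    static int commonPrefixLength(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Percentage (0 to 100) of the keyword's characters matched by the best word.
    static double matchPercent(String keyword, String message) {
        String key = keyword.toLowerCase(Locale.ROOT);
        if (key.isEmpty()) {
            return 0.0;
        }
        int best = 0;
        // Assumption: words are runs of letters and digits.
        for (String word : message.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            best = Math.max(best, commonPrefixLength(key, word));
        }
        return 100.0 * best / key.length();
    }

    // Number of keywords that appear completely in the message.
    static int hits(List<String> keywords, String message) {
        int count = 0;
        for (String keyword : keywords) {
            if (matchPercent(keyword, message) == 100.0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("rules", "notification", "facebook");
        // Message from the question: "notification" and "facebook" match fully.
        System.out.println(hits(keywords, "This is a notification message from Facebook")); // 2
        // Made-up message with a 6-character prefix of "notification": 6/12 * 100.
        System.out.println(matchPercent("notification", "A notifi arrived")); // 50.0
    }
}
```

A prefix rule is strict about where the overlap sits; an edit-distance measure is more forgiving of typos in the middle of a word.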

Source

2017-12-20 Prakhar Nigam

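You have tagged all three: Solr, Elastic, and Lucene. Which one are you actually using? What have you tried so far?

Right now I am using Lucene. I tried to calculate the match percentage based on how many term queries matched, but that did not work very well.

Somehow this is my solution for the percentage match: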

public class lab1 {

    /** Returns a similarity score in [0, 1]; 1.0 means the strings are equal ignoring case. */
    public static double similarity(String s1, String s2) {
        String longer = s1, shorter = s2;
        if (s1.length() < s2.length()) { // longer should always have the greater length
            longer = s2;
            shorter = s1;
        }
        int longerLength = longer.length();
        if (longerLength == 0) {
            return 1.0; // both strings are zero length
        }
        /* // If you have Apache Commons Text
        // you can use it to calculate the edit distance:
        LevenshteinDistance levenshteinDistance = new LevenshteinDistance();
        return (longerLength - levenshteinDistance.apply(longer, shorter)) / (double) longerLength; */
        return (longerLength - editDistance(longer, shorter)) / (double) longerLength;
    }

    /** Case-insensitive Levenshtein edit distance, computed with a single row of costs. */
    public static int editDistance(String s1, String s2) {
        s1 = s1.toLowerCase();
        s2 = s2.toLowerCase();

        // costs[j] holds the distance between the current prefix of s1 and the first j chars of s2
        int[] costs = new int[s2.length() + 1];
        for (int i = 0; i <= s1.length(); i++) {
            int lastValue = i;
            for (int j = 0; j <= s2.length(); j++) {
                if (i == 0) {
                    costs[j] = j; // distance from the empty prefix is j insertions
                } else if (j > 0) {
                    int newValue = costs[j - 1];
                    if (s1.charAt(i - 1) != s2.charAt(j - 1)) {
                        // cheapest of substitution, insertion, and deletion, plus one edit
                        newValue = Math.min(Math.min(newValue, lastValue), costs[j]) + 1;
                    }
                    costs[j - 1] = lastValue;
                    lastValue = newValue;
                }
            }
            if (i > 0) {
                costs[s2.length()] = lastValue;
            }
        }
        return costs[s2.length()];
    }

    /** Prints the similarity of s and t as a percentage. */
    public static void printSimilarity(String s, String t) {
        System.out.println(String.format(
                "%.3f Percent is the similarity between \"%s\" and \"%s\"", similarity(s, t) * 100, s, t));
    }

    public static void main(String[] args) {
        printSimilarity("", "");
        printSimilarity("1234567890", "1");
        printSimilarity("1234567890", "123");
        printSimilarity("1234567890", "1234567");
        printSimilarity("1234567890", "1234567890");
        printSimilarity("1234567890", "1234567980");
        printSimilarity("47/2010", "472010");
        printSimilarity("47/2010", "472011");
        printSimilarity("47/2010", "AB.CDEF");
        printSimilarity("47/2010", "4B.CDEFG");
        printSimilarity("47/2010", "AB.CDEFG");
        printSimilarity("The quick fox jumped", "The jumped fox");
        printSimilarity("The quick fox jumped", "The fox");
        printSimilarity("kitten", "sitting");
    }

}
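The code above compares two whole strings. For the keyword scenario in the question, one possible adaptation is to compare each configured keyword with every word of the message and keep the best score. The sketch below does that; it is an assumption about how the pieces could be combined, not something stated in the answer. The class `TokenSimilarity` and the method `bestTokenSimilarity` are hypothetical names, and the edit distance is re-implemented in a plain two-row form so the example is self-contained.

```java
import java.util.Locale;

// Hypothetical helper: best edit-distance similarity between a keyword
// and any single word of a message.
final class TokenSimilarity {

    // Levenshtein distance using two rows of the dynamic-programming table.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Same ratio as in the answer: (longer length - distance) / longer length.
    static double similarity(String a, String b) {
        int longerLength = Math.max(a.length(), b.length());
        if (longerLength == 0) {
            return 1.0;
        }
        return (longerLength - editDistance(a, b)) / (double) longerLength;
    }

    // Highest similarity between the keyword and any word of the message.
    static double bestTokenSimilarity(String keyword, String message) {
        String key = keyword.toLowerCase(Locale.ROOT);
        double best = 0.0;
        // Assumption: words are runs of letters and digits.
        for (String word : message.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {
                best = Math.max(best, similarity(key, word));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String message = "This is a notification message from Facebook";
        for (String keyword : new String[] {"rules", "notification", "facebook"}) {
            System.out.printf("%s: %.1f%%%n", keyword, 100 * bestTokenSimilarity(keyword, message));
        }
    }
}
```

Scoring per word keeps a long message from diluting the result, which would happen if the whole message were compared against a single short keyword.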

Source

2018-02-05 17:37:16
