2017-10-19 6 views
2

私は試してみるが、これは非常に難しいコーディングの問題をオンラインで見つけた。KMP文字列検索のアルゴリズムですか?

与えられた文字列TとパターンPは、このパターンの出現を見つけ、対応する値を合計して最大値と最小値を返します。問題をさらに詳しく知りたい場合は、thisを参照してください。

ただし、以下のコードは簡単なテストケースで動作しますが、複数の複雑なテストケースで実行すると非常に遅くなり、コードを最適化する必要があるかどうかわかりません。

ロジックを間違っている人は誰でも助けてください。

public class DeterminingDNAHealth { 


    private DeterminingDNAHealth() { 
    /* 
    * Fixme: 
    * Each DNA contains number of genes 
    * - some of them are beneficial and increase DNA's total health 
    * - Each Gene has a health value 
    * ====== 
    * - Total health of DNA = sum of all health values of beneficial genes 
    */ 
    } 

    int checking(int start, int end, String pattern) { 
    String[] genesChar = new String[] { 
     "a", 
     "b", 
     "c", 
     "aa", 
     "d", 
     "b" 
    }; 
    String numbers = "123456"; 

    int total = 0; 

    for (int i = start; i <= end; i++) { 
     total += KMPAlgorithm.initiateAlgorithm(pattern, genesChar[i]) * (i + 1); 
    } 

    return total; 
    } 

    public static void main(String[] args) { 

    String[] genesChar = new String[] { 
     "a", 
     "b", 
     "c", 
     "aa", 
     "d", 
     "b" 
    }; 
    Gene[] genes = new Gene[genesChar.length]; 

    for (int i = 0; i < 6; i++) { 
     genes[i] = new Gene(genesChar[i], i + 1); 
    } 

    String[] checking = "15caaab 04xyz 24bcdybc".split(" "); 


    DeterminingDNAHealth DNA = new DeterminingDNAHealth(); 
    int i, mostHealthiest, mostUnhealthiest; 

    mostHealthiest = Integer.MIN_VALUE; 
    mostUnhealthiest = Integer.MAX_VALUE; 

    for (i = 0; i < checking.length; i++) { 
     int start = Character.getNumericValue(checking[i].charAt(0)); 
     int end = Character.getNumericValue(checking[i].charAt(1)); 
     String pattern = checking[i].substring(2, checking[i].length()); 

     int check = DNA.checking(start, end, pattern); 

     if (check > mostHealthiest) 
     mostHealthiest = check; 
     else 
     if (check < mostUnhealthiest) 
     mostUnhealthiest = check; 
    } 

    System.out.println(mostHealthiest + " " + mostUnhealthiest); 

    // DNA.checking(1,5, "caaab"); 
    } 
} 

KMPAlgorithm

public class KMPAlgorithm { 

    KMPAlgorithm() {} 


    public static int initiateAlgorithm(String text, String pattern) { 

    // let us generate our LPC table from the pattern 
    int[] partialMatchTable = partialMatchTable(pattern); 

    int matchedOccurrences = 0; 

    // initially we don't have anything matched, so 0 
    int partialMatchLength = 0; 

    // we then start to loop through the text, !note, not the pattern. The text that we are testing the pattern on 
    for (int i = 0; i < text.length(); i++) { 

     // if there is a mismatch and there's no previous match, then we've hit the base-case, hence break from while{...} 
     while (partialMatchLength > 0 && text.charAt(i) != pattern.charAt(partialMatchLength)) { 

     /* 
     * otherwise, based on the number of chars matched, we decrement it by 1. 
     * In fact, this is the unique part of this algorithm. It is this part that we plan to skip partialMatchLength 
     * iterations. So if our partialMatchLength was 5, then we are going to skip (5 - 1) iteration. 
     */ 
     partialMatchLength = partialMatchTable[partialMatchLength - 1]; 

     } 


     // if however we have a char that matches the current text[i] 
     if (text.charAt(i) == pattern.charAt(partialMatchLength)) { 

     // then increment position, so hence we check the next char of the pattern against the next char in text 
     partialMatchLength++; 

     // we will know that we're at the end of the pattern matching, if the matched length is same as the pattern length 
     if (partialMatchLength == pattern.length()) { 
      // to get the starting index of the matched pattern in text, apply this formula (i - (partialMatchLength - 1)) 

      // this line increments when a match string occurs multiple times; 
      matchedOccurrences++; 

      // just before when we have a full matched pattern, we want to test for multiple occurrences, so we make 
      // our match length incomplete, and let it run longer. 
      partialMatchLength = partialMatchTable[partialMatchLength - 1]; 

     } 
     } 

    } 

    return matchedOccurrences; 


    } 


    private static int[] partialMatchTable(String pattern) { 
    /* 
    * TODO 
    * Note: 
    * => Proper prefix: All the characters in a string, with one or more cut off the end. 
    * => proper suffix: All the characters in a string, with one or more cut off the beginning. 
    * 
    * 1.) Take the pattern and construct a partial match table 
    * 
    * To construct partial match table { 
    *  1. Loop through the String(pattern) 
    *  2. Create a table of size String(pattern).length 
    *  3. For each character c[i], get The length of the longest proper prefix in the (sub)pattern 
    *   that matches a proper suffix in the same (sub)pattern 
    * } 
    */ 

    // we will need two incremental variables 
    int i, j; 

    // an LSP table also known as “longest suffix-prefix” 
    int[] LSP = new int[pattern.length()]; 


    // our initial case is that the first element is set to 0 
    LSP[0] = 0; 

    // loop through the pattern... 
    for (i = 1; i < pattern.length(); i++) { 

     // set our j as previous elements data (not the index) 
     j = LSP[i - 1]; 


     // we will be comparing previous and current elements data. ei char 
     char current = pattern.charAt(i), previous = pattern.charAt(j); 

     // we will have a case when we're somewhere in loop and two chars will not match, and j is not in base case. 
     while (j > 0 && current != previous) 
     // we decrement our j 
     j = LSP[j - 1]; 

     // simply put, if two characters are same, then we update our LSP to say that at that point, we hold the j's value 
     if (current == previous) 
     // increment our j 
     j++; 

     // update the table 
     LSP[i] = j; 


    } 

    return LSP; 

    } 
} 

讲义1コースコードクレジットGithub

+1

'詳細を読むには、こちらを参照してください。リンクはどこですか? –

+0

申し訳ありませんが、私は質問を更新しました。リンクが利用可能になるはずです。 @PhamTrung – OOP

+0

あなたは編集を見ましたか?あなたの時間の複雑さはO(n * m)であり、nはクエリの量であり、mはパターンの量であり、nは10^5であり、mは10^5であり、明らかに時間制限に適合しない。 –

答えて

0

にあなたは、このKMPの実装を試みることがあります。 KMPが意図されているので、O(m + n)です。それはもっと速くなければなりません:

private static int[] failureFunction(char[] pattern) { 
    int m = pattern.length; 

    int[] f = new int[pattern.length]; 
    f[0] = 0; 

    int i = 1; 
    int j = 0; 

    while (i < m) { 
     if (pattern[i] == pattern[j]) { 
      f[i] = j + 1; 
      i++; 
      j++; 
     } else if (j > 0) { 
      j = f[j - 1]; 
     } else { 
      f[i] = 0; 
      i++; 
     } 
    } 
    return f; 
} 

private static int kmpMatch(char[] text, char[] pattern) { 
    int[] f = failureFunction(pattern); 

    int m = pattern.length; 
    int n = text.length; 

    int i = 0; 
    int j = 0; 

    while (i < n) { 
     if (pattern[j] == text[i]) { 
      if (j == m - 1){ 
       return i - (m - 1); 
      } else { 
       i++; 
       j++; 
      } 
     } else if (j > 0) { 
      j = f[j - 1]; 
     } else { 
      i++; 
     } 
    } 
    return -1; 
}