Competitive Programming: POJ 1961 – Period

Description

For each prefix of a given string S with N characters (each character has an ASCII code between 97 and 126, inclusive), we want to know whether the prefix is a periodic string. That is, for each i (2 <= i <= N) we want to know the largest K > 1 (if there is one) such that the prefix of S with length i can be written as A^K, that is A concatenated K times, for some string A. Of course, we also want to know the period K.

Input

The input consists of several test cases. Each test case consists of two lines. The first one contains N (2 <= N <= 1,000,000) – the size of the string S. The second line contains the string S. The input file ends with a line, having the number zero on it.

Output

For each test case, output “Test case #” and the consecutive test case number on a single line; then, for each prefix with length i that has a period K > 1, output the prefix size i and the period K separated by a single space; the prefix sizes must be in increasing order. Print a blank line after each test case.

Sample Input

3
aaa
12
aabaabaabaab
0

Sample Output

Test case #1
2 2
3 3

Test case #2
2 2
6 2
9 3
12 4

Link to problem

Solution below . . .

Solution

The basic string matching problem in computer science is this: given a text $T$ and a pattern $P$ , find all occurrences of $P$ within $T$ .

If $n$ and $m$ are the lengths of $P$ and $T$ , then a naive algorithm — initiating a string comparison at every starting point — takes $O(mn)$ time.

Since this problem allows string lengths up to 1,000,000 characters, we’re going to need a faster, i.e., linear rather than quadratic, algorithm.

We’re trying to find if a given string $S$ can be represented by iterating a substring $K$ times. Let the length of $S$ be $n$ .

Find the length of the longest proper prefix of $S$ that is also a suffix. Let the length of the longest proper prefix-suffix be $len$ . (This can be computed in $O(n)$ time using the pre-processing step of the Knuth-Morris-Pratt (KMP) string matching algorithm.)
If $n\ mod\ (n-len) = 0$ , then return true; else return false.

First we define a proper prefix of a string of length $n$ as a prefix having a length between $0$ and $n-1$ , i.e., a string is not considered a proper prefix of itself. Using the problem example, the proper prefixes of $aabaabaabaab$ are

a
aa
aab
aaba
aabaa
aabaab
aabaaba
aabaabaa
aabaabaab
aabaabaaba
aabaabaabaa

A key element of the KMP algorithm is a pre-processing step that allows the algorithm to skip some comparisons in the subsequent search process. The pre-processing step creates an LSP (longest suffix-prefix) array, which is exactly what we need for this problem.

For example, here’s the LSP array for $aabaabaabaab$ :

Pattern: a a b a a b a a b  a  a  b
LSP    : 0 1 0 1 2 3 4 5 6  7  8  9
i      : 1 2 3 4 5 6 7 8 9 10 11 12

What this means is that for each $i$ , if we consider the pattern ending at $i$ , $LSP[i]$ represents the length of the longest proper prefix of the pattern that is also a suffix.

You should be able to convince yourself from the above table that the proposed algorithm works, i.e., that for $LSP[i] \neq 0$ , if $i\ mod\ (i - LSP[i]) = 0$ , then the pattern ending at $i$ is periodic, and that the period is $i\ /\ (i - LSP[i])$ .

import java.util.Scanner;

public class Main {

  static void computeLPSArray(String str, int M) {

    int len = 0;
    int i;

    int[] lps = new int[M];

    lps[0] = 0; // lps[0] is always 0
    i = 1;

    // calculate the rest of the lps array
    while (i < M) {
      if (str.charAt(i) == str.charAt(len)) {
        len++;
        lps[i] = len;
        
        i++;

        if (len > 0 && i % (i - len) == 0) {
          System.out.println(i + " " + (i / (i - len)));
        }

      } else {
        if (len != 0) {
          len = lps[len - 1];
        } else {
          lps[i] = 0;
          i++;
        }
      }
    }
  }

  public static void main(String[] args) {

    int count = 1;

    Scanner sc = new Scanner(System.in);

    while (sc.hasNext()) {
      int N = sc.nextInt();

      if (N == 0)
        break;

      String S = sc.next();

      System.out.println("Test case #" + count);

      computeLPSArray(S, N);

      System.out.println();

      count++;
    }

    sc.close();
  }
}