In combinatorialmathematics, a De Bruijn sequence of order n on a size-kalphabetA is a cyclic sequence in which every possible length-nstring on A occurs exactly once as a substring (i.e., as a contiguoussubsequence). Such a sequence is denoted by B(k, n) and has length kn, which is also the number of distinct substrings of length n on A; de Bruijn sequences are therefore optimally short. There are distinct De Bruijn sequences B(k, n).
Taking A = {0, 1}, there are two distinct B(2, 3): 00010111 and 11101000, one being the reverse or negation of the other.
Two of the 2048 possible B(2, 5) in the same alphabet are 00000100011001010011101011011111 and 00000101001000111110111001101011.
Well, it is a sequence (again, in this case binary) which contains all combinations/permutations of a specific length. And it does this only once.
For example: B(2,4)
000100110101111 (000)
This sequence contains all possible permutations you can make in binary of length 4. The last part (000) is optional, and needed it you do not want the sequence looped. As you can see, the length of this sequence is 16. All sequences will have length 2^N.
How do you construct these sequences?
The next thing I wanted to know is how to construct these sequences. The first hit I got was from the MathWorld website. It states: Surprisingly, it turns out that the lexicographic sequence of Lyndon words of lengths divisible by N gives the lexicographically smallest de Bruijn sequence (Ruskey).
Lyndon word..?
It seems that the next step is generating Lyndon words. Lyndon words are the lexographically smallest rotation of a word. I haven’t yet found a proper way to do this (I know there are) so here is what I do:
Pseudo:
lastLyndon = -1
for all possible N's {
currentSmallestLyndon = smallestLyndonRotation(N)
if( currentSmallestLyndon > lastLyndon ) {
lastLyndon = currentSmallestLyndon
print currentSmallestLyndon
}
}
And here is what I used to generate the smallestLyndonRotation (Java):
How does it work? Well, for example the input is: 0010.
It loops over all rotations (using some bit-masking):
0010
0100
1000
0001
And finds the smallest lexographically (0001) and returns this.
The above pseudo code will spit out the following numbers:
0000
0001
0011
0101
0111
1111
These are all the Lyndon words for N=4.
Splitting, reducing the Lyndon words
Now if we put the Lyndon words together we get:
000000010011010101111111 (our sequence)
0000100110101111 (de Bruijn sequence)
We aren’t quite there yet… but there is some resemblance. There is one more step left, we have to reduce/split the Lyndon words into its smallest unique part.
Lets take the Lyndon words again:
0000 -> 0 (4x)
0001 -> 0001
0011 -> 0011
0101 -> 01 (2x)
0111 -> 0111
1111 -> 1 (4x)
Results in: 0000100110101111, the sequence!
And that is it! If we’ve created the smallest possible de Bruijn sequence for B(2,4).
It has an algorithm which generates Lyndon words in CAT. CAT stands for Constant Amortized Time. This means that it isn’t really constant time, but approaches it very much. Calculating the worst-case scenario isn’t worth it. For example using an ArrayList in Java. The time to do insertions is said to be O(1), and this is true…. until the array underneath needs to be resized. During this resize it’ll take some more time. So the algorithm isn’t really constant time, but constant amortized time.
This is the program I ended up with. It is very very small. One of my most beautiful pieces of code. And it even supports non-binary alphabets (use the k-parameter).