https://en.wikipedia.org/wiki/De_Bruijn_graph
https://en.wikipedia.org/wiki/De_Bruijn_sequence
https://en.wikipedia.org/wiki/Lyndon_word
a Lyndon word is a nonempty string that is strictly smaller in lexicographic order than all of its rotations.
http://introcs.cs.princeton.edu/java/31datatype/DeBruijn.java.html
https://en.wikipedia.org/wiki/De_Bruijn_sequence
In combinatorial mathematics, a De Bruijn sequence of order n on a size-k alphabet A is a cyclic sequence in which every possible length-n string on A occurs exactly once as a substring (i.e., as a contiguous subsequence). Such a sequence is denoted by B(k, n) and has length kn, which is also the number of distinct substrings of length n on A; de Bruijn sequences are therefore optimally short. There are distinct De Bruijn sequences B(k, n).
- Taking A = {0, 1}, there are two distinct B(2, 3): 00010111 and 11101000, one being the reverse or negation of the other.
- Two of the 2048 possible B(2, 5) in the same alphabet are 00000100011001010011101011011111 and 00000101001000111110111001101011.
What is a de Bruijn sequence?
Well, it is a sequence (again, in this case binary) which contains all combinations/permutations of a specific length. And it does this only once.
For example: B(2,4)
For example: B(2,4)
- 000100110101111 (000)
This sequence contains all possible permutations you can make in binary of length 4. The last part (000) is optional, and needed it you do not want the sequence looped. As you can see, the length of this sequence is 16. All sequences will have length 2^N.
How do you construct these sequences?
The next thing I wanted to know is how to construct these sequences. The first hit I got was from the MathWorld website. It states:
Surprisingly, it turns out that the lexicographic sequence of Lyndon words of lengths divisible by N gives the lexicographically smallest de Bruijn sequence (Ruskey).
Surprisingly, it turns out that the lexicographic sequence of Lyndon words of lengths divisible by N gives the lexicographically smallest de Bruijn sequence (Ruskey).
Lyndon word..?
It seems that the next step is generating Lyndon words. Lyndon words are the lexographically smallest rotation of a word. I haven’t yet found a proper way to do this (I know there are) so here is what I do:
Pseudo:
And here is what I used to generate the smallestLyndonRotation (Java):
How does it work? Well, for example the input is: 0010.
It loops over all rotations (using some bit-masking):
It loops over all rotations (using some bit-masking):
- 0010
- 0100
- 1000
- 0001
And finds the smallest lexographically (0001) and returns this.
The above pseudo code will spit out the following numbers:
- 0000
- 0001
- 0011
- 0101
- 0111
- 1111
These are all the Lyndon words for N=4.
Splitting, reducing the Lyndon words
Now if we put the Lyndon words together we get:
- 000000010011010101111111 (our sequence)
- 0000100110101111 (de Bruijn sequence)
We aren’t quite there yet… but there is some resemblance. There is one more step left, we have to reduce/split the Lyndon words into its smallest unique part.
Lets take the Lyndon words again:
- 0000 -> 0 (4x)
- 0001 -> 0001
- 0011 -> 0011
- 0101 -> 01 (2x)
- 0111 -> 0111
- 1111 -> 1 (4x)
Results in: 0000100110101111, the sequence!
And that is it! If we’ve created the smallest possible de Bruijn sequence for B(2,4).
Here are some more sequences:
- 01
- 0011
- 00010111
- 0000100110101111
- 00000100011001010011101011011111
- 0000001000011000101000111001001011001101001111010101110110111111
- 00000001000001100001010000111000100100010110001101000111100100110010101001011100110110011101001111101010110101111011011101111111
- 00000000100000011000001010000011100001001000010110000110100001111000100010011000101010001011100011001000110110001110100011111001
00101001001110010101100101101001011110011001101010011011100111011001111010011111101010101110101101101011111011011110111011111111 - 00000000010000000110000001010000001110000010010000010110000011010000011110000100010000100110000101010000101110000110010000110110
00011101000011111000100011000100101000100111000101001000101011000101101000101111000110011000110101000110111000111001000111011000111
10100011111100100100101100100110100100111100101001100101010100101011100101101100101110100101111100110011100110101100110110100110111
10011101010011101110011110110011111010011111110101010110101011110101101110101110110101111110110110111110111011110111111111
It has an algorithm which generates Lyndon words in CAT. CAT stands for Constant Amortized Time. This means that it isn’t really constant time, but approaches it very much. Calculating the worst-case scenario isn’t worth it. For example using an ArrayList in Java. The time to do insertions is said to be O(1), and this is true…. until the array underneath needs to be resized. During this resize it’ll take some more time. So the algorithm isn’t really constant time, but constant amortized time.
This is the program I ended up with. It is very very small. One of my most beautiful pieces of code. And it even supports non-binary alphabets (use the k-parameter).
a Lyndon word is a nonempty string that is strictly smaller in lexicographic order than all of its rotations.
http://introcs.cs.princeton.edu/java/31datatype/DeBruijn.java.html