http://www.geeksforgeeks.org/print-valid-words-possible-using-characters-array/
Fast and Easy Levenshtein distance using a Trie
http://stevehanov.ca/blog/index.php?id=114
With a trie, all shared prefixes in the dictionary are collaped into a single path, so we can process them in the best order for building up our levenshtein tables one row at a time.
Given a dictionary and a character array, print all valid words that are possible using characters from the array.
The idea is to use Trie data structure to store dictionary, then search words in Trie using characters of given array.
- Create an empty Trie and insert all words of given dictionary into the Trie.
- After that, we have pick only those characters in ‘Arr[]’ which are a child of the root of Trie.
- To quickly find whether a character is present in array or not, we create a hash of character arrays.
struct
TrieNode
{
TrieNode *Child[SIZE];
// isLeaf is true if the node represents
// end of a word
bool
leaf;
};
// A recursive function to print all possible valid
// words present in array
void
searchWord(TrieNode *root,
bool
Hash[], string str)
{
// if we found word in trie / dictionary
if
(root->leaf ==
true
)
cout << str << endl ;
// traverse all child's of current root
for
(
int
K =0; K < SIZE; K++)
{
if
(Hash[K] ==
true
&& root->Child[K] != NULL )
{
// add current character
char
c = int_to_char(K);
// Recursively search reaming character of word
// in trie
searchWord(root->Child[K], Hash, str + c);
}
}
}
// Prints all words present in dictionary.
void
PrintAllWords(
char
Arr[], TrieNode *root,
int
n)
{
// create a 'has' array that will store all present
// character in Arr[]
bool
Hash[SIZE];
for
(
int
i = 0 ; i < n; i++)
Hash[char_int(Arr[i])] =
true
;
// tempary node
TrieNode *pChild = root ;
// string to hold output words
string str =
""
;
// Traverse all matrix elements. There are only 26
// character possible in char array
for
(
int
i = 0 ; i < SIZE ; i++)
{
// we start searching for word in dictionary
// if we found a character which is child
// of Trie root
if
(Hash[i] ==
true
&& pChild->Child[i] )
{
str = str+(
char
)int_to_char(i);
searchWord(pChild->Child[i], Hash, str);
str =
""
;
}
}
}
// Root Node of Trie
TrieNode *root = getNode();
// insert all words of dictionary into trie
int
n =
sizeof
(Dict)/
sizeof
(Dict[0]);
for
(
int
i=0; i<n; i++)
insert(root, Dict[i]);
char
arr[] = {
'e'
,
'o'
,
'b'
,
'a'
,
'm'
,
'g'
,
'l'
} ;
int
N =
sizeof
(arr)/
sizeof
(arr[0]);
PrintAllWords(arr, root, N);
Fast and Easy Levenshtein distance using a Trie
http://stevehanov.ca/blog/index.php?id=114
Only the last row changes. We can avoid a lot of work if we can process the words in order, so we never need to repeat a row for the same prefix of letters. The trie data structure is perfect for this. A trie is a giant tree, where each node represents a partial or complete word. Here's one with the words cat, cats, catacomb, and catacombs in it (courtesy of zwibbler.com). Nodes that represent a word are marked in black.
class TrieNode: def __init__(self): self.word = None self.children = {} global NodeCount NodeCount += 1 def insert( self, word ): node = self for letter in word: if letter not in node.children: node.children[letter] = TrieNode() node = node.children[letter] node.word = word # read dictionary file into a trie trie = TrieNode() for word in open(DICTIONARY, "rt").read().split(): WordCount += 1 trie.insert( word ) print "Read %d words into %d nodes" % (WordCount, NodeCount) # The search function returns a list of all words that are less than the given # maximum distance from the target word def search( word, maxCost ): # build first row currentRow = range( len(word) + 1 ) results = [] # recursively search each branch of the trie for letter in trie.children: searchRecursive( trie.children[letter], letter, word, currentRow, results, maxCost ) return results # This recursive helper is used by the search function above. It assumes that # the previousRow has been filled in already. def searchRecursive( node, letter, word, previousRow, results, maxCost ): columns = len( word ) + 1 currentRow = [ previousRow[0] + 1 ] # Build one row for the letter, with a column for each letter in the target # word, plus one for the empty string at column 0 for column in xrange( 1, columns ): insertCost = currentRow[column - 1] + 1 deleteCost = previousRow[column] + 1 if word[column - 1] != letter: replaceCost = previousRow[ column - 1 ] + 1 else: replaceCost = previousRow[ column - 1 ] currentRow.append( min( insertCost, deleteCost, replaceCost ) ) # if the last entry in the row indicates the optimal cost is less than the # maximum cost, and there is a word in this trie node, then add it. if currentRow[-1] <= maxCost and node.word != None: results.append( (node.word, currentRow[-1] ) ) # if any entries in the row are less than the maximum cost, then # recursively search each branch of the trie if min( currentRow ) <= maxCost: for letter in node.children: searchRecursive( node.children[letter], letter, word, currentRow, results, maxCost ) start = time.time() results = search( TARGET, MAX_COST ) end = time.time() for result in results: print result print "Search took %g s" % (end - start)