http://www.geeksforgeeks.org/print-valid-words-possible-using-characters-array/
Fast and Easy Levenshtein distance using a Trie
http://stevehanov.ca/blog/index.php?id=114
With a trie, all shared prefixes in the dictionary are collaped into a single path, so we can process them in the best order for building up our levenshtein tables one row at a time.
Given a dictionary and a character array, print all valid words that are possible using characters from the array.
The idea is to use Trie data structure to store dictionary, then search words in Trie using characters of given array.
- Create an empty Trie and insert all words of given dictionary into the Trie.
- After that, we have pick only those characters in ‘Arr[]’ which are a child of the root of Trie.
- To quickly find whether a character is present in array or not, we create a hash of character arrays.
struct TrieNode{ TrieNode *Child[SIZE]; // isLeaf is true if the node represents // end of a word bool leaf;};// A recursive function to print all possible valid// words present in arrayvoid searchWord(TrieNode *root, bool Hash[], string str){ // if we found word in trie / dictionary if (root->leaf == true) cout << str << endl ; // traverse all child's of current root for (int K =0; K < SIZE; K++) { if (Hash[K] == true && root->Child[K] != NULL ) { // add current character char c = int_to_char(K); // Recursively search reaming character of word // in trie searchWord(root->Child[K], Hash, str + c); } }}// Prints all words present in dictionary.void PrintAllWords(char Arr[], TrieNode *root, int n){ // create a 'has' array that will store all present // character in Arr[] bool Hash[SIZE]; for (int i = 0 ; i < n; i++) Hash[char_int(Arr[i])] = true; // tempary node TrieNode *pChild = root ; // string to hold output words string str = ""; // Traverse all matrix elements. There are only 26 // character possible in char array for (int i = 0 ; i < SIZE ; i++) { // we start searching for word in dictionary // if we found a character which is child // of Trie root if (Hash[i] == true && pChild->Child[i] ) { str = str+(char)int_to_char(i); searchWord(pChild->Child[i], Hash, str); str = ""; } }} // Root Node of Trie TrieNode *root = getNode(); // insert all words of dictionary into trie int n = sizeof(Dict)/sizeof(Dict[0]); for (int i=0; i<n; i++) insert(root, Dict[i]); char arr[] = {'e', 'o', 'b', 'a', 'm', 'g', 'l'} ; int N = sizeof(arr)/sizeof(arr[0]); PrintAllWords(arr, root, N);Fast and Easy Levenshtein distance using a Trie
http://stevehanov.ca/blog/index.php?id=114
Only the last row changes. We can avoid a lot of work if we can process the words in order, so we never need to repeat a row for the same prefix of letters. The trie data structure is perfect for this. A trie is a giant tree, where each node represents a partial or complete word. Here's one with the words cat, cats, catacomb, and catacombs in it (courtesy of zwibbler.com). Nodes that represent a word are marked in black.
class TrieNode:
def __init__(self):
self.word = None
self.children = {}
global NodeCount
NodeCount += 1
def insert( self, word ):
node = self
for letter in word:
if letter not in node.children:
node.children[letter] = TrieNode()
node = node.children[letter]
node.word = word
# read dictionary file into a trie
trie = TrieNode()
for word in open(DICTIONARY, "rt").read().split():
WordCount += 1
trie.insert( word )
print "Read %d words into %d nodes" % (WordCount, NodeCount)
# The search function returns a list of all words that are less than the given
# maximum distance from the target word
def search( word, maxCost ):
# build first row
currentRow = range( len(word) + 1 )
results = []
# recursively search each branch of the trie
for letter in trie.children:
searchRecursive( trie.children[letter], letter, word, currentRow,
results, maxCost )
return results
# This recursive helper is used by the search function above. It assumes that
# the previousRow has been filled in already.
def searchRecursive( node, letter, word, previousRow, results, maxCost ):
columns = len( word ) + 1
currentRow = [ previousRow[0] + 1 ]
# Build one row for the letter, with a column for each letter in the target
# word, plus one for the empty string at column 0
for column in xrange( 1, columns ):
insertCost = currentRow[column - 1] + 1
deleteCost = previousRow[column] + 1
if word[column - 1] != letter:
replaceCost = previousRow[ column - 1 ] + 1
else:
replaceCost = previousRow[ column - 1 ]
currentRow.append( min( insertCost, deleteCost, replaceCost ) )
# if the last entry in the row indicates the optimal cost is less than the
# maximum cost, and there is a word in this trie node, then add it.
if currentRow[-1] <= maxCost and node.word != None:
results.append( (node.word, currentRow[-1] ) )
# if any entries in the row are less than the maximum cost, then
# recursively search each branch of the trie
if min( currentRow ) <= maxCost:
for letter in node.children:
searchRecursive( node.children[letter], letter, word, currentRow,
results, maxCost )
start = time.time()
results = search( TARGET, MAX_COST )
end = time.time()
for result in results: print result
print "Search took %g s" % (end - start)