Shortest unique prefix to represent word in an array | shawnlincoding
Use the shortest unique prefix to represent each word in the array.
input: ["zebra", "dog", "duck", "dot"]
output: {zebra: z, dog: dog, duck: du, dot: dot}
input: ["zebra", "dog", "duck", "dove"]
output: {zebra: z, dog: dog, duck: du, dove: dov}
input: ["bearcat", "bear"]
output: {bearcat: bearc, bear: ""}
http://www.geeksforgeeks.org/find-all-shortest-unique-prefixes-to-represent-each-word-in-a-given-list/
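Seeing "prefix" immediately suggests a trie. Add a count field to each trie node that records how many of the inserted words pass through that character.
1. Insert every word into the trie.
2. Search for each word and return the prefix that ends at the first node whose count is 1.
A count of 1 means no other word shares that node, so the prefix ending there is unique. If a word never reaches such a node (bear next to bearcat), it has no unique prefix.
Time Complexity: O(NL), where N is the number of words and L is the maximum word length.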
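The two steps above can be sketched in C++ as follows. This is a minimal sketch under assumptions, not the post's own code: CountNode and shortestUniquePrefixes are hypothetical names, children are kept in a std::map instead of a fixed array, and C++14 is assumed for std::make_unique. A word with no unique prefix maps to an empty string.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical node type: count is the number of inserted words
// whose path passes through this node.
struct CountNode {
    int count = 0;
    std::map<char, std::unique_ptr<CountNode>> next;
};

// Returns word -> shortest unique prefix ("" when no prefix is unique).
std::unordered_map<std::string, std::string>
shortestUniquePrefixes(const std::vector<std::string>& words) {
    CountNode root;

    // Step 1: insert every word, incrementing the count of each node on its path.
    for (const std::string& w : words) {
        CountNode* cur = &root;
        for (char c : w) {
            std::unique_ptr<CountNode>& child = cur->next[c];
            if (!child) child = std::make_unique<CountNode>();
            ++child->count;
            cur = child.get();
        }
    }

    // Step 2: walk each word again and stop at the first node with count 1.
    std::unordered_map<std::string, std::string> result;
    for (const std::string& w : words) {
        CountNode* cur = &root;
        std::string prefix;  // stays "" if no node with count 1 is reached
        for (std::size_t i = 0; i < w.size(); ++i) {
            cur = cur->next.at(w[i]).get();
            if (cur->count == 1) {
                prefix = w.substr(0, i + 1);
                break;
            }
        }
        result[w] = prefix;
    }
    return result;
}
```

Returning a map keyed by word, rather than printing during a traversal, keeps each prefix attached to the word it represents, which matches the output format of the examples.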
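If we can't use extra space:
Sort the words, then compute the shortest distinguishing prefix between adjacent strings. The time complexity is higher: the sort needs O(n log n) string comparisons, and each comparison can cost up to L character comparisons.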
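A rough C++ sketch of the sorting idea, under stated assumptions: after sorting, the words that share the longest prefix with a given word are its immediate neighbours, so the unique prefix is one character longer than the longer of the two neighbouring common prefixes. The names commonPrefixLen and uniquePrefixesBySorting are hypothetical, and the result vector is still extra storage; only the trie is avoided.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Length of the longest common prefix of a and b.
static std::size_t commonPrefixLen(const std::string& a, const std::string& b) {
    std::size_t n = std::min(a.size(), b.size());
    std::size_t i = 0;
    while (i < n && a[i] == b[i]) ++i;
    return i;
}

// Sorts `words` in place and returns (word, prefix) pairs in sorted order.
// The prefix is "" when the word has no unique prefix.
std::vector<std::pair<std::string, std::string>>
uniquePrefixesBySorting(std::vector<std::string>& words) {
    std::sort(words.begin(), words.end());
    std::vector<std::pair<std::string, std::string>> out;
    for (std::size_t i = 0; i < words.size(); ++i) {
        // Longest prefix shared with either neighbour in sorted order.
        std::size_t shared = 0;
        if (i > 0)
            shared = std::max(shared, commonPrefixLen(words[i - 1], words[i]));
        if (i + 1 < words.size())
            shared = std::max(shared, commonPrefixLen(words[i], words[i + 1]));

        // One character past the shared part is the first unique prefix,
        // provided the word is long enough to have that character.
        std::string prefix;
        if (shared < words[i].size()) prefix = words[i].substr(0, shared + 1);
        out.emplace_back(words[i], prefix);
    }
    return out;
}
```

Note that the results come back in sorted order rather than input order, since the input itself is reordered.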
A Simple Solution is to consider every prefix of every word, from shortest to longest, and print the first one that is not a prefix of any other word.
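For comparison, the simple solution might look like the following C++ sketch. The function name uniquePrefixesBruteForce is hypothetical, and the prefixes are returned in input order instead of printed; an empty string marks a word with no unique prefix. With n words of length up to L this does on the order of n * n * L * L character comparisons in the worst case.

```cpp
#include <cassert>
#include <string>
#include <vector>

// For each word, try prefixes of length 1, 2, ... and keep the first one
// that no other word in the list starts with.
std::vector<std::string>
uniquePrefixesBruteForce(const std::vector<std::string>& words) {
    std::vector<std::string> result(words.size());  // "" means no unique prefix
    for (std::size_t i = 0; i < words.size(); ++i) {
        for (std::size_t len = 1; len <= words[i].size(); ++len) {
            bool shared = false;
            for (std::size_t j = 0; j < words.size() && !shared; ++j) {
                // compare(...) == 0 means words[j] starts with the first
                // `len` characters of words[i].
                if (j != i && words[j].compare(0, len, words[i], 0, len) == 0)
                    shared = true;
            }
            if (!shared) {
                result[i] = words[i].substr(0, len);
                break;
            }
        }
    }
    return result;
}
```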
An Efficient Solution is to use a Trie. The idea is to maintain a count in every node. The steps are below.
1) Construct a Trie of all words, maintaining a frequency in every node (here, frequency is the number of times the node is visited during insertion). The time complexity of this step is O(N), where N is the total number of characters in all words.
2) For every word, find the character nearest to the root with frequency 1. The prefix of the word is the path from the root to this character. To do this, traverse the Trie starting from the root and check the frequency of every node visited. If the frequency is 1, print all characters from the root to this node and do not traverse below it.
The time complexity of this step is also O(N), where N is the total number of characters in all words.
Trie for the words dog, dot, duck and zebra; each edge is labeled (character, frequency):

                root
                /  \
         (d,3) /    \ (z,1)
              /      \
          Node1      Node2
          /   \          \
   (o,2) /     \ (u,1)    \ (e,1)
        /       \          \
   Node1.1    Node1.2     Node2.1
    /    \        \           \
(g,1)   (t,1)      \ (c,1)     \ (b,1)
  /        \        \           \
Leaf      Leaf    Node1.2.1   Node2.1.1
(dog)     (dot)       \            \
                       \ (k,1)      \ (r,1)
                        \            \
                       Leaf      Node2.1.1.1
                      (duck)          \
                                       \ (a,1)
                                        \
                                       Leaf
                                     (zebra)
// C++ program to print the shortest unique prefix of every word.
#include <cstddef>
#include <iostream>
#include <string>
using namespace std;

// The excerpt does not show these two definitions; the values are assumed.
#define MAX 256           // Number of possible character values
#define MAX_WORD_LEN 500  // Maximum length of an input word

struct trieNode
{
    struct trieNode *child[MAX];
    int freq;  // Number of words whose path passes through this node
};

// Function to create a new trie node.
struct trieNode *newTrieNode(void)
{
    struct trieNode *newNode = new trieNode;
    newNode->freq = 1;
    for (int i = 0; i < MAX; i++)
        newNode->child[i] = NULL;
    return newNode;
}

// Method to insert a new string into Trie
void insert(struct trieNode *root, string str)
{
    // Length of the string
    int len = str.length();
    struct trieNode *pCrawl = root;

    // Traversing over the length of given str.
    for (int level = 0; level < len; level++)
    {
        // Get index of child node from current character in str.
        // The cast keeps the index in 0..MAX-1 where char is signed.
        int index = (unsigned char)str[level];

        // Create a new child if not exist already
        if (!pCrawl->child[index])
            pCrawl->child[index] = newTrieNode();
        else
            (pCrawl->child[index]->freq)++;

        // Move to the child
        pCrawl = pCrawl->child[index];
    }
}

// This function prints unique prefix for every word stored
// in Trie. Prefixes one by one are stored in prefix[].
// 'ind' is current index of prefix[]
void findPrefixesUtil(struct trieNode *root, char prefix[], int ind)
{
    // Corner case
    if (root == NULL)
        return;

    // Base case: only one word passes through this node
    if (root->freq == 1)
    {
        prefix[ind] = '\0';
        cout << prefix << " ";
        return;
    }

    for (int i = 0; i < MAX; i++)
    {
        if (root->child[i] != NULL)
        {
            prefix[ind] = i;
            findPrefixesUtil(root->child[i], prefix, ind + 1);
        }
    }
}

// Function to print all prefixes that uniquely
// represent all words in arr[0..n-1]
void findPrefixes(string arr[], int n)
{
    // Construct a Trie of all words
    struct trieNode *root = newTrieNode();
    root->freq = 0;
    for (int i = 0; i < n; i++)
        insert(root, arr[i]);

    // Create an array to store all prefixes
    char prefix[MAX_WORD_LEN];

    // Print all prefixes using Trie Traversal
    findPrefixesUtil(root, prefix, 0);
}
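// Java version of the count-based trie: every node records how many
// inserted words pass through it.
// The enclosing class is missing from the excerpt; the name Trie is assumed.
public class Trie {
    // Alphabet size; characters are assumed to be in the range 0..255
    public static final int R = 256;
    private Node root;

    private class Node {
        private int count;      // number of words passing through this node
        private boolean isEnd;  // true if a word ends at this node
        private Node next[] = new Node[R];

        public Node() {
            count = 0;
            isEnd = false;
        }

        public Node(int count, boolean isEnd) {
            this.count = count;
            this.isEnd = isEnd;
        }
    }

    public void insert(String str) {
        if (root == null) root = new Node();
        Node curr = root;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (curr.next[c] == null) {
                curr.next[c] = new Node(1, false);
            } else {
                curr.next[c].count++;
            }
            curr = curr.next[c];
        }
        curr.isEnd = true;
    }

    // Not in the original excerpt: a sketch of step 2 (the search),
    // with an assumed method name. Returns "" if no prefix is unique.
    public String shortestUniquePrefix(String word) {
        Node curr = root;
        for (int i = 0; curr != null && i < word.length(); i++) {
            curr = curr.next[word.charAt(i)];
            if (curr != null && curr.count == 1) return word.substring(0, i + 1);
        }
        return "";
    }
}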