https://leetcode.com/problems/similar-string-groups/
https://leetcode.com/problems/similar-string-groups/discuss/132374/Short-C%2B%2B-solution-at-220ms-using-disjoint-set
X. BFS
https://leetcode.com/problems/similar-string-groups/discuss/132431/Easy-to-understand-BFS-solution-with-detailed-explanation
modified to pass corner case "aaaaa", "aaaaa","aaaaa"
X. DFS
https://leetcode.com/problems/similar-string-groups/discuss/132318/Simple-Java-Solution-using-DFS
your code did not pass the test case [aaaa,aaaa,aaaa.....]
Two strings
X
and Y
are similar if we can swap two letters (in different positions) of X
, so that it equals Y
.
For example,
"tars"
and "rats"
are similar (swapping at positions 0
and 2
), and "rats"
and "arts"
are similar, but "star"
is not similar to "tars"
, "rats"
, or "arts"
.
Together, these form two connected groups by similarity:
{"tars", "rats", "arts"}
and {"star"}
. Notice that "tars"
and "arts"
are in the same group even though they are not similar. Formally, each group is such that a word is in the group if and only if it is similar to at least one other word in the group.
We are given a list
A
of strings. Every string in A
is an anagram of every other string in A
. How many groups are there?
Example 1:
Input: ["tars","rats","arts","star"] Output: 2
Note:
A.length <= 2000
A[i].length <= 1000
A.length * A[i].length <= 20000
- All words in
A
consist of lowercase letters only. - All words in
A
have the same length and are anagrams of each other. - The judging time limit has been increased for this question
Let
W = A[0].length
. It is clear that we can determine in time, whether two words from A
are similar.
One attempt is a standard kind of brute force: for each pair of words, let's draw an edge between these words if they are similar. We can do this in time. After, finding the connected components can be done in time naively (each node may have up to edges), (or with a union-find structure.) The total complexity is .
Another attempt is to enumerate all neighbors of a word. A
word
has up to neighbors, and if that neighbor
is itself a given word, we know that word
and neighbor
are connected by an edge. In this way, we can build the graph in time, and again take or time to analyze the number of connected components.
One insight is that between these two approaches, we can choose which approach works better. If we have very few words, we want to use the first approach; if we have very short words, we want to use the second approach. We'll piecewise add these two approaches (with complexity and ), to create an approach with complexity.
Algorithm
We will build some underlying graph with
N
nodes, where nodes i
and j
are connected if and only if A[i]
is similar to A[j]
, then look at the number of connected components.
There are a few challenges involved in this problem, but each challenge is relatively straightforward.
- Use a helper function
similar(word1, word2)
that istrue
if and only if two given words are similar. - Enumerate all neighbors of a word, and discover when it is equal to a given word.
- Use either a union-find structure or a depth-first search, to calculate the number of connected components of the underlying graph. We've showcased a union-find structure in this solution, with notes of a depth-first search in the comments.
- Time Complexity: , where is the length of
A
, and is the length of each given word. - Space Complexity: . If , the space complexity is . Otherwise, the space complexity is : for each of neighbors we store a word of length . (Plus, we store node indices ("buckets") which is dominated by the term.) Because in this case, we could also write the space complexity as .
public int numSimilarGroups(String[] A) {
int N = A.length;
int W = A[0].length();
DSU dsu = new DSU(N);
if (N < W * W) { // If few words, then check for pairwise similarity: O(N^2 W)
for (int i = 0; i < N; ++i)
for (int j = i + 1; j < N; ++j)
if (similar(A[i], A[j]))
dsu.union(i, j);
} else { // If short words, check all neighbors: O(N W^3)
Map<String, List<Integer>> buckets = new HashMap();
for (int i = 0; i < N; ++i) {
char[] L = A[i].toCharArray();
for (int j0 = 0; j0 < L.length; ++j0)
for (int j1 = j0 + 1; j1 < L.length; ++j1) {
swap(L, j0, j1);
StringBuilder sb = new StringBuilder();
for (char c : L)
sb.append(c);
buckets.computeIfAbsent(sb.toString(), x -> new ArrayList<Integer>()).add(i);
swap(L, j0, j1);
}
}
for (int i1 = 0; i1 < A.length; ++i1)
if (buckets.containsKey(A[i1]))
for (int i2 : buckets.get(A[i1]))
dsu.union(i1, i2);
}
int ans = 0;
for (int i = 0; i < N; ++i)
if (dsu.parent[i] == i)
ans++;
return ans;
}
public boolean similar(String word1, String word2) {
int diff = 0;
for (int i = 0; i < word1.length(); ++i)
if (word1.charAt(i) != word2.charAt(i))
diff++;
return diff <= 2;
}
public void swap(char[] A, int i, int j) {
char tmp = A[i];
A[i] = A[j];
A[j] = tmp;
}
}
class DSU {
int[] parent;
public DSU(int N) {
parent = new int[N];
for (int i = 0; i < N; ++i)
parent[i] = i;
}
public int find(int x) {
if (parent[x] != x)
parent[x] = find(parent[x]);
return parent[x];
}
public void union(int x, int y) {
parent[find(x)] = find(y);
}
https://leetcode.com/problems/similar-string-groups/discuss/132317/Simple-Java-8-Python-Union-Find
isSimilar does a simple similarity test as per the constraints of the problem. Once we have this and UnionFind, we just do the O(n ^ 2 ) comparison over all nC2 strings, and if they are similar, then we integrate them into a group using Union.
I have used Java8's style in for loops and in using streams to filter and count, though it appears a bit less reader friendly in the beginning, I find it more elegant :) to code.
public int numSimilarGroups(String[] a) {
int n = a.length;
UnionFind uFind = new UnionFind(n);
range(0, n).forEach(i -> range(i+1, n).filter(j -> isSimilar(a[i], a[j])).forEach(j -> uFind.union(i, j)));
return uFind.getNumGroups();
}
private boolean isSimilar(String a, String b) {
return range(0, a.length()).filter(i -> a.charAt(i) != b.charAt(i)).count() == 2;
}
}
public class UnionFind {
int[] parent;
int[] rank;
public UnionFind(int n) {
this.parent = range(0, n).toArray();
this.rank = new int[n];
}
public void union(int x, int y) {
int root1 = find(x);
int root2 = find(y);
if(root1 == root2)
return;
if(rank[root1] > rank[root2])
parent[root2] = root1;
else
parent[root1] = root2;
if(rank[root1] == rank[root2])
rank[root2] += 1;
}
public int find(int x) {
while(parent[x] != x) x = parent[x];
return x;
}
public int getNumGroups() {
return (int)range(0, parent.length).filter(i -> i == parent[i]).count();
}
X. BFS
https://leetcode.com/problems/similar-string-groups/discuss/132431/Easy-to-understand-BFS-solution-with-detailed-explanation
This question asks for number of disjoint groups. While a group contains strings swappable from at least one other string in it, it looks like this is a graph problem: strings are the nodes in the graph, and two strings (nodes) have a path between them if they are similar.
- How do we know if two strings are similar?This can be determined by calculating hamming distance between them. Since all strings are anagrams, if two strings have hamming distance of two, then they can be converted into each other by swapping the only two different characters.In other wors, two strings are similar if and only if they have hamming distance two.We can compute the hamming distance between every two strings in O(n ^ 2 * l) time. And if two strings have hamming distance 2, they are definitely in the same group.
- How do we represent the graph?Now we know all nodes in the graph (which are all strings in the input), we also know the edges between them. We've got everything we need!Here i choose to use the adjacency list representation of the graph. If two strings have hamming distance 2, there's an edge between them.
- How to find all disjoint groups?Here comes the easiest part! We can find a group by visiting all nodes in it.If two nodes A and B are in different groups, there's no way we can go from A to B. However, if A and C are in the same group, there must exist a way we can go from A to C.Similarly, we can go from the first string, visit all strings that are in the same group as the first string(we call this group group_1). Then find another string which we haven't visted yet and visit all strings in that group (we call this group group_2). We can do this until all strings have been visited and we find all the groups.Actually this is the problem of finding all connected components of a graph . Now we have our graph, we can use BFS to do this!
modified to pass corner case "aaaaa", "aaaaa","aaaaa"
X. DFS
https://leetcode.com/problems/similar-string-groups/discuss/132318/Simple-Java-Solution-using-DFS
your code did not pass the test case [aaaa,aaaa,aaaa.....]