LeetCode 839 - Similar String Groups


https://leetcode.com/problems/similar-string-groups/
Two strings X and Y are similar if we can swap two letters (in different positions) of X, so that it equals Y.
For example, "tars" and "rats" are similar (swapping at positions 0 and 2), and "rats" and "arts" are similar, but "star" is not similar to "tars""rats", or "arts".
Together, these form two connected groups by similarity: {"tars", "rats", "arts"} and {"star"}.  Notice that "tars"and "arts" are in the same group even though they are not similar.  Formally, each group is such that a word is in the group if and only if it is similar to at least one other word in the group.
We are given a list A of strings.  Every string in A is an anagram of every other string in A.  How many groups are there?
Example 1:
Input: ["tars","rats","arts","star"]
Output: 2
Note:
  1. A.length <= 2000
  2. A[i].length <= 1000
  3. A.length * A[i].length <= 20000
  4. All words in A consist of lowercase letters only.
  5. All words in A have the same length and are anagrams of each other.
  6. The judging time limit has been increased for this question
https://leetcode.com/articles/similar-string-groups/
Let W = A[0].length. It is clear that we can determine in O(W) time, whether two words from A are similar.
One attempt is a standard kind of brute force: for each pair of words, let's draw an edge between these words if they are similar. We can do this in O(N^2 W) time. After, finding the connected components can be done in O(N^2) time naively (each node may have up to N-1 edges), (or O(N) with a union-find structure.) The total complexity is O(N^2 W).
Another attempt is to enumerate all neighbors of a word. A word has up to \binom{W}{2} neighbors, and if that neighbor is itself a given word, we know that word and neighbor are connected by an edge. In this way, we can build the graph in O(N W^3) time, and again take O(N^2) or O(N) time to analyze the number of connected components.
One insight is that between these two approaches, we can choose which approach works better. If we have very few words, we want to use the first approach; if we have very short words, we want to use the second approach. We'll piecewise add these two approaches (with complexity O(N^2 W) and O(N W^3)), to create an approach with O(NW\min(N, W^2)) complexity.
Algorithm
We will build some underlying graph with N nodes, where nodes i and j are connected if and only if A[i] is similar to A[j], then look at the number of connected components.
There are a few challenges involved in this problem, but each challenge is relatively straightforward.
  • Use a helper function similar(word1, word2) that is true if and only if two given words are similar.
  • Enumerate all neighbors of a word, and discover when it is equal to a given word.
  • Use either a union-find structure or a depth-first search, to calculate the number of connected components of the underlying graph. We've showcased a union-find structure in this solution, with notes of a depth-first search in the comments.

  • Time Complexity: O(NW \min(N, W^2)), where N is the length of A, and W is the length of each given word.
  • Space Complexity: O(NW^3). If N &lt; W^2, the space complexity is O(N). Otherwise, the space complexity is O(NW^3): for each of NW^2 neighbors we store a word of length W. (Plus, we store O(NW^2) node indices ("buckets") which is dominated by the O(NW^3) term.) Because W^2 &lt;= N in this case, we could also write the space complexity as O(N^2 W).
  public int numSimilarGroups(String[] A) {
    int N = A.length;
    int W = A[0].length();
    DSU dsu = new DSU(N);

    if (N < W * W) { // If few words, then check for pairwise similarity: O(N^2 W)
      for (int i = 0; i < N; ++i)
        for (int j = i + 1; j < N; ++j)
          if (similar(A[i], A[j]))
            dsu.union(ij);

    else { // If short words, check all neighbors: O(N W^3)
      Map<String, List<Integer>> buckets = new HashMap();
      for (int i = 0; i < N; ++i) {
        char[] L = A[i].toCharArray();
        for (int j0 = 0; j0 < L.length; ++j0)
          for (int j1 = j0 + 1; j1 < L.length; ++j1) {
            swap(Lj0j1);
            StringBuilder sb = new StringBuilder();
            for (char c : L)
              sb.append(c);
            buckets.computeIfAbsent(sb.toString(), x -> new ArrayList<Integer>()).add(i);
            swap(Lj0j1);
          }
      }

      for (int i1 = 0; i1 < A.length; ++i1)
        if (buckets.containsKey(A[i1]))
          for (int i2 : buckets.get(A[i1]))
            dsu.union(i1i2);
    }

    int ans = 0;
    for (int i = 0; i < N; ++i)
      if (dsu.parent[i] == i)
        ans++;

    return ans;
  }

  public boolean similar(String word1, String word2) {
    int diff = 0;
    for (int i = 0; i < word1.length(); ++i)
      if (word1.charAt(i) != word2.charAt(i))
        diff++;
    return diff <= 2;
  }

  public void swap(char[] Aint iint j) {
    char tmp = A[i];
    A[i] = A[j];
    A[j] = tmp;
  }
}

class DSU {
  int[] parent;

  public DSU(int N) {
    parent = new int[N];
    for (int i = 0; i < N; ++i)
      parent[i] = i;
  }

  public int find(int x) {
    if (parent[x] != x)
      parent[x] = find(parent[x]);
    return parent[x];
  }

  public void union(int xint y) {
    parent[find(x)] = find(y);

  }
https://leetcode.com/problems/similar-string-groups/discuss/132317/Simple-Java-8-Python-Union-Find
isSimilar does a simple similarity test as per the constraints of the problem. Once we have this and UnionFind, we just do the O(n ^ 2 ) comparison over all nC2 strings, and if they are similar, then we integrate them into a group using Union.


I have used Java8's style in for loops and in using streams to filter and count, though it appears a bit less reader friendly in the beginning, I find it more elegant :) to code.
    public int numSimilarGroups(String[] a) {
        int n = a.length;
        UnionFind uFind = new UnionFind(n);
        range(0, n).forEach(i -> range(i+1, n).filter(j -> isSimilar(a[i], a[j])).forEach(j -> uFind.union(i, j)));
        return uFind.getNumGroups();
    }

    private boolean isSimilar(String a, String b) {
        return range(0, a.length()).filter(i -> a.charAt(i) != b.charAt(i)).count() == 2;
    }
}


public class UnionFind {
    int[] parent;
    int[] rank;

    public UnionFind(int n) {
        this.parent = range(0, n).toArray();
        this.rank = new int[n];
    }

    public void union(int x, int y) {
        int root1 = find(x);
        int root2 = find(y);
        if(root1 == root2)
            return;
        if(rank[root1] > rank[root2])
            parent[root2] = root1;
        else
            parent[root1] = root2;
        if(rank[root1] == rank[root2])
            rank[root2] += 1;
    }

    public int find(int x) {
        while(parent[x] != x) x = parent[x];
        return x;
    }

    public int getNumGroups() {
        return (int)range(0, parent.length).filter(i -> i == parent[i]).count();
    }

https://leetcode.com/problems/similar-string-groups/discuss/132374/Short-C%2B%2B-solution-at-220ms-using-disjoint-set
X. BFS
https://leetcode.com/problems/similar-string-groups/discuss/132431/Easy-to-understand-BFS-solution-with-detailed-explanation
This question asks for number of disjoint groups. While a group contains strings swappable from at least one other string in it, it looks like this is a graph problemstrings are the nodes in the graph, and two strings (nodes) have a path between them if they are similar.


  1. How do we know if two strings are similar?
    This can be determined by calculating hamming distance between them. Since all strings are anagrams, if two strings have hamming distance of two, then they can be converted into each other by swapping the only two different characters.
    In other wors, two strings are similar if and only if they have hamming distance two.
    We can compute the hamming distance between every two strings in O(n ^ 2 * l) time. And if two strings have hamming distance 2, they are definitely in the same group.
  2. How do we represent the graph?
    Now we know all nodes in the graph (which are all strings in the input), we also know the edges between them. We've got everything we need!
    Here i choose to use the adjacency list representation of the graph. If two strings have hamming distance 2, there's an edge between them.
  3. How to find all disjoint groups?
    Here comes the easiest part! We can find a group by visiting all nodes in it.
    If two nodes A and B are in different groups, there's no way we can go from A to B. However, if A and C are in the same group, there must exist a way we can go from A to C.
    Similarly, we can go from the first string, visit all strings that are in the same group as the first string(we call this group group_1). Then find another string which we haven't visted yet and visit all strings in that group (we call this group group_2). We can do this until all strings have been visited and we find all the groups.
    Actually this is the problem of finding all connected components of a graph . Now we have our graph, we can use BFS to do this!
https://leetcode.com/problems/similar-string-groups/discuss/132464/Simple-JAVA-with-BFS
modified to pass corner case "aaaaa", "aaaaa","aaaaa"
just add
 (diff == 0 && curStr.length() >=2)
public int numSimilarGroups(String[] A) {
        int group = 0;

        boolean[] visited = new boolean[A.length];

        for(int k = 0; k < A.length; k++) {
            if (visited[k]) {
                continue;
            }
            group++;
            Queue<String> q = new LinkedList<>();
            q.offer(A[k]);
            while(!q.isEmpty()) {
                String curStr = q.poll();
                for(int m = 0; m < A.length; m++){
                    if (visited[m]) {
                        continue;
                    }
                    String nextStr = A[m];
                    int diff = 0;
                    for(int i = 0; i < curStr.length(); i++) {
                        if (curStr.charAt(i) != nextStr.charAt(i)) {
                            diff++;
                        }
                    }
                    //important here, "aaaaa", "aaaaa" should be grouped together
                    if (diff == 2 || (diff == 0 && curStr.length() >=2) ) {
                        visited[m] = true;
                        q.offer(nextStr);
                    }
                }
            }
        }
        return group;
    }
X. DFS
https://leetcode.com/problems/similar-string-groups/discuss/132318/Simple-Java-Solution-using-DFS
your code did not pass the test case [aaaa,aaaa,aaaa.....]


Labels

LeetCode (1432) GeeksforGeeks (1122) LeetCode - Review (1067) Review (882) Algorithm (668) to-do (609) Classic Algorithm (270) Google Interview (237) Classic Interview (222) Dynamic Programming (220) DP (186) Bit Algorithms (145) POJ (141) Math (137) Tree (132) LeetCode - Phone (129) EPI (122) Cracking Coding Interview (119) DFS (115) Difficult Algorithm (115) Lintcode (115) Different Solutions (110) Smart Algorithm (104) Binary Search (96) BFS (91) HackerRank (90) Binary Tree (86) Hard (79) Two Pointers (78) Stack (76) Company-Facebook (75) BST (72) Graph Algorithm (72) Time Complexity (69) Greedy Algorithm (68) Interval (63) Company - Google (62) Geometry Algorithm (61) Interview Corner (61) LeetCode - Extended (61) Union-Find (60) Trie (58) Advanced Data Structure (56) List (56) Priority Queue (53) Codility (52) ComProGuide (50) LeetCode Hard (50) Matrix (50) Bisection (48) Segment Tree (48) Sliding Window (48) USACO (46) Space Optimization (45) Company-Airbnb (41) Greedy (41) Mathematical Algorithm (41) Tree - Post-Order (41) ACM-ICPC (40) Algorithm Interview (40) Data Structure Design (40) Graph (40) Backtracking (39) Data Structure (39) Jobdu (39) Random (39) Codeforces (38) Knapsack (38) LeetCode - DP (38) Recursive Algorithm (38) String Algorithm (38) TopCoder (38) Sort (37) Introduction to Algorithms (36) Pre-Sort (36) Beauty of Programming (35) Must Known (34) Binary Search Tree (33) Follow Up (33) prismoskills (33) Palindrome (32) Permutation (31) Array (30) Google Code Jam (30) HDU (30) Array O(N) (29) Logic Thinking (29) Monotonic Stack (29) Puzzles (29) Code - Detail (27) Company-Zenefits (27) Microsoft 100 - July (27) Queue (27) Binary Indexed Trees (26) TreeMap (26) to-do-must (26) 1point3acres (25) GeeksQuiz (25) Merge Sort (25) Reverse Thinking (25) hihocoder (25) Company - LinkedIn (24) Hash (24) High Frequency (24) Summary (24) Divide and Conquer (23) Proof (23) Game Theory (22) Topological Sort (22) Lintcode - Review (21) Tree - Modification (21) Algorithm Game (20) CareerCup (20) Company - Twitter (20) DFS + Review (20) DP - Relation (20) Brain Teaser (19) DP - Tree (19) Left and Right Array (19) O(N) (19) Sweep Line (19) UVA (19) DP - Bit Masking (18) LeetCode - Thinking (18) KMP (17) LeetCode - TODO (17) Probabilities (17) Simulation (17) String Search (17) Codercareer (16) Company-Uber (16) Iterator (16) Number (16) O(1) Space (16) Shortest Path (16) itint5 (16) DFS+Cache (15) Dijkstra (15) Euclidean GCD (15) Heap (15) LeetCode - Hard (15) Majority (15) Number Theory (15) Rolling Hash (15) Tree Traversal (15) Brute Force (14) Bucket Sort (14) DP - Knapsack (14) DP - Probability (14) Difficult (14) Fast Power Algorithm (14) Pattern (14) Prefix Sum (14) TreeSet (14) Algorithm Videos (13) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Combination (13) Computational Geometry (13) DP - Digit (13) LCA (13) LeetCode - DFS (13) Linked List (13) Long Increasing Sequence(LIS) (13) Math-Divisible (13) Reservoir Sampling (13) mitbbs (13) Algorithm - How To (12) Company - Microsoft (12) DP - Interval (12) DP - Multiple Relation (12) DP - Relation Optimization (12) LeetCode - Classic (12) Level Order Traversal (12) Prime (12) Pruning (12) Reconstruct Tree (12) Thinking (12) X Sum (12) AOJ (11) Bit Mask (11) Company-Snapchat (11) DP - Space Optimization (11) Dequeue (11) Graph DFS (11) MinMax (11) Miscs (11) Princeton (11) Quick Sort (11) Stack - Tree (11) 尺取法 (11) 挑战程序设计竞赛 (11) Coin Change (10) DFS+Backtracking (10) Facebook Hacker Cup (10) Fast Slow Pointers (10) HackerRank Easy (10) Interval Tree (10) Limited Range (10) Matrix - Traverse (10) Monotone Queue (10) SPOJ (10) Starting Point (10) States (10) Stock (10) Theory (10) Tutorialhorizon (10) Kadane - Extended (9) Mathblog (9) Max-Min Flow (9) Maze (9) Median (9) O(32N) (9) Quick Select (9) Stack Overflow (9) System Design (9) Tree - Conversion (9) Use XOR (9) Book Notes (8) Company-Amazon (8) DFS+BFS (8) DP - States (8) Expression (8) Longest Common Subsequence(LCS) (8) One Pass (8) Quadtrees (8) Traversal Once (8) Trie - Suffix (8) 穷竭搜索 (8) Algorithm Problem List (7) All Sub (7) Catalan Number (7) Cycle (7) DP - Cases (7) Facebook Interview (7) Fibonacci Numbers (7) Flood fill (7) Game Nim (7) Graph BFS (7) HackerRank Difficult (7) Hackerearth (7) Inversion (7) Kadane’s Algorithm (7) Manacher (7) Morris Traversal (7) Multiple Data Structures (7) Normalized Key (7) O(XN) (7) Radix Sort (7) Recursion (7) Sampling (7) Suffix Array (7) Tech-Queries (7) Tree - Serialization (7) Tree DP (7) Trie - Bit (7) 蓝桥杯 (7) Algorithm - Brain Teaser (6) BFS - Priority Queue (6) BFS - Unusual (6) Classic Data Structure Impl (6) DP - 2D (6) DP - Monotone Queue (6) DP - Unusual (6) DP-Space Optimization (6) Dutch Flag (6) How To (6) Interviewstreet (6) Knapsack - MultiplePack (6) Local MinMax (6) MST (6) Minimum Spanning Tree (6) Number - Reach (6) Parentheses (6) Pre-Sum (6) Probability (6) Programming Pearls (6) Rabin-Karp (6) Reverse (6) Scan from right (6) Schedule (6) Stream (6) Subset Sum (6) TSP (6) Xpost (6) n00tc0d3r (6) reddit (6) AI (5) Abbreviation (5) Anagram (5) Art Of Programming-July (5) Assumption (5) Bellman Ford (5) Big Data (5) Code - Solid (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Convex Hull (5) Crazyforcode (5) DFS - Multiple (5) DFS+DP (5) DP - Multi-Dimension (5) DP-Multiple Relation (5) Eulerian Cycle (5) Graph - Unusual (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Java (5) LogN (5) Manhattan Distance (5) Matrix Chain Multiplication (5) N Queens (5) Pre-Sort: Index (5) Quick Partition (5) Quora (5) Randomized Algorithms (5) Resources (5) Robot (5) SPFA(Shortest Path Faster Algorithm) (5) Shuffle (5) Sieve of Eratosthenes (5) Strongly Connected Components (5) Subarray Sum (5) Sudoku (5) Suffix Tree (5) Swap (5) Threaded (5) Tree - Creation (5) Warshall Floyd (5) Word Search (5) jiuzhang (5)

Popular Posts