[Algo] Parse XML Tree 解析XML文件 - SegmentFault


[Algo] Parse XML Tree 解析XML文件 - SegmentFault
现在有一个Tokenizer,返回的Token都是XML标签或者内容,比如(open, html)(inner, hello)(close, html)表示<html>hello</html>,每一个括号及其内容是一个Token,请问如何表示这个XML文件。
这题首先要想清楚的是,如何表示XML,因为XML是典型的一父多子,我们用树来表示比较好。然后分析下如何用Tokenizer,Tokenizer有点像Iterator,每当我们用Tokenizer拿到一个Token时,如果这是一个Open的Token,我们需要新建一个节点,这个新节点下面也有可能有新节点。如果是一个Inner的Token,我们也需要新建一个节点,但这个节点下面不会有新的节点。如果是一个Close的Token,我们不需要新节点,而且需要保证上一个Open节点不再接纳新节点了,而对于新节点则要附在上一层的节点后面。这里,我们用栈可以保留上一层的节点信息,帮助我们建树。如果这是一个Open的Token,我们需要新建一个节点加入上一层节点后面,并加入栈中。如果是一个Inner的Token,我们也需要新建一个节点加到上一层节点后面,但不加入栈中。如果是一个Close的Token,则把上一层节点弹出栈。
public class XMLParser {
    
    public static void main(String[] args){
        XMLParser xml = new XMLParser();
        XMLNode root = xml.parse("(open,html)(open,head)(inner,welcome)(close,head)(open,body)(close,body)(close,html)");
        xml.printXMLTree(root, 0);
    }
    
    public XMLNode parse(String str){
        // 以右括号为delimiter
        StringTokenizer tknz = new StringTokenizer(str, ")");
        Stack<XMLNode> stk = new Stack<XMLNode>();
        // 将第一个open节点作为根节点压入栈中
        XMLNode root = convertTokenToTreeNode(tknz.nextToken());
        stk.push(root);
        while(!stk.isEmpty()){
            if(!tknz.hasMoreTokens()){
                break;
            }
            XMLNode curr = convertTokenToTreeNode(tknz.nextToken());
            // 得到上一层节点
            XMLNode father = stk.peek();
            // 根据当前节点的类型做不同处理
            switch(curr.type){
                // 对于Open节点,我们把它加入上一层节点的后面,并加入栈中
                case "open":
                    father.children.add(curr);
                    stk.push(curr);
                    break;
                // Close节点直接把上一层Pop出来就行了,这样就不会有新的节点加到上一层节点后面    
                case "close":
                    stk.pop();
                    break;
                // Inner节点只加到上一层节点后面    
                case "inner":
                    father.children.add(curr);
                    break;
            }
        }
        return root;
    }
    
    private XMLNode convertTokenToTreeNode(String token){
        token = token.substring(1);
        String[] parts = token.split(",");
        return new XMLNode(parts[0], parts[1]);
    }
    
    private void printXMLTree(XMLNode root, int depth){
        for(int i = 0; i < depth; i++){
            System.out.print("-");
        }
        System.out.println(root.type + ":" + root.value);
        for(XMLNode node : root.children){
            printXMLTree(node, depth + 1);
        }
    }
}

class XMLNode {
    String type;
    String value;
    List<XMLNode> children;
    
    XMLNode(String type, String value){
        this.type = type;
        this.value = value;
        this.children = new ArrayList<XMLNode>();
    }
}
Read full article from [Algo] Parse XML Tree 解析XML文件 - SegmentFault

Labels

LeetCode (1432) GeeksforGeeks (1122) LeetCode - Review (1067) Review (882) Algorithm (668) to-do (609) Classic Algorithm (270) Google Interview (237) Classic Interview (222) Dynamic Programming (220) DP (186) Bit Algorithms (145) POJ (141) Math (137) Tree (132) LeetCode - Phone (129) EPI (122) Cracking Coding Interview (119) DFS (115) Difficult Algorithm (115) Lintcode (115) Different Solutions (110) Smart Algorithm (104) Binary Search (96) BFS (91) HackerRank (90) Binary Tree (86) Hard (79) Two Pointers (78) Stack (76) Company-Facebook (75) BST (72) Graph Algorithm (72) Time Complexity (69) Greedy Algorithm (68) Interval (63) Company - Google (62) Geometry Algorithm (61) Interview Corner (61) LeetCode - Extended (61) Union-Find (60) Trie (58) Advanced Data Structure (56) List (56) Priority Queue (53) Codility (52) ComProGuide (50) LeetCode Hard (50) Matrix (50) Bisection (48) Segment Tree (48) Sliding Window (48) USACO (46) Space Optimization (45) Company-Airbnb (41) Greedy (41) Mathematical Algorithm (41) Tree - Post-Order (41) ACM-ICPC (40) Algorithm Interview (40) Data Structure Design (40) Graph (40) Backtracking (39) Data Structure (39) Jobdu (39) Random (39) Codeforces (38) Knapsack (38) LeetCode - DP (38) Recursive Algorithm (38) String Algorithm (38) TopCoder (38) Sort (37) Introduction to Algorithms (36) Pre-Sort (36) Beauty of Programming (35) Must Known (34) Binary Search Tree (33) Follow Up (33) prismoskills (33) Palindrome (32) Permutation (31) Array (30) Google Code Jam (30) HDU (30) Array O(N) (29) Logic Thinking (29) Monotonic Stack (29) Puzzles (29) Code - Detail (27) Company-Zenefits (27) Microsoft 100 - July (27) Queue (27) Binary Indexed Trees (26) TreeMap (26) to-do-must (26) 1point3acres (25) GeeksQuiz (25) Merge Sort (25) Reverse Thinking (25) hihocoder (25) Company - LinkedIn (24) Hash (24) High Frequency (24) Summary (24) Divide and Conquer (23) Proof (23) Game Theory (22) Topological Sort (22) Lintcode - Review (21) Tree - Modification (21) Algorithm Game (20) CareerCup (20) Company - Twitter (20) DFS + Review (20) DP - Relation (20) Brain Teaser (19) DP - Tree (19) Left and Right Array (19) O(N) (19) Sweep Line (19) UVA (19) DP - Bit Masking (18) LeetCode - Thinking (18) KMP (17) LeetCode - TODO (17) Probabilities (17) Simulation (17) String Search (17) Codercareer (16) Company-Uber (16) Iterator (16) Number (16) O(1) Space (16) Shortest Path (16) itint5 (16) DFS+Cache (15) Dijkstra (15) Euclidean GCD (15) Heap (15) LeetCode - Hard (15) Majority (15) Number Theory (15) Rolling Hash (15) Tree Traversal (15) Brute Force (14) Bucket Sort (14) DP - Knapsack (14) DP - Probability (14) Difficult (14) Fast Power Algorithm (14) Pattern (14) Prefix Sum (14) TreeSet (14) Algorithm Videos (13) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Combination (13) Computational Geometry (13) DP - Digit (13) LCA (13) LeetCode - DFS (13) Linked List (13) Long Increasing Sequence(LIS) (13) Math-Divisible (13) Reservoir Sampling (13) mitbbs (13) Algorithm - How To (12) Company - Microsoft (12) DP - Interval (12) DP - Multiple Relation (12) DP - Relation Optimization (12) LeetCode - Classic (12) Level Order Traversal (12) Prime (12) Pruning (12) Reconstruct Tree (12) Thinking (12) X Sum (12) AOJ (11) Bit Mask (11) Company-Snapchat (11) DP - Space Optimization (11) Dequeue (11) Graph DFS (11) MinMax (11) Miscs (11) Princeton (11) Quick Sort (11) Stack - Tree (11) 尺取法 (11) 挑战程序设计竞赛 (11) Coin Change (10) DFS+Backtracking (10) Facebook Hacker Cup (10) Fast Slow Pointers (10) HackerRank Easy (10) Interval Tree (10) Limited Range (10) Matrix - Traverse (10) Monotone Queue (10) SPOJ (10) Starting Point (10) States (10) Stock (10) Theory (10) Tutorialhorizon (10) Kadane - Extended (9) Mathblog (9) Max-Min Flow (9) Maze (9) Median (9) O(32N) (9) Quick Select (9) Stack Overflow (9) System Design (9) Tree - Conversion (9) Use XOR (9) Book Notes (8) Company-Amazon (8) DFS+BFS (8) DP - States (8) Expression (8) Longest Common Subsequence(LCS) (8) One Pass (8) Quadtrees (8) Traversal Once (8) Trie - Suffix (8) 穷竭搜索 (8) Algorithm Problem List (7) All Sub (7) Catalan Number (7) Cycle (7) DP - Cases (7) Facebook Interview (7) Fibonacci Numbers (7) Flood fill (7) Game Nim (7) Graph BFS (7) HackerRank Difficult (7) Hackerearth (7) Inversion (7) Kadane’s Algorithm (7) Manacher (7) Morris Traversal (7) Multiple Data Structures (7) Normalized Key (7) O(XN) (7) Radix Sort (7) Recursion (7) Sampling (7) Suffix Array (7) Tech-Queries (7) Tree - Serialization (7) Tree DP (7) Trie - Bit (7) 蓝桥杯 (7) Algorithm - Brain Teaser (6) BFS - Priority Queue (6) BFS - Unusual (6) Classic Data Structure Impl (6) DP - 2D (6) DP - Monotone Queue (6) DP - Unusual (6) DP-Space Optimization (6) Dutch Flag (6) How To (6) Interviewstreet (6) Knapsack - MultiplePack (6) Local MinMax (6) MST (6) Minimum Spanning Tree (6) Number - Reach (6) Parentheses (6) Pre-Sum (6) Probability (6) Programming Pearls (6) Rabin-Karp (6) Reverse (6) Scan from right (6) Schedule (6) Stream (6) Subset Sum (6) TSP (6) Xpost (6) n00tc0d3r (6) reddit (6) AI (5) Abbreviation (5) Anagram (5) Art Of Programming-July (5) Assumption (5) Bellman Ford (5) Big Data (5) Code - Solid (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Convex Hull (5) Crazyforcode (5) DFS - Multiple (5) DFS+DP (5) DP - Multi-Dimension (5) DP-Multiple Relation (5) Eulerian Cycle (5) Graph - Unusual (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Java (5) LogN (5) Manhattan Distance (5) Matrix Chain Multiplication (5) N Queens (5) Pre-Sort: Index (5) Quick Partition (5) Quora (5) Randomized Algorithms (5) Resources (5) Robot (5) SPFA(Shortest Path Faster Algorithm) (5) Shuffle (5) Sieve of Eratosthenes (5) Strongly Connected Components (5) Subarray Sum (5) Sudoku (5) Suffix Tree (5) Swap (5) Threaded (5) Tree - Creation (5) Warshall Floyd (5) Word Search (5) jiuzhang (5)

Popular Posts