http://www.codeproject.com/Articles/9680/Persistent-Data-Structures
There is another meaning for the word persistence when it is used to describe data structures, particularly those used in functional programming languages. In that context, a persistent data structure is a data structure capable of preserving the current version of itself when modified. In essence, a persistent data structure is immutable.
Persistent Singly Linked Lists
To insert a new node at the fourth position, we traverse the list as before, copying each node along the way. Each copied node is linked to the next copied node:
The last copied node is linked to the new node, and the new node is linked to the remaining nodes in the old list:
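The copy-on-insert walk described above can be sketched in a few lines. This is illustrative Python rather than the article's C#, and the `Node` and `insert` names are my own; the point is that only the nodes before the insertion point are copied, while everything after it is shared:

```python
class Node:
    """An immutable singly linked list node."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def insert(head, index, value):
    """Return a new list with `value` inserted at `index`.

    Nodes before the insertion point are copied; the new node links
    directly to the remaining nodes of the old list, which are shared.
    """
    if index == 0:
        return Node(value, head)  # new node points at the old tail
    if head is None:
        raise IndexError("index out of range")
    # Copy this node, linking it to the (recursively built) new suffix.
    return Node(head.value, insert(head.next, index - 1, value))

def to_list(node):
    """Collect the values for inspection."""
    out = []
    while node is not None:
        out.append(node.value)
        node = node.next
    return out

# Build 1 -> 2 -> 3 -> 5, then insert 4 at position 3 (the fourth slot).
old = Node(1, Node(2, Node(3, Node(5))))
new = insert(old, 3, 4)
```

Note that the old version is untouched, and the node holding 5 is physically the same object in both lists.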
Persistent Binary Trees
This illustrates the "spine" of the search down the tree. The red nodes are the only nodes that need to be copied in making a new version of the tree:
You can see that the majority of the nodes do not need to be copied. Assuming the binary tree is balanced, the number of nodes that need to be copied any time a write operation is performed is at most O(Log N), where Log is base 2.
Insertions and deletions work the same way, except that steps should be taken to keep the tree in balance, such as using an AVL tree. If a binary tree becomes degenerate, we run into the same efficiency problems as we did with the singly linked list.
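The spine-copying idea can be shown with a short sketch. This is a Python illustration of path copying on a plain binary search tree; it deliberately omits the AVL rebalancing the article recommends, so the names and structure are mine, not the article's code:

```python
class TreeNode:
    """An immutable binary search tree node."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def insert(root, key):
    """Return a new tree containing `key`.

    Only the nodes on the search path (the "spine") are copied; every
    subtree hanging off that path is shared with the old version.
    """
    if root is None:
        return TreeNode(key)
    if key < root.key:
        return TreeNode(root.key, insert(root.left, key), root.right)
    elif key > root.key:
        return TreeNode(root.key, root.left, insert(root.right, key))
    return root  # key already present; reuse the whole subtree

# A balanced tree of 1..8 (without 6); inserting 6 copies only the
# path 4 -> 7 -> 5 and shares the entire left subtree and the leaf 8.
old = TreeNode(4,
               TreeNode(2, TreeNode(1), TreeNode(3)),
               TreeNode(7, TreeNode(5), TreeNode(8)))
new = insert(old, 6)
```

In a balanced tree of N nodes the spine has O(log N) nodes, which is why writes stay logarithmic.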
Random Access Lists
An interesting persistent data structure that combines the singly linked list with the binary tree is Chris Okasaki's random-access list. This data structure allows for random access of its items as well as adding and removing items from the beginning of the list. It is structured as a singly linked list of completely balanced binary trees. The advantage of this data structure is that it allows access, insertion, and removal of the head of the list in O(1) time as well as provides logarithmic performance in randomly accessing its items.
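A compact way to see the structure is a sketch of the skew-binary variant of Okasaki's random-access list. This is my own minimal Python rendering, not the article's implementation: the list is a Python list of `(size, tree)` pairs, where each tree is a completely balanced binary tree stored as a `(value, left, right)` tuple and sizes are of the form 2^k - 1:

```python
def cons(x, ts):
    """Prepend x in O(1): if the first two trees have equal size,
    merge them under a new root holding x; otherwise add a singleton."""
    if len(ts) >= 2 and ts[0][0] == ts[1][0]:
        merged_size = 1 + ts[0][0] + ts[1][0]
        return [(merged_size, (x, ts[0][1], ts[1][1]))] + ts[2:]
    return [(1, (x, None, None))] + ts

def head(ts):
    """The head is always the root of the first tree: O(1)."""
    return ts[0][1][0]

def tail(ts):
    """Drop the head in O(1) by splitting the first tree in two."""
    size, (_, left, right) = ts[0]
    if size == 1:
        return ts[1:]
    return [(size // 2, left), (size // 2, right)] + ts[1:]

def lookup(ts, i):
    """Random access in O(log n): skip whole trees, then descend one."""
    for size, tree in ts:
        if i < size:
            while True:
                value, left, right = tree
                if i == 0:
                    return value
                i -= 1
                size //= 2
                if i < size:
                    tree = left
                else:
                    tree, i = right, i - size
        i -= size
    raise IndexError(i)

# Build the list 0..9 by consing in reverse order.
ral = []
for v in range(9, -1, -1):
    ral = cons(v, ral)
```

Every operation returns a new top-level list while sharing all untouched trees, so old versions remain valid.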
Immutable Collections
Stack
This one was easy. Simply create a persistent singly linked list and limit insertions and deletions to the head of the list. Since this class is persistent, popping a stack returns a new version of the stack with the next item in the old stack as the new top. In the System.Collections.Stack version, popping the stack returns the top of the stack. The question for the persistent version was how to make the top of the stack available, since it cannot be returned when the stack is popped. I chose to create a Top property that represents the top of the stack.
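The shape of such a stack can be sketched briefly. This is an illustrative Python analogue of the design described above (the class and member names are mine; the article's class is C#): `pop` returns the next version of the stack, and a `top` property exposes the head since `pop` cannot return it:

```python
class PersistentStack:
    """An immutable stack built on a persistent singly linked list."""
    __slots__ = ("_head", "_tail")

    def __init__(self, head=None, tail=None):
        self._head = head   # the top item, or None for the empty stack
        self._tail = tail   # the rest of the stack, or None when empty

    def push(self, item):
        """Return a new stack with `item` on top; `self` is unchanged."""
        return PersistentStack(item, self)

    def pop(self):
        """Return the next version of the stack, *not* the popped item."""
        if self._tail is None:
            raise IndexError("pop from empty stack")
        return self._tail

    @property
    def top(self):
        """The item on top, made available since pop() cannot return it."""
        return self._head

# Every intermediate version remains usable after further pushes/pops.
s = PersistentStack().push(1).push(2)
```

Because `push` and `pop` only touch the head, both are O(1) and all versions share their tails.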
SortedList
The SortedList uses AVL tree algorithms to keep the tree in balance. I found it useful to create an IAvlNode interface. Two classes implement this interface, the AvlNode class and the NullAvlNode class. The NullAvlNode class implements the null object design pattern. This simplified many of the algorithms.
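To illustrate how a null object simplifies the tree code, here is a minimal Python sketch of the idea (the C# original uses an IAvlNode interface; the names and the tiny `search` method here are my own invention, and balancing is omitted). A single shared "null" node stands in for every empty subtree, so the algorithms never test for null references:

```python
class NullAvlNode:
    """Null-object terminator: one shared instance represents every
    empty subtree, so callers never need an explicit null check."""
    height = 0

    def search(self, key):
        return None  # reached the bottom without finding the key

NULL = NullAvlNode()

class AvlNode:
    """An immutable tree node; empty children default to NULL."""
    def __init__(self, key, value, left=NULL, right=NULL):
        self.key = key
        self.value = value
        self.left = left
        self.right = right
        # Height is computed uniformly -- NULL simply reports height 0.
        self.height = 1 + max(left.height, right.height)

    def search(self, key):
        if key < self.key:
            return self.left.search(key)
        if key > self.key:
            return self.right.search(key)
        return self.value

# A small tree; searches that fall off the tree land on NULL.
root = AvlNode(2, "b", AvlNode(1, "a"), AvlNode(3, "c"))
```

Without the null object, every height computation and every recursive call would need a special case for empty children.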
ArrayList
This is the class that proved most challenging. Like the SortedList, it uses a persistent AVL tree as its data structure. However, unlike the SortedList, items are accessed by index (or by position) rather than by key. I have to admit that the algorithms for accessing and inserting items in a binary tree by index weren't intuitive to me, so I turned to Knuth. Specifically, I used Algorithms B and C in section 6.2.3 in volume 3 of The Art of Computer Programming.
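The core idea behind access by index can be sketched independently of the AVL machinery. This is my own illustrative Python, not the article's code or Knuth's exact algorithm: each node stores its subtree size, and the search descends by comparing the index against the size of the left subtree instead of comparing keys:

```python
def size(node):
    """Size of a possibly-empty subtree."""
    return node.size if node is not None else 0

class SizedNode:
    """An immutable tree node augmented with its subtree size,
    which is what makes access by position possible."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
        self.size = 1 + size(left) + size(right)

def get(node, index):
    """Find the item at `index` in left-to-right order."""
    left_size = size(node.left)
    if index < left_size:
        return get(node.left, index)            # it's in the left subtree
    if index == left_size:
        return node.value                       # this node is the item
    return get(node.right, index - left_size - 1)  # skip left + this node

# A balanced tree holding "a".."g" in index order 0..6.
tree = SizedNode("d",
                 SizedNode("b", SizedNode("a"), SizedNode("c")),
                 SizedNode("f", SizedNode("e"), SizedNode("g")))
```

Insertion by index works the same way: descend by comparing against left-subtree sizes, then rebuild (copy) the spine as with any persistent tree.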
I made an assumption about the ArrayList in order to improve performance. I assumed that the Add method is by far the most used method. However, adding items to the ArrayList one right after the other causes a lot of tree rotations to keep the tree in balance. To solve this, I created a template tree that is already completely balanced. Since this template tree is immutable, it can exist at the class level and be shared amongst all of the instances of the class.
When an instance of the ArrayList class is created, it takes a small subtree of the template tree. As items are added, the nodes in the template tree are replaced with new nodes. Since the tree is completely balanced, no rebalancing is necessary. If the subtree gets filled up, another subtree of equal height is taken from the template tree and joined to the existing tree. Insertions and deletions are handled normally, with rebalancing performed if necessary. Again, the assumption is that adding items to the ArrayList occurs much more frequently than inserting or deleting items.
Array
The Array class uses the random access list structure to provide a persistent array with logarithmic performance. Unlike a random access list, it has a fixed size.
http://programmers.stackexchange.com/questions/221762/why-doesnt-java-8-include-immutable-collections
Furthermore, immutable collections often have fundamentally different implementations than their mutable counterparts. Consider ArrayList, for example: an efficient immutable ArrayList wouldn't be an array at all! It should be implemented as a balanced tree with a large branching factor; Clojure uses 32, IIRC. Making mutable collections "immutable" by just adding a functional update is a performance bug just as much as a memory leak is.

IMMUTABLE LISTS
Instead of adding this new node to the end, we can add it to the beginning.
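The reason for preferring the beginning is worth spelling out with a sketch. In this illustrative Python (the `Cons`, `prepend`, and `append` names are mine), prepending creates one node that points at the existing list, while appending must copy every node, because immutable cells cannot have their `next` field changed in place:

```python
class Cons:
    """One immutable list cell."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def prepend(lst, value):
    """O(1): the new list is a single new node pointing at the old list."""
    return Cons(value, lst)

def append(lst, value):
    """O(n): every cell must be copied, since the last cell's `next`
    cannot be mutated to point at the new node."""
    if lst is None:
        return Cons(value, None)
    return Cons(lst.value, append(lst.next, value))

old = Cons(2, Cons(3))        # the list 2 -> 3
new = prepend(old, 1)         # 1 -> 2 -> 3, sharing all of `old`
appended = append(old, 4)     # 2 -> 3 -> 4, sharing nothing
```

This asymmetry is exactly why immutable lists grow at the head.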
PERSISTENT TRIES
The high branching factor reduces the time for operations on tries
In addition to the high branching factor of thirty-two or more, tries use a special property to organize the keys. The key for a node is determined by its path. In other words, we don’t store the key in the nodes; their location is their key.
An operation to change any element requires copying only up to four elements, making it close to a constant-time operation.
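Both properties, path-as-key and the small copy count, can be seen in a toy bit-partitioned trie. This is a minimal sketch in my own Python, not any particular library's implementation: children are addressed by 5-bit slices of the index (so 32-way branching), nodes are tuples to keep them immutable, and an update copies only the nodes along one path:

```python
BITS = 5            # 2**5 = 32-way branching
WIDTH = 1 << BITS

def empty(depth):
    """An empty trie of the given depth; identical empty children are
    shared, which immutability makes safe."""
    if depth == 0:
        return (None,) * WIDTH
    return (empty(depth - 1),) * WIDTH

def get(root, depth, index):
    """Descend by 5-bit slices of the index -- the index is never
    stored in a node, because a value's position *is* its key."""
    node = root
    for d in range(depth, -1, -1):
        node = node[(index >> (d * BITS)) & (WIDTH - 1)]
    return node

def update(root, depth, index, value):
    """Return a new trie with `index` set to `value`, copying only the
    depth+1 nodes on the path (a depth of 4 already addresses about a
    million elements, hence the near-constant-time feel)."""
    node = list(root)                       # copy this one small array
    slot = (index >> (depth * BITS)) & (WIDTH - 1)
    if depth == 0:
        node[slot] = value                  # leaf level: store the value
    else:
        node[slot] = update(root[slot], depth - 1, index, value)
    return tuple(node)                      # freeze the copy

# A two-level trie (32 * 32 = 1024 slots); set index 700 in a new version.
t0 = empty(1)
t1 = update(t0, 1, 700, "x")
```

Only the root and one child were copied to produce `t1`; the other 31 children are shared with `t0`, which stays fully intact.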