Algorithmic Approaches:
Reasoning; Recursion; Greedy; Enumeration; Divide and Conquer; Backtracking; Dynamic Programming; Probabilistic reasoning
Sorting:
Internal sort: all data is read into memory and sorted entirely in memory.
Exchange sort: Bubble sort, Quick sort
Selection sort: Selection sort, Heap sort
Insertion sort: Insertion sort, Shell sort
Merge sort: Merge sort
External sort: the data is too large to read into memory at once, so it is sorted part by part, and the sorted parts are finally merged.
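A minimal sketch of the external-sort idea, assuming in-memory lists stand in for the sorted "runs" that a real system would write to disk. Each chunk of `chunk_size` items (a hypothetical parameter) is sorted on its own, then the standard-library `heapq.merge` performs the k-way merge.

```python
import heapq
from typing import Iterable, List


def external_sort(data: Iterable[int], chunk_size: int) -> List[int]:
    """Sort data chunk by chunk, then k-way merge the sorted runs."""
    runs: List[List[int]] = []
    chunk: List[int] = []
    for x in data:
        chunk.append(x)
        if len(chunk) == chunk_size:
            runs.append(sorted(chunk))  # in a real external sort this run goes to disk
            chunk = []
    if chunk:
        runs.append(sorted(chunk))      # last, possibly shorter, run
    # heapq.merge lazily merges already-sorted iterables using a heap
    return list(heapq.merge(*runs))
```

Only one element per run needs to be in memory during the merge, which is what makes the approach work when the full data set does not fit.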
Linear List:
Sequential list (array-based) and linked list.
Queue:
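A linear list with first-in, first-out (FIFO) access: insert at the rear, remove from the front.
Stack:
A linear list with last-in, first-out (LIFO) access: push and pop at the same end (the top).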
Search:
(1) Sequential Search
O(n)
(2) Binary Search
Unsorted array: sort it first (e.g. with Quicksort), O(N log N + log N) = O(N log N) on average; sorted array: O(log N).
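A sketch of binary search on a sorted list. For the unsorted case, Python's built-in `sorted` (Timsort, not Quicksort) stands in for the sort step; the helper name `search_unsorted` is hypothetical.

```python
from typing import List


def binary_search(a: List[int], target: int) -> int:
    """Return an index of target in the sorted list a, or -1 if absent. O(log N)."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        if a[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid - 1  # target can only be in the left half
    return -1


def search_unsorted(a: List[int], target: int) -> int:
    """Sort first (O(N log N)), then binary search (O(log N)).
    The returned index refers to the sorted copy, not the original list."""
    return binary_search(sorted(a), target)
```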
(3) Hash Search
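O(1) on average. Keys should be spread over the table as evenly as possible, and the hash function should be as simple as possible.
Five hash function construction methods:
1. Direct addressing: Y = X + A
2. Division (remainder): Y = X % A
3. Digit analysis: e.g. for value1=112233, value2=112633, value3=119033, analysis shows that the middle two digits vary while the other digits stay the same, so the keys can be taken as key1=22, key2=26, key3=90.
4. Mid-square: square the value and take the middle digits.
5. Folding:
For example, value=135790 and the key must be a 2-digit hash value. Split the value into parts and add them, 13+57+90=160, then drop the high digit "1", giving key=60. The point is that the key depends on every digit of the value, so that hash addresses are spread out as much as possible.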
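The five methods above, sketched as small Python functions. The digit positions used in `digit_analysis` (3rd and 4th digits of a 6-digit value), the 2-digit output width, and left-to-right grouping in `folding` are assumptions chosen to match the examples, not fixed rules.

```python
def direct_address(x: int, a: int) -> int:
    return x + a  # method 1: Y = X + A


def division(x: int, a: int) -> int:
    return x % a  # method 2: Y = X % A


def digit_analysis(x: int) -> int:
    # method 3: keep the digits that vary (here the 3rd and 4th of a 6-digit value)
    return int(str(x)[2:4])


def mid_square(x: int, digits: int = 2) -> int:
    # method 4: square the value and take `digits` digits from the middle
    s = str(x * x)
    start = (len(s) - digits) // 2
    return int(s[start:start + digits])


def folding(x: int, width: int = 2) -> int:
    # method 5: split the decimal digits into groups of `width` (left to right),
    # add the groups, and drop any carry beyond `width` digits
    s = str(x)
    total = sum(int(s[i:i + width]) for i in range(0, len(s), width))
    return total % (10 ** width)
```

With these definitions, `folding(135790)` reproduces the worked example (13 + 57 + 90 = 160, keep 60).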
Collision resolution:
1. Open addressing: unused slots in the array are made available to store the entries that collide.
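A minimal open-addressing table, assuming integer keys and linear probing (one of several probing strategies; quadratic probing and double hashing are alternatives). Deletion is deliberately left out, since it needs "tombstone" markers to avoid breaking probe sequences.

```python
from typing import Iterator, List, Optional, Tuple


class LinearProbingTable:
    """Open addressing with linear probing; the home slot is key % capacity."""

    def __init__(self, capacity: int = 8) -> None:
        self.slots: List[Optional[Tuple[int, str]]] = [None] * capacity

    def _probe(self, key: int) -> Iterator[int]:
        # visit the home slot first, then the following slots, wrapping around
        n = len(self.slots)
        start = key % n
        for i in range(n):
            yield (start + i) % n

    def put(self, key: int, value: str) -> None:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is None or slot[0] == key:  # free slot or same key: store here
                self.slots[idx] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key: int) -> Optional[str]:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is None:
                return None  # an empty slot ends the probe sequence
            if slot[0] == key:
                return slot[1]
        return None
```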
(4) Index Search
O(n/3+length)
(5) Binary Search Tree
Insert, delete, search: O(log N) on average; O(N) in the worst case, when the tree degenerates into a list.
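A sketch of the three BST operations, assuming integer keys and ignoring duplicates. Deleting a node with two children replaces its key with the in-order successor (the smallest key in the right subtree), a common choice but not the only one.

```python
from typing import Optional


class BSTNode:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["BSTNode"] = None
        self.right: Optional["BSTNode"] = None


def bst_insert(root: Optional[BSTNode], key: int) -> BSTNode:
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)    # smaller keys go left
    elif key > root.key:
        root.right = bst_insert(root.right, key)  # larger keys go right
    return root


def bst_search(root: Optional[BSTNode], key: int) -> bool:
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


def bst_delete(root: Optional[BSTNode], key: int) -> Optional[BSTNode]:
    if root is None:
        return None
    if key < root.key:
        root.left = bst_delete(root.left, key)
    elif key > root.key:
        root.right = bst_delete(root.right, key)
    else:
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        succ = root.right               # two children: find the in-order successor
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = bst_delete(root.right, succ.key)
    return root
```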
(6) Bitmap
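Fast lookup, duplicate detection, and deletion on large data sets; it can also compress data by removing duplicates (e.g. URL deduplication in a web crawler). Complexity increases when the data has a high duplication rate.
1 byte = 8 bits, so one byte can represent 8 numbers. An int takes 4 bytes (32 bits), so:
a[0] covers 0-31; a[1] covers 32-63; a[2] covers 64-95.
If a number is present, its bit is 1; otherwise it is 0. Declare int a[1+MAX/32], where MAX is the largest number.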
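A sketch of the `int a[1+MAX/32]` layout described above. Python integers are arbitrary-precision, so each list element only simulates a 32-bit word; the class and helper names are hypothetical.

```python
from typing import Iterable, List


class Bitmap:
    """Bit set over 0..max_value; word i holds the numbers 32*i .. 32*i+31."""

    def __init__(self, max_value: int) -> None:
        self.words: List[int] = [0] * (1 + max_value // 32)  # int a[1 + MAX/32]

    def add(self, x: int) -> None:
        self.words[x // 32] |= 1 << (x % 32)      # set bit x%32 of word x/32

    def remove(self, x: int) -> None:
        self.words[x // 32] &= ~(1 << (x % 32))   # clear that bit

    def contains(self, x: int) -> bool:
        return bool((self.words[x // 32] >> (x % 32)) & 1)


def dedupe_sorted(nums: Iterable[int], max_value: int) -> List[int]:
    """Deduplicate by setting one bit per value, then read the bits back in order."""
    bm = Bitmap(max_value)
    for n in nums:
        bm.add(n)
    return [x for x in range(max_value + 1) if bm.contains(x)]
```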
Tree Structures:
(1) Binary Search Tree:
(2) Balanced Search Tree:
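For every node (not only the root), the heights of the left and right subtrees differ by at most 1.
Imbalance caused by an insertion or deletion is repaired by rotations.
Lookups are fast: O(log N) even in the worst case.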
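A sketch of AVL insertion, one concrete balanced search tree, assuming integer keys and ignoring duplicates. After each recursive insert the node's height is updated; if the height difference exceeds 1, a single or double rotation restores balance.

```python
from typing import Optional


class AVLNode:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["AVLNode"] = None
        self.right: Optional["AVLNode"] = None
        self.height = 1


def height(n: Optional[AVLNode]) -> int:
    return n.height if n else 0


def update(n: AVLNode) -> None:
    n.height = 1 + max(height(n.left), height(n.right))


def rotate_right(y: AVLNode) -> AVLNode:
    x = y.left
    assert x is not None
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x: AVLNode) -> AVLNode:
    y = x.right
    assert y is not None
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def avl_insert(node: Optional[AVLNode], key: int) -> AVLNode:
    if node is None:
        return AVLNode(key)
    if key < node.key:
        node.left = avl_insert(node.left, key)
    elif key > node.key:
        node.right = avl_insert(node.right, key)
    else:
        return node  # duplicate: nothing changes
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:                       # left-heavy
        if key > node.left.key:           # left-right case: rotate child first
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:                      # right-heavy
        if key < node.right.key:          # right-left case: rotate child first
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node
```

Inserting 1, 2, ..., 7 in order would degenerate a plain BST into a list; here it yields a perfectly balanced tree rooted at 4.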
(3) Treap Tree:
Each node stores two values: a key and a priority.
Properties:
The keys satisfy the binary search tree property;
The priorities satisfy the min-heap property.
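A sketch of treap insertion, assuming integer keys, random float priorities by default, and that duplicates are ignored. The new node is inserted by key as in a BST, then rotated upward while its priority is smaller than its parent's, restoring the min-heap property.

```python
import random
from typing import Optional


class TreapNode:
    def __init__(self, key: int, priority: Optional[float] = None) -> None:
        self.key = key
        # random priority by default; an explicit one makes the shape reproducible
        self.priority = random.random() if priority is None else priority
        self.left: Optional["TreapNode"] = None
        self.right: Optional["TreapNode"] = None


def _rotate_right(y: TreapNode) -> TreapNode:
    x = y.left
    assert x is not None
    y.left, x.right = x.right, y
    return x


def _rotate_left(x: TreapNode) -> TreapNode:
    y = x.right
    assert y is not None
    x.right, y.left = y.left, x
    return y


def treap_insert(root: Optional[TreapNode], key: int,
                 priority: Optional[float] = None) -> TreapNode:
    if root is None:
        return TreapNode(key, priority)
    if key < root.key:                              # BST order on keys
        root.left = treap_insert(root.left, key, priority)
        if root.left.priority < root.priority:      # min-heap violated: lift child
            root = _rotate_right(root)
    elif key > root.key:
        root.right = treap_insert(root.right, key, priority)
        if root.right.priority < root.priority:
            root = _rotate_left(root)
    return root
```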
(4) Splay Tree:
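A self-adjusting binary search tree. Unlike an AVL tree it stores no balance information; after each access, a sequence of rotations ("splaying") moves the accessed node to the root, restructuring the tree.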
(5) Trie Tree:
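1. The root holds no character; every other node holds exactly one character.
2. Concatenating the characters on the path from the root to a node gives the string that node represents.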
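A minimal trie sketch. Here a node's character is stored as the key in its parent's `children` dict rather than on the node itself, an equivalent layout; the `is_word` flag (an assumption, not in the notes) marks where an inserted string ends.

```python
from typing import Dict, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}  # one character per child edge
        self.is_word = False                       # an inserted string ends here


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()  # the root holds no character

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, s: str) -> Optional[TrieNode]:
        # follow the path spelled by s; None if it leaves the trie
        node: Optional[TrieNode] = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word: str) -> bool:
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix: str) -> bool:
        return self._walk(prefix) is not None
```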
Graph:
1. Graph Basics:
1.1 Representation of Graphs
Edge list
Adjacency list: suited to sparse graphs, where |E| is much less than |V|*|V| (e.g. |E| is close to |V|).
Adjacency matrix: suited to dense graphs, where |E| is close to |V|*|V|.
To check quickly whether two vertices are connected by an edge, an adjacency matrix is the better representation (O(1) lookup).
1.2 BFS
Prim's and Dijkstra's algorithms use ideas similar to BFS.
BFS(G, s)
    for each vertex u in G.V - {s}
        u.color = WHITE
        u.d = infinity
        u.t = NIL
    s.color = GRAY
    s.d = 0
    s.t = NIL
    Q = Empty
    ENQUEUE(Q, s)
    while Q != Empty
        u = DEQUEUE(Q)
        for each v in G.Adj[u]
            if v.color == WHITE
                v.color = GRAY
                v.d = u.d + 1
                v.t = u
                ENQUEUE(Q, v)
        u.color = BLACK
Time complexity: O(V+E)
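The BFS pseudocode above, sketched in Python with the graph as a dict of adjacency lists (an assumption). Instead of explicit colors, a vertex counts as WHITE exactly when it has no entry in `dist` yet.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Tuple


def bfs(adj: Dict[Hashable, List[Hashable]], s: Hashable
        ) -> Tuple[Dict[Hashable, int], Dict[Hashable, Optional[Hashable]]]:
    """Return (dist, parent): hop counts from s and the BFS-tree predecessors."""
    dist = {s: 0}
    parent: Dict[Hashable, Optional[Hashable]] = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()                 # DEQUEUE
        for v in adj.get(u, []):
            if v not in dist:           # white vertex: discover it
                dist[v] = dist[u] + 1
                parent[v] = u
                q.append(v)             # ENQUEUE
    return dist, parent
```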
1.3 DFS
DFS(G)
    for each vertex u in G.V
        u.color = WHITE
        u.t = NIL
    time = 0
    for each vertex u in G.V
        if u.color == WHITE
            DFS-VISIT(G, u)
DFS-VISIT(G, u)
    time = time + 1          // white vertex u has just been discovered
    u.d = time
    u.color = GRAY
    for each v in G.Adj[u]   // explore edge (u, v)
        if v.color == WHITE
            v.t = u
            DFS-VISIT(G, v)
    u.color = BLACK          // blacken u; it is finished
    time = time + 1
    u.f = time
Time complexity: O(V+E)
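A recursive Python version of DFS / DFS-VISIT, assuming every vertex appears as a key of the adjacency dict. Being recursive, it can hit Python's default recursion limit on very deep graphs.

```python
from typing import Dict, List, Optional, Tuple


def dfs(adj: Dict[str, List[str]]
        ) -> Tuple[Dict[str, int], Dict[str, int], Dict[str, Optional[str]]]:
    """Return discovery times d, finishing times f, and DFS-tree parents."""
    color = {u: "WHITE" for u in adj}
    parent: Dict[str, Optional[str]] = {u: None for u in adj}
    d: Dict[str, int] = {}
    f: Dict[str, int] = {}
    time = 0

    def visit(u: str) -> None:
        nonlocal time
        time += 1                  # white vertex u has just been discovered
        d[u] = time
        color[u] = "GRAY"
        for v in adj[u]:           # explore edge (u, v)
            if color[v] == "WHITE":
                parent[v] = u
                visit(v)
        color[u] = "BLACK"         # blacken u; it is finished
        time += 1
        f[u] = time

    for u in adj:
        if color[u] == "WHITE":
            visit(u)
    return d, f, parent
```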
1.4 Topological Sort
Use DFS to perform a topological sort of a directed acyclic graph (DAG).
TOPOLOGICAL-SORT(G)
1. call DFS(G) to compute finishing times v.f for each vertex v
2. as each vertex is finished, insert it onto the front of a linked list
3. return the linked list of vertices
Time complexity: O(V+E)
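A sketch of TOPOLOGICAL-SORT, assuming the input really is a DAG (no cycle check is done). Appending each vertex when it finishes and reversing at the end gives the same order as inserting each finished vertex at the front of a linked list.

```python
from typing import Dict, List, Set


def topological_sort(adj: Dict[str, List[str]]) -> List[str]:
    visited: Set[str] = set()
    order: List[str] = []

    def visit(u: str) -> None:
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        order.append(u)  # u is finished

    for u in adj:
        if u not in visited:
            visit(u)
    order.reverse()      # latest finishing time first
    return order
```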
1.5 Strongly Connected Components:
Vertices u and v belong to the same strongly connected component if u -> v and v -> u, that is, u and v are reachable from each other.
STRONGLY-CONNECTED-COMPONENTS(G)
1. call DFS(G) to compute finishing times u.f for each vertex u
2. compute G^T, the transpose of G (every edge reversed)
3. call DFS(G^T), but in the main loop of DFS, consider the vertices
   in order of decreasing u.f (as computed in step 1)
4. output the vertices of each tree in the depth-first forest formed in step 3 as a
   separate strongly connected component
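A sketch of the four steps (the two-pass, Kosaraju-style approach), assuming every vertex is a key of `adj`. Recursion depth grows with path length, so very large graphs would need an iterative DFS.

```python
from typing import Dict, List, Set


def strongly_connected_components(adj: Dict[str, List[str]]) -> List[List[str]]:
    # Step 1: DFS on G, recording vertices in order of finishing time
    visited: Set[str] = set()
    finished: List[str] = []

    def dfs1(u: str) -> None:
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                dfs1(v)
        finished.append(u)

    for u in adj:
        if u not in visited:
            dfs1(u)

    # Step 2: build the transpose G^T
    gt: Dict[str, List[str]] = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            gt[v].append(u)

    # Step 3: DFS on G^T in decreasing order of finishing time
    visited.clear()
    sccs: List[List[str]] = []

    def dfs2(u: str, comp: List[str]) -> None:
        visited.add(u)
        comp.append(u)
        for v in gt[u]:
            if v not in visited:
                dfs2(v, comp)

    for u in reversed(finished):
        if u not in visited:
            comp: List[str] = []
            dfs2(u, comp)
            sccs.append(comp)  # Step 4: each DFS tree is one SCC
    return sccs
```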
2. Minimum Spanning Tree:
Kruskal's Algorithm:
Sort all edges by weight. For each edge in that order, use FIND to check whether adding it would create a cycle; if not, add it and use UNION to join the sets of its two endpoints.
MST-KRUSKAL(G, w)
    A = Empty
    for each vertex v in G.V
        MAKE-SET(v)
    sort the edges of G.E into nondecreasing order by weight w
    for each edge (u, v) in G.E, taken in nondecreasing order by weight
        if FIND-SET(u) != FIND-SET(v)
            A = A U {(u, v)}
            UNION(u, v)
    return A
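A sketch of MST-KRUSKAL with a disjoint-set forest, assuming edges are given as `(weight, u, v)` tuples. Union by rank and path halving are standard optimizations chosen here; the pseudocode does not prescribe them.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, str, str]


def kruskal(vertices: List[str], edges: List[Edge]) -> List[Edge]:
    parent: Dict[str, str] = {v: v for v in vertices}  # MAKE-SET for every vertex
    rank: Dict[str, int] = {v: 0 for v in vertices}

    def find(x: str) -> str:
        # FIND-SET with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        # UNION by rank; a and b must be roots
        if rank[a] < rank[b]:
            a, b = b, a
        parent[b] = a
        if rank[a] == rank[b]:
            rank[a] += 1

    mst: List[Edge] = []
    for w, u, v in sorted(edges):  # nondecreasing order by weight
        ru, rv = find(u), find(v)
        if ru != rv:               # different trees, so no cycle
            mst.append((w, u, v))
            union(ru, rv)
    return mst
```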
Prim's Algorithm:
Similar to Dijkstra's algorithm: each step adds the edge that increases the total weight of the tree the least, i.e. the lightest edge connecting the tree to a vertex not yet in it.
MST-PRIM(G, w, r)
    for each u in G.V
        u.key = infinity
        u.t = NIL
    r.key = 0
    Q = G.V
    while Q != Empty
        u = EXTRACT-MIN(Q)
        for each v in G.Adj[u]
            if v in Q and w(u, v) < v.key
                v.t = u
                v.key = w(u, v)
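A sketch of MST-PRIM using Python's `heapq`. Since `heapq` has no decrease-key, this version pushes a new entry whenever a key drops and skips stale entries on extraction ("lazy deletion"). It assumes an undirected graph stored as `{u: [(v, weight), ...]}` with each edge listed in both directions.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple


def prim(adj: Dict[str, List[Tuple[str, int]]], r: str) -> Dict[str, Optional[str]]:
    """Return the parent map (v.t) of a minimum spanning tree rooted at r."""
    key = {u: float("inf") for u in adj}
    parent: Dict[str, Optional[str]] = {u: None for u in adj}
    in_tree: Set[str] = set()      # vertices already removed from Q
    key[r] = 0
    heap = [(0, r)]
    while heap:
        _, u = heapq.heappop(heap)  # EXTRACT-MIN
        if u in in_tree:            # stale entry: u was already extracted
            continue
        in_tree.add(u)
        for v, w in adj[u]:
            if v not in in_tree and w < key[v]:
                key[v] = w
                parent[v] = u
                heapq.heappush(heap, (w, v))  # stands in for DECREASE-KEY
    return parent
```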
3. Shortest Paths
Dijkstra's Algorithm:
Solves the single-source shortest-path problem on a weighted directed graph. All edge weights must be non-negative.
INITIALIZE-SINGLE-SOURCE(G, s)
    for each vertex v in G.V
        v.d = infinity   // upper bound on the shortest-path weight from source s to v
        v.t = NIL
    s.d = 0
// Time complexity: O(V)
RELAX(u, v, w)
    if v.d > u.d + w(u, v)
        v.d = u.d + w(u, v)
        v.t = u
DIJKSTRA(G, w, s)
    INITIALIZE-SINGLE-SOURCE(G, s)
    S = Empty
    Q = G.V
    while Q != Empty
        u = EXTRACT-MIN(Q)
        S = S U {u}
        for each vertex v in G.Adj[u]
            RELAX(u, v, w)
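DIJKSTRA with a binary heap (`heapq`), assuming a directed graph as `{u: [(v, weight), ...]}` with non-negative weights and every vertex present as a key. As in the Prim sketch, stale heap entries are skipped instead of using decrease-key.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple


def dijkstra(adj: Dict[str, List[Tuple[str, float]]], s: str
             ) -> Tuple[Dict[str, float], Dict[str, Optional[str]]]:
    d = {v: float("inf") for v in adj}           # INITIALIZE-SINGLE-SOURCE
    parent: Dict[str, Optional[str]] = {v: None for v in adj}
    d[s] = 0
    done: Set[str] = set()                       # the set S
    heap = [(0.0, s)]
    while heap:
        du, u = heapq.heappop(heap)              # EXTRACT-MIN
        if u in done:                            # stale entry
            continue
        done.add(u)
        for v, w in adj[u]:
            if d[v] > du + w:                    # RELAX(u, v, w)
                d[v] = du + w
                parent[v] = u
                heapq.heappush(heap, (d[v], v))
    return d, parent
```

With a binary heap this runs in O((V + E) log V).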
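Bidirectional Dijkstra:
Search from the source and from the target at the same time, stopping when the two searches meet. This can speed up the search by up to about a factor of two.
Bellman-Ford:
Edge weights may be negative, as long as no negative-weight cycle is reachable from the source (the algorithm detects one if it exists). It uses the same RELAX step as Dijkstra's algorithm, but relaxes every edge |V|-1 times instead of extracting vertices in order, so it runs in O(VE) time.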
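A sketch of Bellman-Ford, assuming edges as `(u, v, weight)` tuples. After |V|-1 rounds of relaxing every edge, one more pass that can still improve a distance signals a reachable negative-weight cycle; raising `ValueError` there is an arbitrary choice for this sketch.

```python
from typing import Dict, List, Optional, Tuple


def bellman_ford(vertices: List[str], edges: List[Tuple[str, str, float]], s: str
                 ) -> Tuple[Dict[str, float], Dict[str, Optional[str]]]:
    d = {v: float("inf") for v in vertices}
    parent: Dict[str, Optional[str]] = {v: None for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):   # |V| - 1 rounds
        for u, v, w in edges:
            if d[u] + w < d[v]:          # RELAX(u, v, w)
                d[v] = d[u] + w
                parent[v] = u
    for u, v, w in edges:                # any further improvement => negative cycle
        if d[u] + w < d[v]:
            raise ValueError("negative-weight cycle reachable from the source")
    return d, parent
```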
A-Star Algorithm:
–Directed search can scan fewer vertices
–A* is a directed search algorithm based on Dijkstra and potential functions
–A* can also be bidirectional
–Euclidean distance is a potential function for a plane (e.g. road networks)
–Landmarks can be used to build good potential functions, but using them requires preprocessing.
Pattern matching
1 Herding Patterns into a Trie:
Build a trie from the patterns, then match the text against it.
Time: O(|Text|*|LongestPattern|)
Brute-force approach: O(|Text|*|Patterns|)
Implementation:
Aho-Corasick automaton:
1. build the trie; 2. compute the failure (fail) pointers; 3. match the patterns against the text.
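A compact Aho-Corasick sketch following the three steps, assuming non-empty patterns. Trie nodes are integer ids (node 0 is the root); fail pointers are computed breadth-first so that each node's fail target is finished before its children are processed, and each node inherits the matches of its fail target.

```python
from collections import deque
from typing import Dict, List, Tuple


def aho_corasick(patterns: List[str], text: str) -> List[Tuple[int, str]]:
    """Return (start_index, pattern) for every occurrence of every pattern."""
    # 1. build the trie: goto[node][ch] -> child id
    goto: List[Dict[str, int]] = [{}]
    out: List[List[str]] = [[]]  # patterns recognized at each node
    for p in patterns:
        node = 0
        for ch in p:
            if ch not in goto[node]:
                goto.append({})
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(p)

    # 2. fail pointers, breadth-first; depth-1 nodes fail to the root
    fail = [0] * len(goto)
    q = deque(goto[0].values())
    while q:
        u = q.popleft()
        for ch, v in goto[u].items():
            f = fail[u]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[v] = goto[f].get(ch, 0)
            out[v] = out[v] + out[fail[v]]  # inherit matches via the fail link
            q.append(v)

    # 3. scan the text once
    matches: List[Tuple[int, str]] = []
    node = 0
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for p in out[node]:
            matches.append((i - len(p) + 1, p))
    return matches
```

The text is scanned once, and the fail links mean no character is re-read after a mismatch.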
2 Herding Text into a Suffix Trie:
Build a suffix trie from the text.
3 Suffix Trees
Compress each non-branching path of the suffix trie into a single edge.
4 Burrows-Wheeler Transform
Used in data compression.
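A naive Burrows-Wheeler Transform and its inverse, for illustration only: both sort whole strings and are far slower than suffix-array-based methods. It assumes the end marker `$` does not occur in the text and sorts before every other character (true for ASCII letters).

```python
def bwt(text: str, end: str = "$") -> str:
    """Sort all rotations of text+end and return the last column."""
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def inverse_bwt(last: str, end: str = "$") -> str:
    """Rebuild the sorted rotation table by repeatedly prepending the last
    column and re-sorting; the row ending in `end` is the original text."""
    n = len(last)
    table = [""] * n
    for _ in range(n):
        table = sorted(last[i] + table[i] for i in range(n))
    row = next(r for r in table if r.endswith(end))
    return row[:-1]
```

For example, `bwt("banana")` gives `"annb$aa"`: equal characters tend to cluster in the output, which is what makes it useful for compression.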