Algorithms

Algorithm Design Techniques:

Reasoning; recursion; greedy; enumeration; divide and conquer; backtracking; dynamic programming; probabilistic methods

Sorting:

Internal sort: all the data fit in memory, and sorting happens entirely in memory.

Exchange sorts: Bubble sort, Quick sort
Selection sorts: Selection sort, Heap sort
Insertion sorts: Insertion sort, Shell sort
Merge sorts: Merge sort

External sort: the data are too big to fit in memory at once, so they are sorted chunk by chunk and the sorted runs are merged at the end.
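A minimal sketch of the external sort's merge phase in Python, assuming each sorted run already fits in memory as a list (a real system would stream the runs from disk):

```python
import heapq

# Each run stands in for a sorted chunk that was written out separately.
runs = [[1, 4, 9], [2, 3, 8], [5, 6, 7]]

# heapq.merge lazily k-way-merges already-sorted inputs in O(N log k).
merged = list(heapq.merge(*runs))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```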

(1) Linear List:

Two implementations: the sequential (array-based) list and the linked list.

(2) Queue:

A kind of linear list (first in, first out).

(3) Stack:

A kind of linear list (last in, first out); searching it takes O(n).

(4) Hash:

Searching an unsorted array by quicksorting it first and then binary-searching costs O(N log N + log N); an already-sorted array supports binary search in O(log N); a hash table reaches O(1). Keys should be dispersed as evenly as possible, and the hash function should be kept as simple as possible.

Five hashing approaches:

1. Direct addressing: Y = X + A.

2. Division (modulo): Y = X % A.

3. Digit analysis: e.g., for value1=112233, value2=112633, value3=119033, the middle two digits vary while the other digits stay constant, so take the keys key1=22, key2=26, key3=90.

4. Mid-square: square the value and take its middle digits as the key.

5. Folding: e.g., for value=135790 and a required 2-digit key, split the value into 13+57+90=160, then drop the high-order "1", giving key=60. The point is that the key depends on every digit of the value, so the hash addresses are dispersed as widely as possible.

Resolving hash collisions:

1. Open addressing: slots of the array not yet in use are opened up to store colliding keys.

O(n/3+length)
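A minimal open-addressing sketch in Python, assuming the division method (approach 2 above) as the hash function and linear probing as the probe sequence; the class name and capacity are illustrative choices, and deletion and resizing are omitted:

```python
class OpenAddressingHash:
    """Hash table using the division method and linear probing."""

    def __init__(self, capacity=11):
        self.slots = [None] * capacity  # None marks an unused, open slot

    def _probe(self, key):
        # Division method: h(x) = x % capacity, then probe linearly.
        i = key % len(self.slots)
        while self.slots[i] is not None and self.slots[i] != key:
            i = (i + 1) % len(self.slots)  # step to the next slot
        return i

    def insert(self, key):
        self.slots[self._probe(key)] = key

    def contains(self, key):
        return self.slots[self._probe(key)] == key

t = OpenAddressingHash()
for k in (7, 18, 29):  # all hash to 7 mod 11, so they collide
    t.insert(k)
print(t.contains(18), t.contains(30))  # True False
```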

(5) Binary Search Tree

Insertion, deletion, and search: O(log N) (assuming the tree stays balanced).

(6) Bitmap

Fast lookup, duplicate detection, and deletion over large data sets; deduplication also compresses the data. A typical use is URL deduplication in a crawler system. Complexity grows when the data have a high duplication rate.

1 Byte = 8 bits, so one byte can represent 8 numbers; one int occupies 4 bytes (32 bits):

a[0] covers 0-31; a[1] covers 32-63; a[2] covers 64-95.

If a number is present, its bit is set to 1; otherwise it is 0. Declare int a[1 + MAX/32], where MAX is the largest number to be stored.
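A minimal Python sketch of the bitmap, using a bytearray (8 numbers per byte) instead of the int a[1 + MAX/32] array; the same idea with a different word size:

```python
class Bitmap:
    """One bit per integer in [0, max_value]; 8 numbers per byte."""

    def __init__(self, max_value):
        self.bits = bytearray(max_value // 8 + 1)

    def add(self, n):
        self.bits[n // 8] |= 1 << (n % 8)    # set bit n to 1

    def contains(self, n):
        return bool(self.bits[n // 8] & (1 << (n % 8)))

seen = Bitmap(95)
for n in (0, 31, 64, 64):   # the duplicate 64 is absorbed
    seen.add(n)
print(seen.contains(31), seen.contains(32))  # True False
```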

Tree Structures:

(1) Binary Search Tree:

(2) Balanced Search Tree:

The heights of the left and right subtrees of every node differ by at most 1.

Rebalancing on imbalance: rotations, applied after insertions and deletions.

Fast lookups.

(3) Treap Tree:

Each node carries two values: a key and a priority.

Properties:
The keys satisfy the binary-search-tree ordering;
the priorities satisfy a min-heap.

(4) Splay Tree:

An AVL variant: the tree adjusts its own structure through rotations and splaying, moving accessed nodes toward the root.

(5) Trie Tree:

1. The root node contains no character; every other node contains exactly one character.

2. Concatenating the characters along the path from the root to a node yields the string that node represents.
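A minimal Python sketch of a trie obeying the two properties above; the is_word end-of-string flag is an implementation detail added here, not part of the notes:

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # char -> TrieNode (one character per node)
        self.is_word = False  # marks that a stored string ends here

class Trie:
    def __init__(self):
        self.root = TrieNode()  # the root holds no character

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_word

t = Trie()
t.insert("tree")
print(t.search("tree"), t.search("tr"))  # True False
```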

Graph:

1. Basics of Graph:

1.1 Representation of Graphs

Edge list

Adjacency lists: for sparse graphs, where |E| is much smaller than |V|*|V| (i.e., |E| is close to |V|).

Adjacency matrix: for dense graphs, where |E| is close to |V|*|V|.

To test quickly whether two vertices are joined by an edge, the adjacency matrix is the better representation, as the sketch below shows.
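A small sketch contrasting the representations on a 4-vertex undirected graph (the example data are mine):

```python
# The same 4-vertex graph in the three representations above.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]          # edge list
n = 4

adj_list = [[] for _ in range(n)]                  # adjacency lists
for u, v in edges:
    adj_list[u].append(v)
    adj_list[v].append(u)                          # undirected: both ways

adj_matrix = [[0] * n for _ in range(n)]           # adjacency matrix
for u, v in edges:
    adj_matrix[u][v] = adj_matrix[v][u] = 1

print(adj_list[2])        # neighbors of 2: [0, 1, 3]
print(adj_matrix[1][3])   # O(1) edge test: 0 (no edge 1-3)
```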

1.2 BFS

Prim and Dijkstra algorithms use ideas similar to BFS.

BFS(G,s)
  for each vertex u belongs to G.V - {s}
      u.color = WHITE
      u.d = infinity
      u.t = NIL
  s.color = GRAY
  s.d = 0
  s.t = NIL 
  Q = Empty
  ENQUEUE(Q,s) 
  while Q != Empty
      u = DEQUEUE(Q) 
      for each v belongs to G.Adj[u] 
         if  v.color == WHITE  
            v.color = GRAY
            v.d = u.d + 1 
            v.t = u 
            ENQUEUE(Q,v)
      u.color = BLACK

Time complexity: O(V+E)
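A runnable Python translation of the pseudocode, assuming the adjacency-list (dict) representation from above; dist plays the role of u.d, parent plays u.t, and membership in dist replaces the WHITE/GRAY coloring:

```python
from collections import deque

def bfs(adj, s):
    """Return (dist, parent) for every vertex reachable from s."""
    dist = {s: 0}
    parent = {s: None}
    q = deque([s])                 # the GRAY frontier
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:      # WHITE: not yet discovered
                dist[v] = dist[u] + 1
                parent[v] = u
                q.append(v)
    return dist, parent

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(bfs(adj, 0)[0])  # {0: 0, 1: 1, 2: 1, 3: 2}
```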

1.3 DFS

DFS(G)
  for each vertex u belongs to G.V 
    u.color = WHITE 
    u.t = NIL
  time = 0
  for each vertex u belongs to G.V
    if u.color == WHITE 
       DFS-VISIT(G,u)

DFS-VISIT(G,u)
  time = time + 1 // white vertex u has just been discovered
  u.d = time
  u.color = GRAY
  for each v belongs to G.Adj[u] // explore edge (u,v)
     if v.color == WHITE  
        v.t = u
        DFS-VISIT(G,v)
  u.color = BLACK
  time = time + 1 
  u.f = time // blacken u; it is finished

Time complexity: O(V+E)

1.4 Topological Sort

Use DFS to perform a topological sort of a directed acyclic graph (DAG).

TOPOLOGICAL-SORT(G)
1. call DFS(G) to compute finishing times v.f for each vertex v 
2. as each vertex is finished, insert it onto the front of a linked list
3. return  the linked list of vertices

O(V+E)
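A runnable sketch of the three steps, with order.insert(0, u) playing the role of inserting each finished vertex at the front of the linked list (the example edges are mine):

```python
def topological_sort(adj):
    """Vertices of a DAG in topological order, via DFS finish times."""
    vertices = set(adj) | {v for vs in adj.values() for v in vs}
    order, visited = [], set()

    def dfs_visit(u):
        visited.add(u)
        for v in adj.get(u, []):
            if v not in visited:
                dfs_visit(v)
        order.insert(0, u)   # finished: insert at the front of the list

    for u in sorted(vertices):
        if u not in visited:
            dfs_visit(u)
    return order

adj = {"shirt": ["tie"], "tie": ["jacket"], "pants": ["shoes"]}
print(topological_sort(adj))  # ['shirt', 'tie', 'pants', 'shoes', 'jacket']
```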

1.5 Strongly Connected Components:

Vertices u and v such that u -> v and v -> u; that is, u and v are reachable from each other.

STRONGLY-CONNECTED-COMPONENTS(G)
1. call DFS(G) to compute finishing times u.f for each vertex u
2. compute G^T, the transpose of G
3. call DFS(G^T), but in the main loop of DFS, consider the vertices
   in order of decreasing u.f (as computed in line 1)
4. output the vertices of each tree in the depth-first forest formed in line 3 as a
   separate strongly connected component
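A compact runnable version of the same procedure (Kosaraju's algorithm), with the transpose built explicitly; the example graph is mine:

```python
def scc(adj):
    """Strongly connected components of a directed graph (Kosaraju)."""
    vertices = set(adj) | {v for vs in adj.values() for v in vs}

    def dfs(graph, u, seen, out):
        seen.add(u)
        for v in graph.get(u, []):
            if v not in seen:
                dfs(graph, v, seen, out)
        out.append(u)                    # record u at its finish time

    finish, seen = [], set()             # line 1: DFS(G) for finish times
    for u in sorted(vertices):
        if u not in seen:
            dfs(adj, u, seen, finish)

    transpose = {}                       # line 2: compute G^T
    for u in adj:
        for v in adj[u]:
            transpose.setdefault(v, []).append(u)

    comps, seen = [], set()              # line 3: DFS(G^T), decreasing u.f
    for u in reversed(finish):
        if u not in seen:
            comp = []
            dfs(transpose, u, seen, comp)
            comps.append(comp)           # line 4: one tree = one SCC
    return comps

adj = {1: [2], 2: [3], 3: [1], 4: [2]}
print(scc(adj))  # [[4], [2, 3, 1]]
```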

2. Minimum Spanning Tree:

Kruskal's Algorithm:

Sort the edges by weight. Use Find to check whether an edge would create a cycle; if not, Union its two endpoints.

MST-KRUSKAL(G, w)
  A = Empty
  for each vertex v belongs to G.V
     MAKE-SET(v)
  sort the edges of G.E into nondecreasing order by weight w

  for each edge (u,v) belongs to G.E, taken in nondecreasing order by weight
     if FIND-SET(u) != FIND-SET(v)
        A = A U {(u,v)}
        UNION(u,v)
  return A
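A runnable sketch with a bare-bones union-find standing in for MAKE-SET / FIND-SET / UNION (no union-by-rank or path compression):

```python
def kruskal(n, edges):
    """MST edges of a connected graph with vertices 0..n-1.
    edges: list of (weight, u, v)."""
    parent = list(range(n))             # MAKE-SET for every vertex

    def find(x):                        # FIND-SET
        while parent[x] != x:
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):       # nondecreasing order by weight
        ru, rv = find(u), find(v)
        if ru != rv:                    # no cycle: take the edge
            parent[ru] = rv             # UNION
            mst.append((u, v, w))
    return mst

edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (3, 2, 3)]
print(kruskal(4, edges))  # [(0, 1, 1), (1, 2, 2), (2, 3, 3)]
```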

Prim's Algorithm:
Similar to Dijkstra: each step adds the edge that increases the tree's total weight the least.

MST-PRIM(G, w, r)
  for each u belongs to G.V
    u.key = infinity
    u.t = NIL

  r.key = 0
  Q = G.V
  while Q != Empty
    u = EXTRACT-MIN(Q)
    for each v belongs to G.Adj[u]
      if v belongs to Q and w(u,v) < v.key
        v.t = u
        v.key = w(u,v)
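A runnable sketch that replaces the EXTRACT-MIN / decrease-key queue with a lazy heapq that skips stale entries, a common Python idiom:

```python
import heapq

def prim(adj, r):
    """MST edges of a connected weighted graph.
    adj: {u: [(v, w), ...]}, with each undirected edge listed both ways."""
    in_tree, mst = {r}, []
    heap = [(w, r, v) for v, w in adj[r]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(heap)   # lightest edge leaving the tree
        if v in in_tree:
            continue                    # stale entry: v already added
        in_tree.add(v)
        mst.append((u, v, w))
        for x, wx in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (wx, v, x))
    return mst

adj = {0: [(1, 1), (2, 4)], 1: [(0, 1), (2, 2)],
       2: [(0, 4), (1, 2), (3, 3)], 3: [(2, 3)]}
print(prim(adj, 0))  # [(0, 1, 1), (1, 2, 2), (2, 3, 3)]
```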

3. Shortest Paths

Dijkstra's Algorithm:
Solves the single-source shortest-path problem in a weighted directed graph; it requires all edge weights to be non-negative.

INITIALIZE-SINGLE-SOURCE(G,s)
  for each vertex v belongs to G.V
     v.d = infinity // record the upper bound of the shortest path weight from source vertex s to vertex v. 
     v.t = NIL
  s.d = 0 
// Time complexity: O(V)

RELAX(u,v,w)
  if v.d > u.d + w(u,v)
    v.d = u.d + w(u,v)
    v.t = u

DIJKSTRA(G,w,s)
  INITIALIZE-SINGLE-SOURCE(G,s)
  S = Empty
  Q = G.V
  while Q != Empty
     u = EXTRACT-MIN(Q)
     S = S U {u}
     for each vertex v belongs to G.Adj[u]
       RELAX(u,v,w)
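A runnable sketch, again using a lazy heapq in place of EXTRACT-MIN and decrease-key:

```python
import heapq

def dijkstra(adj, s):
    """Shortest-path distances from s; adj: {u: [(v, w), ...]}, w >= 0."""
    dist = {s: 0}
    heap = [(0, s)]
    done = set()                        # the set S of finished vertices
    while heap:
        d, u = heapq.heappop(heap)      # EXTRACT-MIN
        if u in done:
            continue                    # stale entry
        done.add(u)
        for v, w in adj.get(u, []):
            if v not in dist or d + w < dist[v]:   # RELAX(u, v, w)
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

adj = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("c", 6)],
       "b": [("c", 3)], "c": []}
print(dijkstra(adj, "s"))  # {'s': 0, 'a': 1, 'b': 3, 'c': 6}
```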

Bidirectional Dijkstra:
Search forward from the source and backward from the target at the same time; this can speed the search up by up to a factor of two.

Bellman-Ford:
Edge weights may be negative; otherwise the structure closely follows Dijkstra's algorithm, relaxing edges repeatedly.
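A minimal sketch: relax every edge |V|-1 times using the RELAX rule above (the extra pass that detects negative cycles is omitted):

```python
def bellman_ford(vertices, edges, s):
    """Shortest paths from s; edges: list of (u, v, w), w may be negative."""
    INF = float("inf")
    dist = {v: INF for v in vertices}
    dist[s] = 0
    for _ in range(len(vertices) - 1):  # |V|-1 rounds of relaxation
        for u, v, w in edges:
            if dist[u] + w < dist[v]:   # RELAX(u, v, w)
                dist[v] = dist[u] + w
    return dist

edges = [("s", "a", 4), ("s", "b", 5), ("a", "c", -3), ("b", "c", -1)]
print(bellman_ford({"s", "a", "b", "c"}, edges, "s"))
# {'s': 0, 'a': 4, 'b': 5, 'c': 1}
```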

A-Star Algorithm:
- Directed (goal-oriented) search can scan fewer vertices.

- A* is a directed search algorithm based on Dijkstra and potential functions.

- A* can also be run bidirectionally.

- Euclidean distance is a valid potential in the plane (e.g., road networks).

- Landmarks can be used to build good potential functions, but they require preprocessing.

Pattern matching

1 Herding Patterns into a Trie:

Build a trie from the patterns and match it against the text at each position.

O(|Text| * |LongestPattern|)

Brute-force approach: O(|Text| * |Patterns|)

Implementation (Aho-Corasick automaton):
1. build the trie; 2. add fail pointers; 3. run the pattern matching (a sketch follows).
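A compact runnable sketch of the three steps, assuming dict-based trie nodes; the node layout and names are illustrative:

```python
from collections import deque

def aho_corasick(text, patterns):
    """Report all (start_index, pattern) occurrences of patterns in text."""
    # 1. Build the trie: each node has children, a fail link, and outputs.
    trie = [{"next": {}, "fail": 0, "out": []}]
    for p in patterns:
        node = 0
        for ch in p:
            if ch not in trie[node]["next"]:
                trie.append({"next": {}, "fail": 0, "out": []})
                trie[node]["next"][ch] = len(trie) - 1
            node = trie[node]["next"][ch]
        trie[node]["out"].append(p)

    # 2. Fail pointers by BFS: longest proper suffix also present in the trie.
    q = deque(trie[0]["next"].values())
    while q:
        u = q.popleft()
        for ch, v in trie[u]["next"].items():
            f = trie[u]["fail"]
            while f and ch not in trie[f]["next"]:
                f = trie[f]["fail"]
            trie[v]["fail"] = trie[f]["next"].get(ch, 0)
            trie[v]["out"] += trie[trie[v]["fail"]]["out"]
            q.append(v)

    # 3. Pattern matching: follow goto edges, fall back along fail links.
    hits, node = [], 0
    for i, ch in enumerate(text):
        while node and ch not in trie[node]["next"]:
            node = trie[node]["fail"]
        node = trie[node]["next"].get(ch, 0)
        for p in trie[node]["out"]:
            hits.append((i - len(p) + 1, p))
    return hits

print(aho_corasick("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```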

2 Herding Text into a Suffix Trie:

Build a suffix trie from the text instead of from the patterns.

3 Suffix Trees

Compress non-branching paths of the suffix trie into single edges.

4 Burrows-Wheeler Transform

Used in data compression.

5 Suffix array

6 KMP algorithm
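Since the notes list KMP without detail, here is a minimal sketch in the prefix-function formulation:

```python
def kmp_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    # Prefix function: pi[i] = length of the longest proper prefix of
    # pattern[:i+1] that is also a suffix of it.
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k

    hits, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = pi[k - 1]          # fall back: reuse the matched prefix
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):      # full match ending at position i
            hits.append(i - k + 1)
            k = pi[k - 1]
    return hits

print(kmp_search("abababca", "abab"))  # [0, 2]
```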
