- Blog posts (247)
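Original: svm-overview
Principles of support vector machines (SVM) (repost); Support Vector Machines (SVM), part 1; Stanford Machine Learning, Lecture 8: Support Vector Machines (SVM); Implementing the SVM algorithm step by step, part 1; https://www.zhihu.com/question/21094489 ...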
2017-09-17 14:50:24
274
Original: PIC correct errors
https://book.douban.com/people/eaglex/annotation/3056375/ lib for svm: https://www.csie.ntu.edu.tw/~cjlin/libsvm/
2017-09-07 17:37:02
380
Original: random variable distribution
https://www.zybang.com/question/b867b0b455406ce15e84abf27553a9cf.html
2017-09-04 17:46:10
315
Original: radial-basis function
ref: Radial basis kernel function (Radial Basis Function, RBF); Radial basis function neural networks; Some notes on understanding kernel functions; 7 Kernel functions (Kernels); https://en.wikipedia.org/wiki/Radial_basis_function http://www.blogjava.net/zhenandaci/...
2017-09-04 16:59:03
1259
Original: math-dot product and vector product
ref: https://wenku.baidu.com/view/cdd78d48a58da0116d17498f.html
2017-08-23 16:44:56
226
hbase write flow(byte level)
Here is the byte flow of a mutation. Columns: level | format | usage. top (abstract, user facing) | [Put,Put…] | HTable#put(list). encapsulation | [HLogKey,WALEdit] WALEdit:kv1,kv2 || v[t...
2017-08-14 17:04:18
128
hadoop-replication written flow
w: net write; r: net read (corresponds to a net write); wl: write locally, i.e. fs write. cost time: t4-t0 = 1 client w + 2 DN w + 1 DN wl ~ 3w + 1wl ~ 1wl (assuming disk write is the bottleneck)
2017-08-14 17:00:54
127
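Original: the foundation of information theory
Reference answers for each chapter of Foundations of Information Theory; BUPT Information Theory 2006 midterm exam answers, standard paper A; Information Theory, Jiang Dan (姜丹); information exercises; Information Theory xxx32.14 ------------ BUPT Information Theory courseware 2 ------------ Examples: Information Theory Fundamentals Course, Chapter 2 (by Li Yinong (李亦农) and Li Mei (李梅)), Beijing University of Posts and Telecommunications Press ...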
2017-05-09 23:26:04
349
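Original: algorithms design techniques and analysis
Answers for Algorithms: Design Techniques and Analysis; Algorithms: Design Techniques and Analysis, answers; instructor He Quanbing (贺全兵) (Department of Computer Science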
2017-05-09 23:23:37
484
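Original: algorithms abstract
todo ref: Time and space complexity of algorithms: a summary; Why does everyone around me describe algorithm complexity with big-O notation rather than big-Θ?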
2017-05-07 17:50:30
127
spark-broadcast in spark
Going through the code block below, we can draw some conclusions: val barr1 = sc.broadcast(arr1) //-broadcast an array with 1M int elements //-this is an embedded broadcast wrapped b...
2016-12-22 15:54:25
198
spark-storage/memory used in spark
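access pattern in spark storage [1] So far we have seen how Spark uses JVM memory and what the execution slots on a cluster are. We have not yet covered the details of tasks, which will be discussed in another article. Basically, a task is Spark's unit of work, executed as a thread inside the executor's JVM process. This is also why Spark jobs start quickly: in the JVM, start...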
2016-12-12 16:31:20
420
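Original: spark-hive on spark
Overall design: The overall design approach of Hive on Spark is to reuse Hive's logical-layer functionality as much as possible and, starting from physical plan generation, to provide a complete set of Spark-specific implementations such as SparkCompiler and SparkTask, so that Hive queries can run as Spark jobs. The main design principles follow. Minimize changes to the existing Hive code. This is the biggest difference from the earlier Shark design. Shark changed Hive too much...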
2016-12-06 15:04:03
176
Original: spark-RDD vs DataFrame vs DataSet
In summation, the choice of when to use RDD or DataFrame and/or Dataset seems obvious. While the former offers you low-level functionality and control, the latter allows custom view and structure...
2016-11-29 15:38:24
150
Original: [spark-src-core] 8. trivial bug in spark standalone executor assignment
From [1] we know that Spark executes jobs in two steps: a. launching executors and b. having the driver assign tasks to those executors. So how are executors assigned to workers ...
2016-11-22 17:24:48
124
[spark-src-core] 7.1 application in spark-PageRank
The code below is all from Spark's examples, apart from some comments I added. val lines = ctx.textFile(args(0), 1) //-1 generate links of <src,targets> pairs var links = li...
2016-11-03 15:59:12
144
Original: [spark-src-core] 6. checkpoint in spark
As in other big data technologies, checkpointing is a well-known way to keep a snapshot of data in order to speed up failover, i.e. restoring to the most recent checkpointed state of the data, so you will not need t...
2016-10-19 17:14:46
145
[spark-src-core] 5. big data techniques in spark
There are several nice techniques in Spark, e.g. on the user API side. Here we dive in to check how Spark implements them. 1. abstract (functions in RDD): group | function | feature | principl...
2016-10-12 17:48:38
110
[spark-src-core] 4.2 communications b/t certain kernel components
Several component entities run as daemons in Spark (standalone), and knowing what they do and how they work is necessary. akka msg flow, similar to tcp. note: register driver =R...
2016-09-27 12:26:41
129
[spark-src-core] 4.1 spark on yarn
As the official documentation states, Spark is a computation framework, i.e. you can use it anywhere that supplies a platform (e.g. yarn, mesos) to run on. So in this cluster manager, all of Spark's daemons ar...
2016-09-27 12:16:42
160
[spark-src-core] 3.3 run spark in standalone(cluster) mode
Similar to the previous article, this one focuses on cluster mode. 1. issue command: ./bin/spark-submit --class org.apache.spark.examples.JavaWordCount --deploy-mode cluster --master spark://gzsw...
2016-09-19 12:30:17
298
[spark-src-core] 3.2 run spark in standalone(client) mode
1. startup command: ./bin/spark-submit --class org.apache.spark.examples.JavaWordCount --deploy-mode client --master spark://gzsw-02:7077 lib/spark-examples-1.4.1-hadoop2.4.0.jar hdfs://host02:/user...
2016-09-19 11:55:38
142
[spark-src-core] 3. run spark in cluster(local) mode
As you might guess, Spark has many deploy modes, e.g. standalone, yarn, mesos, etc. Going a step further, the standalone mode can be divided into standalone and cluster (local) modes. The form...
2016-09-02 17:53:54
224
[spark-src-core] 2.5 core concepts in Spark
1. overview in wordcount - memory tips: Job > Stage > RDD > Dependency. RDDs are linked by Dependencies. 2. terms - RDDs are associated through Dependency, i.e. Dependency is a wrapper of RDD....
2016-08-25 17:38:41
117
[spark-src-core] 2.4 communications b/t certain kernel components
1 data flow overview. note: - arrows here mean the following: a bold line is a data line, ‘w/o sender and receiver meanings’ but only with data ‘from-to’ - two ways to retrieve a task result: direct result and i...
2016-08-25 17:36:14
161
[spark-src-core] 2.3 shuffle in spark
1. flow 1.1 shuffle abstract 1.2 shuffle flow 1.3 sort flow in shuffle 1.4 data structure in mem 2. core code paths //SortShuffleWriter override def write(records: Iterat...
2016-08-25 16:31:09
139
[spark-src-core] 2.2 job submitted flow for local mode-part II
In this section we verify how Spark collects data from the previous stage into the next stage (result task). figure after finishing ShuffleMapTask computation (i.e. post process). note: the l...
2016-08-25 11:23:42
195
Original: [spark-src-core] 2.2 job submitted flow for local mode-part I
Now we dive into Spark internals using the simple example below (wordcount; later articles reference this one by default): sparkConf.setMaster("local[2]") //-local[*] by default //leib-c...
2016-08-24 17:36:23
178
Original: [spark-src-core] 2.1 relationships b/t misc spark shells
Similar to other open source projects, Spark has several shells, listed here. sbin: server side shells. start-all.sh: start all the Spark daemons (i.e. start-master.sh, start-slav...
2016-06-01 16:01:36
131
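Original: scala- comparing Scala objects with ==, eq, ne versus Java == and equals()
If you want to check whether two objects are equal, you can use == or its inverse, !=. (This works for all objects, not just primitive types.) scala> 1 == 2 res24: Boolean = false scala> 1 != 2 res25: Boolean = true These operations, for all...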
2016-04-22 15:08:50
265
[spark-src-core] set SPARK_PRINT_LAUNCH_COMMAND to output more details
With both the environment variable 'SPARK_PRINT_LAUNCH_COMMAND' and --verbose enabled, the Spark command output from spark-submit.sh is more detailed: hadoop@GZsw04:~/spark/spark-1.4.1-bin-hado...
2016-04-19 12:19:13
193
Original: scala- type conversion (classOf, asInstanceOf, isInstanceOf)
ref: converting a scala object to Class; forced type conversion (casting) in Scala
2016-04-14 15:28:50
162
[spark-src] 1-overview
what is "Apache Spark™ is a fast and general engine for large-scale data processing....Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk." stated in apache spa...
2016-03-20 16:20:31
159
[spark-src]-source reading
Based on: spark-1.4.1, hadoop-2.5.2. Going from simple to complex and following the working flow, we take these steps: 1. [spark-src] spark overview 2. [spark-src] core from ...
2016-03-20 15:06:25
134
free talk-intelligent period prediction is undergoing
google AlphaGo vs Lee on 'the game of go' VS back in Guangzhou, back in the fray. cheers
2016-03-16 10:15:15
95