- Blog posts (71)
Original: kudu - impala
The number of partitions should equal the number of cores in the cluster. Kudu optimizes SQL when the predicate uses =, <=, <, >, >=, BETWEEN, or IN, but not !=, LIKE, or any other predicate type.
2018-08-12 15:54:49
297
Repost: cassandra, hbase and mongodb
Cassandra: AP system, weak consistency, write-heavy, high availability, good for online use. HBase: CP system, good support for batch analytics, not typical for online use. mongodb,...
2018-07-27 11:36:58
282
Translation: Java offheap memory
- MappedByteBuffer: public void copyFile(String filename, String srcpath, String destpath) throws IOException { File source = new File(srcpath+"/"+filename); File dest = new File(destpath+"/...
2018-07-18 09:11:15
371
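The excerpt above cuts off before the copy logic, so the following is only a minimal sketch of one way a MappedByteBuffer file copy could look, not the post's actual code. The class name `MappedCopy`, the `Path`-based signature, and the `roundTrip` helper are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedCopy {
    // Copies src to dest by mapping the source file into off-heap memory.
    static void copyFile(Path src, Path dest) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dest,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            // One mapping is limited to Integer.MAX_VALUE bytes; larger files would need chunking.
            MappedByteBuffer buffer = in.map(FileChannel.MapMode.READ_ONLY, 0, in.size());
            while (buffer.hasRemaining()) {
                out.write(buffer);
            }
        }
    }

    // Hypothetical helper: writes content to a temp file, copies it, and returns the copied bytes.
    static byte[] roundTrip(byte[] content) {
        try {
            Path dir = Files.createTempDirectory("mapped-copy");
            Path src = Files.write(dir.resolve("src.bin"), content);
            Path dest = dir.resolve("dest.bin");
            copyFile(src, dest);
            return Files.readAllBytes(dest);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The mapped region lives outside the Java heap, which is why this technique appears under off-heap memory; the data is paged in by the operating system rather than read into a heap array.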
Original: kafka
- use cases: log collection, message system, user activity, stream processing, event sourcing - design: kafka broker leader, multiple brokers contend for being leader by creating an ephemeral node in zookeeper. only on...
2018-07-16 17:40:38
256
Repost: JVM troubleshooting
- jps, top and jstack: jps to find Java info such as class name, parameters of main, JVM arguments, pid (jps -m -l); top to find the most CPU-bound thread (top -Hp pid); jstack to dump thread stacks, jstac...
2018-07-10 18:01:33
250
Original: review list
devops; springboot and microservices; gmlparser, design pattern; persistable queue, java volatile, atomicity and concurrency, mockito; volatile is not atomic, happens-before, happens-after for memory visibilit...
2018-06-25 11:58:10
353
Translation: Software Design
- Design Principles: Open-closed, open for extension, closed for modification; Liskov substitution, any subclass can stand in wherever its base class is used; Demeter, the principle of least knowledge; interface segregation, p...
2018-06-01 12:19:49
369
Original: submit spark code to yarn
- configure spark to submit code to remote yarn: val sparkConf = new SparkConf().setAppName(s"Bulk Import $manualNbr").setMaster("yarn").set("spark.submit.deployMode", "client") // ...
2018-05-27 16:11:37
283
Repost: compile spark source code
Change the scala version to the scala version on your machine: ./dev/change-scala-version.sh <version>; shut down zinc: ./build/zinc-<version>/bin/zinc -shutdown; compile Spark: ./build/mvn -Pyarn ...
2018-05-25 18:12:01
221
Translation: LSM Log-Structured Merge-Tree
- Sequential access is better than random access -> WAL, append updates to the log - Memstore in memory for quick lookup -> Memstore, which flushes data to a store file when it reaches a threshold - Merge multiple...
2018-05-12 19:43:51
162
Translation: B tree vs B+ tree
- B tree (key+data in every node), O(log(d)(n)); d is the degree of the tree; h is the height of the tree, h <= log(d)((n+1)/2); a non-leaf node has n-1 keys and n pointers, d<=n<=2d; all leaves are at the same height; nodes...
2018-05-12 19:03:07
286
Translation: HBase MapReduce
- Data Locality: block placement policy, the first copy is written to the data node where the region server runs. - TableInputFormat: divides the table at region boundaries by start row and end row; static class ...
2018-05-12 15:54:01
145
Translation: HBase Filters, Counters & Coprocessors
- Filter -> FilterBase. setFilter(filter) method on Get and Scan - CompareFilter: operator + comparator, matched data is kept; CompareFilter(CompareOp valueCompareOp, WritableByteArrayComparable valu...
2018-05-12 12:02:29
160
Translation: HBase Region Split
- Split Policy (ConstantSizeRegionSplitPolicy, IncreasingToUpperBoundRegionSplitPolicy, SteppingSplitPolicy) - Split Point: the first row of the center block of the biggest file of the store - Split Workflo...
2018-05-09 17:55:27
181
Translation: HBase Concept
- Data Model: sparse, distributed, persisted multidimensional sorted map; (row:string, column:string, time:int64) -> string // both key and value are uninterpreted bytes; Row: single row read and update i...
2018-05-08 21:23:51
164
Translation: Java GC
young generation and old generation. 1 eden and 2 survivor spaces. minor GC: mark and copy, from eden and one survivor to the other survivor; full GC: mark, sweep and compact generations; both will stop th...
2018-05-07 17:55:39
168
Translation: bloom filter
- space-efficient lookup for a fixed number of static elements. - answers "possibly present" or "definitely not present"; n: number of elements; k: number of hash functions, k = (m/n)*ln2; m: number of bits, >= n*lg(1/E)*lg(e); E: expec...
2018-05-07 13:07:58
137
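The sizing rules in the bloom filter entry above can be worked through numerically. This is a minimal sketch, assuming E is the target false-positive rate (the excerpt is cut off at that definition); the class and method names are hypothetical.

```java
class BloomSizing {
    // Bits needed: m >= n * lg(1/E) * lg(e), which equals n * ln(1/E) / (ln 2)^2.
    static long optimalBits(long n, double falsePositiveRate) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(n * Math.log(1.0 / falsePositiveRate) / (ln2 * ln2));
    }

    // Hash functions: k = (m / n) * ln 2, rounded to the nearest integer, at least 1.
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1000;
        long m = optimalBits(n, 0.01);
        System.out.println("bits=" + m + " hashes=" + optimalHashes(n, m));
    }
}
```

For 1000 elements at a 1% target rate this works out to roughly 9.6 bits per element and 7 hash functions.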
Translation: spark - Running on Cluster
- package spark app (maven): <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-shade-plugin</artifactId> <version>2.3</version>
2018-05-05 09:21:14
242
Translation: spark - Tuning and Debugging Spark
- submit application (the SparkConf object cannot be changed after SparkContext creation); method 1: bin/spark-submit \ --class com.example.MyApp \ --master local[4] \ --name "My Spark App" \ --conf spark.ui.port=...
2018-05-04 18:42:03
175
Translation: spark - Advanced Spark Programming
- Accumulator: val blankLines = new LongAccumulator; sc.register(blankLines); use accumulators inside transformations for debugging purposes only: because of speculative tasks, the count is not accurate. But in action, the accum...
2018-05-03 20:04:33
236
Translation: spark - Loading and Saving Data
- File Formats; Text File: sc.textFile, load a text file; sc.wholeTextFiles, load multiple files (filename, entire content) under the specified dir; JSON: sc.textFile.map to JSON object (people.add(mapper.readValue...
2018-05-03 18:03:54
149
Translation: scala notes (7) - Advanced Type and Implicit
- advanced types; singleton type: def setTitle(title: String): this.type = { ...; this } // for subtypes; def set(obj: Title.type): this.type = { useNextArgAs = obj; this } // takes object as parameter, no ...
2018-04-29 22:48:12
146
Translation: scala notes (6) - Annotation, Future and Type Parameter
- Annotation: class MyContainer[@specialized T]; def country: String @Localized; @Test(timeout = 0, expected = classOf[org.junit.Test.None]) def testSomeFeature() { ... }; Java annotations can be mixed with Sc...
2018-04-27 15:34:52
150
Translation: spark - Pair RDD (Key/Value Pairs)
- Create a Pair RDD from a regular RDD by calling the map function: val pairs = lines.map(x => (x.split(" ")(0), x)); transformations on Pair RDD (data: {(1,2),(3,4),(3,6)}): reduceByKey => {(1,2), (3,10)}; grou...
2018-04-27 10:24:24
396
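The reduceByKey result quoted in the Pair RDD entry above can be reproduced without a cluster. The sketch below is a plain-Java analogue using `Map.merge`, not the Spark API: it only illustrates the per-key folding semantics, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceByKeySketch {
    // Folds the values of each key with addition, like reduceByKey(_ + _) does per key.
    static Map<Integer, Integer> reduceByKey(List<int[]> pairs) {
        Map<Integer, Integer> result = new TreeMap<>();
        for (int[] pair : pairs) {
            result.merge(pair[0], pair[1], Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> data = Arrays.asList(new int[]{1, 2}, new int[]{3, 4}, new int[]{3, 6});
        System.out.println(reduceByKey(data)); // prints {1=2, 3=10}
    }
}
```

Key 1 has a single value and passes through unchanged, while the two values under key 3 are summed, matching {(1,2), (3,10)} from the entry.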
Translation: scala notes (5) - pattern and case class
- Pattern and Case Class: ch match { case _ if Character.isDigit(ch) => .. case '+' => ... case _ => ... } prefix match { case "0" | "0x" | "0X" => ... } case variable should be lowercase....
2018-04-26 12:08:49
121
Translation: scala notes (4) - collection
- Collection: Array is the equivalent of a Java array; it is mutable in terms of value updates, but not size; sequence: Vector is the immutable equivalent of ArrayBuffer, which is an indexed sequence with fast random acces...
2018-04-25 18:04:21
136
Translation: scala notes (3) - Files & Regular Expression, Trait, Operation and Function
- Files & Regular Expressions: read from file, url and string, remember to close the source; val source = Source.fromFile("myfile.txt", "UTF-8"); val source1 = Source.fromURL("http://horstmann.com", "UTF-8...
2018-04-25 11:14:26
155
Translation: scala notes (2) - Class, Object, Package & Import and Inheritance
- Class: class Counter { private var value = 0 // You must initialize the field, otherwise it's an abstract class. def increment() { value += 1 } // Methods are public by default def current() ...
2018-04-24 19:05:24
192
Translation: scala notes (1) - Basic, Control & Function, Array and Map & Tuple
- Basics: val greeting: String = null; val xmax, ymax = 100 // both are set; String -> StringOps // intersect, sorted...; Int -> RichInt // 1.to(10); primitive -> Rich*; BigInt & BigDecimal // * can be...
2018-04-24 12:02:34
136
Translation: Programming with RDD
- Passing functions to Spark (be careful with the reference to the containing object, which needs to be serializable): class SearchFunctions(val query: String) { def isMatch(s: String): Boolean = { s.contains(...
2018-04-23 18:51:18
105
Translation: scala type parameters
- type bounds: class Pair[T <: Comparable[T]](val first: T, val second: T) { def smaller = if (first.compareTo(second) < 0) first else second // compareTo }; class Pair[T](val first: T, val seco
2018-04-23 18:23:37
691
Repost: MapReduce Features
- Counters (values are definitive only once the job has successfully completed); Task Counters; Filesystem Counters; Job Counters (only in the application master, do not need to be sent across the network, mainly about t...
2018-04-22 19:52:21
95
Translation: MapReduce Types and Formats
- types: map: (K1, V1) → list(K2, V2); combiner: (K2, list(V2)) → list(K2, V2); reduce: (K2, list(V2)) → list(K3, V3) - partition (HashPartitioner): public abstract class Partitioner<KEY, VALUE> { public ...
2018-04-21 19:45:47
109
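The partition step named in the MapReduce Types and Formats entry above can be sketched without a Hadoop dependency. The formula below mirrors how Hadoop's HashPartitioner is generally documented to assign keys; treat it as an illustration, since the excerpt is truncated before the method body. The class name is hypothetical.

```java
class HashPartitionSketch {
    // Masks off the sign bit so negative hash codes still map to a valid partition index.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Equal keys always land in the same partition, so one reducer sees all of a key's values.
        System.out.println(getPartition("apple", 4) == getPartition("apple", 4));
        System.out.println(getPartition(-7, 4));
    }
}
```

Using `Math.abs` instead of the mask would be a bug, because `Math.abs(Integer.MIN_VALUE)` is still negative.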
Translation: MapReduce Workflow
check output folder; calculate splits; the application master gets progress and completion reports from tasks. It also requests containers for map tasks and reduce tasks. It starts containers through the nodemanage...
2018-04-21 16:13:32
337
Translation: MapReduce Application
- Configuration: conf.addDefaultResource, conf.addResource, configuration overridden <property><name>fs.defaultFS</name><value>file:/// or hdfs://namenode</value></pr...
2018-04-21 11:22:59
260
Translation: Hadoop I/O
- checksum: CRC-32C for every 512 bytes; write: the last datanode of the pipeline verifies the checksum; read: block verification on client read; RawLocalFileSystem, to disable checksum - compression, (default is ...
2018-04-20 15:11:40
120
Translation: YARN (Yet Another Resource Negotiator) - Cluster Manager
- what is yarn - Yarn application run - Resource requests: all requests up front (Spark) or dynamic requests (MapReduce: map task requests are up front, but reduce tasks are dynamic) - application lifes...
2018-04-19 17:24:24
265
Translation: HDFS
- suitable for: very large files (terabytes, petabytes); write once, read many times; handling node failure without noticeable interruption - not suitable for applications with: low-latency data access, HBa...
2018-04-19 14:51:12
263
Original: Map
HashMap: get O(1), containsKey O(1), next O(h/n). Map the key to an array index to bring get complexity down to O(1) (constant time). Resize when size >= threshold (= table capacity * load fact...
2018-04-19 13:52:36
213
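The two ideas in the Map entry above, mapping a key to an array index and resizing at a threshold, can be sketched as follows. This assumes the truncated "load fact..." refers to the load factor and a power-of-two table capacity, as in java.util.HashMap; the class and method names are hypothetical, and the real class has more detail (treeified buckets, lazy allocation).

```java
class HashMapSketch {
    // Spreads the high bits of the hash downward, then masks to the table size.
    // Requires capacity to be a power of two.
    static int bucketIndex(Object key, int capacity) {
        int h = key.hashCode();
        h ^= (h >>> 16);
        return h & (capacity - 1);
    }

    // Entry count at which the table would be resized.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex("a", 16)); // "a".hashCode() is 97, and 97 & 15 is 1
        System.out.println(threshold(16, 0.75f)); // 12
    }
}
```

Because the index comes from arithmetic on the hash rather than a search, get and containsKey are constant time on average as long as buckets stay short, which is what resizing at the threshold is meant to preserve.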