[Original] Nginx log to Elasticsearch
nginx-es.conf input { file { path => "/opt/logtest/nginx_access.log.1" start_position => "beginning" sincedb_path => "/opt/logstash-2.3.4/sincedb/" }...
2016-08-03 17:39:22
308
[Original] Install Logstash And Sample Conf
1. Download # wget https://download.elastic.co/logstash/logstash/logstash-2.3.4.tar.gz # tar -xzf logstash-2.3.4.tar.gz # cd logstash-2.3.4 # ./bin/logstash-plugin install logstash-output-webhdfs...
2016-08-01 11:05:52
295
[Original] High-quality blogs on big data mining
https://pkghosh.wordpress.com/2012/09/03/from-item-correlation-to-rating-prediction/ https://pkghosh.wordpress.com/?s=recommendation sifarish: https://github.com/pranab/sifarish
2016-07-29 14:20:42
410
[Original] Storm: monitoring Storm with supervisor
# yum install supervisor # vi /etc/supervisord.conf [program:storm-supervisor] command=/opt/apache-storm-0.9.3/bin/storm supervisor user=root autostart=true autorestart=true startsecs=10 st...
2015-09-02 15:58:29
219
[Original] Solr: 5.2.1 install and config
1. Upload solr-5.2.1.tgz and install_solr_service.sh to the same dir. 2. # install_solr_service.sh solr-5.2.1.tgz 3. # cd /var/solr/ # vi solr.in.sh Modify Solr's JVM configuration: #SOLR_HEAP="10...
2015-09-01 18:50:15
153
[Original] Solr: indexing products and prices for sellers, then querying and sorting
In my current project, the seller model has multiple products with prices. I want to index the products, query them, and then sort them by price, the seller's credit, the distance between the seller and ...
2015-08-25 16:58:53
139
[Original] Top ML software
http://www.predictiveanalyticstoday.com/top-free-software-for-text-analysis-text-mining-text-analytics/
2015-08-05 15:02:02
175
[Original] Curator: delay queue
curator http://curator.apache.org/curator-client/index.html
2015-08-03 16:15:07
133
[Original] MATLAB install on Ubuntu
http://blog.youkuaiyun.com/lanbing510/article/details/41698285
2015-07-10 13:59:05
140
[Original] Solr: Using FunctionQuery in Solr Sort Syntax
In my project, I ran into a problem similar to this one (http://stackoverflow.com/questions/27701533/using-functionquery-in-solr-sort-syntax). I want to sort my documents by a custom score using function ...
2015-07-07 17:36:48
189
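Solr accepts a function query in the `sort` parameter, which is one way to sort by a custom score. The following is a minimal sketch of building such a request URL. The core name `products`, the fields `credit` and `popularity`, and the scoring formula are hypothetical and are not taken from the original post.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_sorted_query(base_url, query, sort_function, direction="desc", rows=10):
    """Return a Solr select URL that sorts results by a function query."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    params = {
        "q": query,
        # Solr sort syntax: "<field or function> <asc|desc>"
        "sort": f"{sort_function} {direction}",
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core and fields, for illustration only.
url = build_sorted_query(
    "http://localhost:8983/solr/products",
    "*:*",
    "sum(product(credit,2),popularity)",
)

# Decode the URL again to confirm what Solr would receive as the sort clause.
sort_sent = parse_qs(urlsplit(url).query)["sort"][0]
print(url)
print(sort_sent)
```

Building the parameters with `urlencode` keeps the parentheses, commas, and spaces of the function expression correctly escaped, which is easy to get wrong when concatenating the URL by hand.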
[Original] Ubuntu: common errors
When running # sudo update-manager error: solution: sudo apt-get update && sudo apt-get dist-upgrade --------- Update the Firefox Flash plugin # tar -xzf install_flash_player_11_linux.x86_6...
2015-07-07 09:53:15
140
[Original] Solr: integrate carrot2 with solr-5.1.0
I had already integrated carrot2 with solr-4.x successfully, using my customized Chinese tokenizer. But I ran into some errors when following my series of blog posts http://ylzhj02.iteye.com/blog/2152348 to adopt ca...
2015-07-01 10:42:22
166
[Original] Solr: Spatial Search
1. schema <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers"/>
2015-06-26 14:59:54
330
[Original] Solr: Synonym Query
1. Config schema.xml <fieldtype name="text_ch" class="solr.TextField"> <analyzer type="index"> <tokenizer class="org.lionsoul.jcseg.analyzer.JcsegTokenizerFactory" mode="
2015-06-18 17:59:03
194
[Original] Solr: Install Solr to production
1. Download solr-5.2.1.tgz 2. Install # tar xzf solr-5.2.1.tgz solr-5.2.1/bin/install_solr_service.sh --strip-components=2 # ./install_solr_service.sh solr-5.2.1.tgz 3. Check Solr status # servi...
2015-06-17 16:31:04
119
[Original] Solr: Tika with OCR engine
I want to parse the content, not just the metadata, of a JPG picture. The following code is the test class: import java.io.File; import java.io.FileInputStream; import java.io.IOException; impo...
2015-06-12 15:03:35
459
[Original] Solr: Install tesseract-ocr
Install dependency # tar -xjf leptonica-1.69.tar.bz2 # cd leptonica-1.69 # ./configure # make -j4 # sudo make install -------------------------- Download tesseract-ocr-3.02.02.tar.gz # tar -xzf t...
2015-06-11 16:35:45
142
[Original] Understanding information content with Apache Tika
www.ibm.com/developerworks/cn/opensource/tutorials/os-apache-tika/ http://www.tutorialspoint.com/tika/tika_quick_guide.htm
2015-06-09 16:53:20
118
[Original] Android: Message push
Preferences: http://www.cnblogs.com/hanyonglu/archive/2012/03/04/2378971.html
2015-06-08 16:58:08
107
[Original] Neo4j: Create multiple relationships between the same two nodes
In my case, I want to build an address book in Neo4j, in which a person has multiple cellphones, and some cellphones may have the same contact with the same phone number but different nicknames, such as us...
2015-06-03 14:40:54
224
[Original] Jubatus: Setup in Distributed Mode
References: http://jubat.us/en/tutorial_distributed.html http://jubat.us/en/admin.html
2015-05-28 14:17:00
110
[Original] Jubatus: Classify Example
1. Create a mvn project with pom.xml: <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0...
2015-05-28 10:26:30
171
[Original] Jubatus: Realtime online ML Introduction
http://jubat.us/en/overview.html
2015-05-26 14:24:47
138
[Original] Neo4j: Remote RESTful API (Java)
# git clone https://github.com/neo4j-contrib/java-rest-binding.git # git tag -l # git checkout neo4j-rest-graphdb-2.0.1 # mvn clean install In the mvn project's pom.xml, add <dependency> <...
2015-05-25 14:29:01
484
[Original] Solr: Using SolrJ to operate Solr
References: http://www.solrtutorial.com/solrj-tutorial.html https://cwiki.apache.org/confluence/display/solr/Using+SolrJ
2015-05-22 13:29:12
104
[Original] Flume: morphline sink with Solr 5.1.0
1. Download the Flume 1.5.2 source code and change the Solr version to 5.1.0. 2. Compile and install. 3. Copy the Solr 4.10.1 related jars to the lib dir to solve this error: CloudSolrServer' (current frame, stack[2])...
2015-05-21 16:38:37
188
[Original] Storm: Trident Fields and tuples
https://storm.apache.org/documentation/Trident-tutorial.html The Trident data model is the TridentTuple which is a named list of values. During a topology, tuples are incrementally built up throu...
2015-04-28 10:14:54
115
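The idea of a tuple as a named list of values that grows as it moves through a topology can be illustrated outside Storm. The sketch below is plain Python, not the Trident API: the helper `each` and the sample fields `sentence`, `word`, and `length` are hypothetical. It assumes the Trident convention that a function's output fields are appended to the input tuple and that a function may emit zero or more results per input.

```python
def each(tuples, input_fields, function, output_fields):
    """Apply `function` to the selected input fields of every tuple.

    Each emitted result is appended to a copy of the input tuple under
    the names in `output_fields`, so tuples grow step by step.
    """
    result = []
    for tup in tuples:
        args = [tup[name] for name in input_fields]
        for emitted in function(*args):
            grown = dict(tup)  # dicts keep insertion order, like a named list
            grown.update(zip(output_fields, emitted))
            result.append(grown)
    return result


def split_words(sentence):
    """Emit one single-value result per word."""
    return [(word,) for word in sentence.split()]


def word_length(word):
    """Emit exactly one result holding the word length."""
    return [(len(word),)]


stream = [{"sentence": "the cow jumped"}, {"sentence": "the moon"}]
words = each(stream, ["sentence"], split_words, ["word"])
lengths = each(words, ["word"], word_length, ["length"])

for tup in lengths:
    print(tup)
```

After the first step every tuple carries `sentence` and `word`; after the second it carries `sentence`, `word`, and `length`, which mirrors how fields accumulate across operations.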
[Original] High-quality PPT online
http://www.slideshare.net/yuhuang/large-scale-machine-learning-for-big-data
2015-04-24 15:33:21
110
[Original] Spark: Spark Streaming
Spark Streaming uses a “micro-batch” architecture, where the streaming computation is treated as a continuous series of batch computations on small batches of data. Spark Streaming receives data fro...
2015-04-22 16:02:40
141
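The micro-batch idea can be sketched without Spark: incoming events are cut into fixed time intervals, and each interval is then processed as an ordinary small batch job. This is a plain Python illustration, not the Spark Streaming API. The function `micro_batches`, the one-second interval, and the sample events are assumptions made for the example. Unlike a real streaming engine, this sketch skips intervals that received no events.

```python
from collections import Counter, defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-length batches."""
    batches = defaultdict(list)
    for timestamp, value in events:
        # Events in the same interval share the same integer batch index.
        batches[int(timestamp // batch_interval)].append(value)
    return [batches[index] for index in sorted(batches)]


# Hypothetical stream of (seconds since start, word) events.
events = [(0.2, "a"), (0.7, "b"), (1.1, "a"), (2.5, "c"), (2.9, "c")]

batches = micro_batches(events, batch_interval=1.0)

# The "batch computation" run on every small batch: a word count.
counts = [Counter(batch) for batch in batches]

for index, count in enumerate(counts):
    print(index, dict(count))
```

The batch interval trades latency against overhead: a shorter interval produces results sooner but runs more, smaller jobs.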
[Original] Spark: cluster architecture
In distributed mode, Spark uses a master/slave architecture with one central coordinator and many distributed workers. The central coordinator is called the driver. The driver communicates with a p...
2015-04-22 10:51:33
150
[Original] Spark: deploy cluster in standalone mode
Hosts: 192.168.0.135 192.168.0.136 192.168.0.137 master: 137 workers: 135 136 1. Install Spark on all hosts in the /opt dir 2. Install SSH remote access 137# ssh-keygen 137# ssh-copy-id -i ~/.s...
2015-04-20 12:32:56
127
[Original] Spark: Cluster Mode Overview
https://spark.apache.org/docs/latest/cluster-overview.html This document gives a short overview of how Spark runs on clusters, to make it easier to understand the components involved. Read throug...
2015-04-20 10:15:03
133
[Original] Flume: avro source and sink
In order to flow the data across multiple agents or hops, the sink of the previous agent and source of the current hop need to be avro type with the sink pointing to the hostname (or IP address) and ...
2015-04-17 11:12:42
119
[Original] Flume: hbase sink
flume.conf a1.sinks.hbase-sink1.channel = ch1 a1.sinks.hbase-sink1.type = hbase a1.sinks.hbase-sink1.table = users a1.sinks.hbase-sink1.columnFamily = info a1.sinks.hbase-sink1.serializer = org.ap...
2015-04-16 17:04:38
236
[Original] Kite: Morphlines Introduction
http://kitesdk.org/docs/1.0.0/morphlines/ http://blog.cloudera.com/blog/2013/07/morphlines-the-easy-way-to-build-and-integrate-etl-apps-for-apache-hadoop/
2015-04-13 11:09:08
209
[Original] Neo4j: fulltext search
Model @Indexed(indexType = IndexType.FULLTEXT, indexName = "TaskTile") private String title; Repository @Query("START n=node:TaskTile({0}) return n") Iterable<Task> fin...
2015-04-08 15:03:53
370
hadoop in action
2014-11-24