Anonymous_cx - 优快云 Blog

[Original] Writing Data from SparkSQL into a Hive Dynamic Partition Table

object HiveTableHelper extends Logging {
  def hiveTableInit(sc: SparkContext): HiveContext = {
    val sqlContext = new HiveContext(sc)
    sqlContext
  }
  def writePartitionTable(HCtx: HiveContext, in

2017-05-23 21:06:01 12993 4

[Original] Best Time to Buy and Sell Stock

Question: Say you have an array for which the ith element is the price of a given stock on day i. If you were only permitted to complete at most one transaction (i.e., buy one and sell one share of the sto

2016-09-21 14:47:22 753
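The problem statement is cut off in this listing. Assuming it is the usual single-transaction version (buy once, sell on a later day, return 0 if no profit is possible), a minimal one-pass sketch in Python tracks the lowest price seen so far and the best profit from selling today:

```python
def max_profit(prices):
    """Best profit from one buy followed by one later sell; 0 if none."""
    min_price = float("inf")  # lowest price seen so far
    best = 0                  # best profit found so far
    for price in prices:
        if price < min_price:
            min_price = price          # a cheaper day to buy
        elif price - min_price > best:
            best = price - min_price   # selling today beats the previous best
    return best

print(max_profit([7, 1, 5, 3, 6, 4]))  # 5
```

This runs in O(n) time and O(1) space, since each day only needs the minimum of the days before it.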

[Original] Valid Anagram

Question: Given two strings s and t, write a function to determine if t is an anagram of s. For example, s = “anagram”, t = “nagaram”, return true. s = “rat”, t = “car”, return false. Note: You may ass

2016-09-17 11:00:25 450
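The note about allowed characters is cut off above. A minimal Python sketch that does not depend on it compares per-character counts, since two strings are anagrams exactly when every character occurs the same number of times in both:

```python
from collections import Counter

def is_anagram(s, t):
    """True if t is a rearrangement of the characters of s."""
    # The length check is a cheap early exit; the Counter comparison decides.
    return len(s) == len(t) and Counter(s) == Counter(t)

print(is_anagram("anagram", "nagaram"))  # True
print(is_anagram("rat", "car"))          # False
```

If the input is guaranteed to be lowercase letters only, a fixed array of 26 counters would do the same job without a hash map.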

[Original] House Robber

Question: You are a professional robber planning to rob houses along a street. Each house has a certain amount of money stashed, the only constraint stopping you from robbing each of them is that adjace

2016-09-16 21:44:08 458
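The constraint is truncated above; assuming the usual rule that two adjacent houses cannot both be robbed, a minimal dynamic-programming sketch in Python keeps only the best totals for the previous two houses:

```python
def rob(nums):
    """Maximum loot when no two adjacent houses may both be robbed."""
    prev, curr = 0, 0  # best totals up to house i-2 and house i-1
    for amount in nums:
        # Either skip this house (keep curr) or rob it on top of prev.
        prev, curr = curr, max(curr, prev + amount)
    return curr

print(rob([2, 7, 9, 3, 1]))  # 12
```

The recurrence is best(i) = max(best(i-1), best(i-2) + nums[i]); rolling two variables instead of a full table keeps memory at O(1).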

[Original] Move Zeroes

Question: Given an array nums, write a function to move all 0's to the end of it while maintaining the relative order of the non-zero elements. For example, given nums = [0, 1, 0, 3, 12], after calling y

2016-09-13 19:12:06 476
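A minimal in-place sketch in Python using two indices: `read` scans the array, and `write` marks where the next non-zero element belongs. Swapping (rather than overwriting) leaves the zeros behind at the end without a second pass:

```python
def move_zeroes(nums):
    """Move all 0s to the end in place, keeping the order of non-zero elements."""
    write = 0  # next slot for a non-zero element
    for read in range(len(nums)):
        if nums[read] != 0:
            # Everything between write and read-1 is zero, so this swap
            # moves the non-zero value forward and a zero backward.
            nums[write], nums[read] = nums[read], nums[write]
            write += 1

nums = [0, 1, 0, 3, 12]
move_zeroes(nums)
print(nums)  # [1, 3, 12, 0, 0]
```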

[Original] Invert Binary Tree

Question:
     4
   /   \
  2     7
 / \   / \
1   3 6   9
to
     4
   /   \
  7     2
 / \   / \
9   6 3   1
code:
class TreeNode {
    int key;
    String data;
    TreeNode left;
    TreeNode right;

2016-09-13 09:20:38 382
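Inverting a tree means swapping the left and right children at every node. A minimal recursive sketch in Python follows; the `TreeNode` here is a simplified stand-in that keeps only `key`, `left` and `right` (the post's Java class also has a `data` field), and `level_order` is a hypothetical helper used only to check the result against the diagram above:

```python
from collections import deque

class TreeNode:
    """Simplified node: the post's Java version also stores a data field."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def invert_tree(root):
    """Mirror the tree in place by swapping the children of every node."""
    if root is not None:
        root.left, root.right = invert_tree(root.right), invert_tree(root.left)
    return root

def level_order(root):
    """Keys in breadth-first order (helper for checking the result)."""
    out = []
    queue = deque([root] if root else [])
    while queue:
        node = queue.popleft()
        out.append(node.key)
        for child in (node.left, node.right):
            if child is not None:
                queue.append(child)
    return out

tree = TreeNode(4, TreeNode(2, TreeNode(1), TreeNode(3)),
                   TreeNode(7, TreeNode(6), TreeNode(9)))
print(level_order(invert_tree(tree)))  # [4, 7, 2, 9, 6, 3, 1]
```

Recursion depth equals tree height, so a very deep, unbalanced tree would be better handled with an explicit stack.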

[Original] Reverse Words in a String

Given an input string, reverse the string word by word. For example, given s = "the sky is blue", return "blue is sky the".
code:
public class ReverseString {
    public static void main(String[] args) {

2016-09-11 12:50:49 434
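A minimal Python sketch of the same task. It assumes that leading, trailing, or repeated spaces should be collapsed, which is what `str.split()` with no argument does; the excerpt does not say how extra spaces should be treated:

```python
def reverse_words(s):
    """Reverse the order of words; split() also discards extra whitespace."""
    return " ".join(reversed(s.split()))

print(reverse_words("the sky is blue"))  # blue is sky the
```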

[Original] Python: Count Word Frequencies in a Text, Sort Ties Alphabetically, Then Take the Top N

Python code:
def count_words(s, n):
    dic = {}
    words = s.split(" ")
    for word in words:
        dic[word] = words.count(word)
    wordslist = sorted(dic.items(), key=lambda kv: (-kv[1], kv[0]))

2016-09-10 16:16:03 2708
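The excerpt stops right after the sort. Assuming the function then returns the first n (word, count) pairs, a compact variant might look like the sketch below. It keeps the same sort key, where `-count` puts higher counts first and the word itself breaks ties alphabetically, but counts with `collections.Counter` in a single pass instead of calling `words.count` once per word (which rescans the whole list each time):

```python
from collections import Counter

def count_words(s, n):
    """Top n (word, count) pairs: higher count first, ties in alphabetical order."""
    counts = Counter(s.split(" "))  # one pass over the words
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]

print(count_words("cat bat mat cat bat cat", 3))
# [('cat', 3), ('bat', 2), ('mat', 1)]
```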

[Repost] Spark's Memory Management Model

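Spark is currently a very popular in-memory distributed computing framework. Since it is memory-based, memory management is naturally the most important part of Spark's storage management. So what memory management model does Spark actually use? This article lifts the veil on it. As we mentioned in "Spark Source Code Analysis, Part 7: Task Execution (1)", when a Task is passed to an Executor for execution, in the TaskRunne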

2016-08-15 21:37:55 3260

[Original] Using Flume to Collect Several Kinds of Logs from Multiple Machines and Store Them in HDFS

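Requirement: aggregate access.log, ugcheader.log, and ugctail.log from machines A and B onto machine C, then collect them all into HDFS. IPs: A: 155, B: 156, C: 162. However, the directories required in HDFS are: /source/access/20160101/**, /source/ugcheader/20160101/**, /source/ugctail/20160101/** Structure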

2016-07-02 14:57:29 5727 1

[Original] Architecture of a Personalized Recommendation Project

Architecture of the offline and real-time modules

2016-06-27 21:49:51 1155

[Original] Scala: Implicit Conversions

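As everyone knows, 1 to 10 is equivalent to 1.to(10). But Scala's Int type has no to method, so why can we still call to? The compiler quietly converts Int to RichInt for us; this is an implicit conversion. Implicit conversions let you enrich the functionality of existing libraries. Type :implicits -v in the REPL to see the implicit conversions the compiler imports by default. The author is using Scala 2.11.8. A first implicit conversion example: // From the design pattern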

2016-06-05 16:17:50 999

[Original] Integrating Kafka and Flume

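Kafka as the source. Configuration file:
# Define the components
a1.sources = kafka
a1.sinks = log
a1.channels = c1
# Configure the Kafka source
# The source type is KafkaSource
a1.sources.kafka.type = org.apache.flume.source.kafka.KafkaSource
# Address of the ZooKeeper cluster the consumer connects to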

2016-05-19 20:47:09 1518

[Original] Scala: Higher-Order Functions

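Higher-order functions: passing functions to functions.
A first higher-order function:
def formatResult(name: String, n: Int, f: Int => Int) = {
  val msg = "The %s of %d is %d."
  msg.format(name, n, f(n))
}
formatResult is a higher-order function: it takes a function f as a parameter, and the parameter's type, Int => Int, means f accepts an integer and returns an integer result.
Polymorphic functions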

2016-05-07 11:09:31 1045
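For comparison, here is a rough Python analogue of `formatResult` (the name `format_result` and the sample `factorial` are illustrative, not from the post). Python has no static `Int => Int` type, so the "takes an int, returns an int" contract is only documented, not checked; the higher-order idea of passing `f` in as an argument is the same:

```python
def format_result(name, n, f):
    """Python analogue of formatResult: f is any int -> int function."""
    msg = "The %s of %d is %d."
    return msg % (name, n, f(n))

def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)

print(format_result("absolute value", -42, abs))  # The absolute value of -42 is 42.
print(format_result("factorial", 7, factorial))   # The factorial of 7 is 5040.
```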

[Repost] Spark RDD API Explained: Map and Reduce

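Original link: https://www.zybuluo.com/jewes/note/35032 What is an RDD? An RDD is Spark's abstract data structure; any data in Spark is represented as an RDD. From a programming point of view, an RDD can simply be thought of as an array. Unlike an ordinary array, the data in an RDD is stored in partitions, so data in different partitions can be distributed across different machines and processed in parallel. Therefore, all a Spark application really does is take the data to be processed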

2016-05-02 11:25:06 880

[Original] Shell: Computing PV and UV from Website Logs

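What are PV and UV? PV: page view, i.e. the number of page visits. UV: unique visitor, i.e. the number of distinct visitors. First, open access.log with more to look at the file's structure: We only need the data in the first column, so: more clean.log Computing PV: Computing UV: First, we need to deduplicate the IP addresses, then count the lines. Finding the ten IP addresses with the most visits: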

2016-04-23 22:59:46 1464
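The shell commands themselves are not shown in this excerpt, so the following is only a Python sketch of the same three computations, not the post's method. It assumes, as the text implies, that the client IP is the first whitespace-separated field of each log line; the sample log lines are made up for illustration:

```python
from collections import Counter

def pv_uv_top(lines, top=10):
    """Return (PV, UV, busiest IPs) for a list of access-log lines.

    Assumes the client IP is the first whitespace-separated field.
    """
    ips = [line.split()[0] for line in lines if line.strip()]
    pv = len(ips)                            # every request counts as one page view
    uv = len(set(ips))                       # distinct IPs = unique visitors
    busiest = Counter(ips).most_common(top)  # top N IPs by visit count
    return pv, uv, busiest

# Hypothetical sample lines, for illustration only.
log = [
    "10.0.0.1 - - [01/Jan/2016] GET /index.html",
    "10.0.0.2 - - [01/Jan/2016] GET /about.html",
    "10.0.0.1 - - [01/Jan/2016] GET /news.html",
]
print(pv_uv_top(log))  # (3, 2, [('10.0.0.1', 2), ('10.0.0.2', 1)])
```

Counting UV by distinct IP mirrors the "deduplicate, then count lines" step described above; it treats every visitor behind one NAT address as a single visitor.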

[Original] Storm Optimization

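Parallelism: workers provide Storm with worker processes. A program's parallelism can be configured (including the parallelism of spouts and bolts, and of ackers if there are any); the parallelism is the number of tasks. As a rule of thumb, the worker-to-task ratio is about 1 worker to 10 to 15 tasks, though this should of course be tested and tuned for your configuration and application. Workers (slots): with a 16-core CPU, 20 workers are recommended; with a 24- or 32-core CPU, 30 work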

2016-04-17 11:09:40 816

[Original] Storm Cluster Start and Stop Scripts, and Things to Watch Out For

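Prerequisite: passwordless SSH login must first be set up between the cluster nodes. The master node runs the nimbus and ui processes; the slave nodes run the supervisor and logviewer processes.
1: To start the cluster, first list the IPs of the slave nodes in a file named supervisorhost, one node IP per line:
192.168.1.171
192.168.1.172
On the master node, write a script, start-all.sh:
#!/bin/bash
s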

2016-04-17 11:02:39 3829

[Original] Getting Started with Storm: WordCount

spout: WordReader:
package spout;
import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import org.apache.commons.io.FileUtils;
import b

2016-04-17 10:20:38 920

[Original] Introduction to Storm

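1. What is Storm
Storm is a distributed real-time computation system open-sourced by Twitter.
2. Storm's design philosophy
- Storm is an abstraction over streams (Stream). A stream is an uninterrupted, unbounded, continuous sequence of tuples; note that when Storm models an event stream, it abstracts each event in the stream as a tuple.
- Storm abstracts the elements of a stream as Tuples. A tuple is a list of values (value list), and each value in the list has a na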

2016-04-17 09:58:31 4848

[Original] A Custom ArrayList

// MyList interface
import java.util.Collection;
public interface MyList<T> {
    boolean add(T t);
    boolean addAll(Collection<? extends T> c);
    boolean remove(T t);
    Object get(int index);
    boolean is

2016-04-04 16:41:11 521
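Only the start of the `MyList` interface appears here, not the post's implementation. A common way to implement such a list is a fixed-capacity backing array that doubles when it fills up; the Python sketch below assumes that strategy and mirrors `add`, `addAll` (as `add_all`), `remove`, and `get`. The `size` method and the initial capacity of 4 are illustrative additions, not taken from the post:

```python
class MyList:
    """Array-backed list sketch: doubles its capacity when the backing array is full."""

    def __init__(self, capacity=4):
        self._data = [None] * capacity  # fixed-size backing "array"
        self._size = 0                  # number of slots actually in use

    def add(self, item):
        if self._size == len(self._data):
            self._grow()
        self._data[self._size] = item
        self._size += 1
        return True

    def add_all(self, items):
        changed = False
        for item in items:
            changed = self.add(item) or changed
        return changed

    def remove(self, item):
        """Remove the first occurrence of item; return False if it is absent."""
        for i in range(self._size):
            if self._data[i] == item:
                # Shift the tail left by one slot to close the gap.
                self._data[i:self._size - 1] = self._data[i + 1:self._size]
                self._size -= 1
                self._data[self._size] = None  # drop the stale reference
                return True
        return False

    def get(self, index):
        if not 0 <= index < self._size:
            raise IndexError(index)
        return self._data[index]

    def size(self):
        return self._size

    def _grow(self):
        # Copy the used slots into a backing array twice as large.
        new_data = [None] * max(1, 2 * len(self._data))
        new_data[:self._size] = self._data[:self._size]
        self._data = new_data
```

Doubling keeps `add` amortized O(1), while `remove` stays O(n) because of the shift, the same trade-off as `java.util.ArrayList`.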

[Original] A Deadlock Example in Java

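// Xiaoming and Xiaoli compete for a kettle and a cup so they can drink water.
// In the end each holds either the kettle or the cup, neither can drink, and they deadlock.
public class DeadLock {
    // kettle
    private Object object1 = new Object();
    // cup
    private Object object2 = new Object();
    public static void main(String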

2016-03-29 22:01:16 522

[Original] Shell (Part 2)

find:
find . -type d -print    # print directories
find . ! -name "*.txt" -print    # print files whose names do not end in .txt
find . -type f -name "*.php" ! -perm 644 -print    # print .php files whose permissions are not 644
find . -type f -name "*.php" -perm 644 -print
find . -type f

2016-03-19 21:15:23 479

[Original] Shell (Part 1)

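1.2 Printing in the terminal
By default, echo appends a newline after each invocation. If you want to print !, don't put it inside double quotes, or else escape it:
echo Hello World !
or
echo 'Hello World !'
or
echo "Hello world \!"
The side effects of each method are as follows: with unquoted echo, you cannot display a semicolon (;), because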

2016-03-18 22:23:45 495

[Original] Two Ways to Run a Shell Script

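One way is to pass the script to sh as a command-line argument; the other is to run the script as an executable file that has execute permission. To run the script as a command-line argument:
$ sh script.sh    # assuming the script is in the current directory
or
$ sh /home/path/script.sh    # using the full path to script.sh
If the script is run as a command-line argument to sh, then the #!/bin/bash (she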

2016-02-17 15:44:50 4095

[Original] Fixing Failed SSH to a VM or ifconfig Showing No IP

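1. Switch the VM's network connection to NAT mode.
2. On the host machine, check whether the VM services are running.
3. Restart the Linux system inside your VM.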

2016-02-16 15:17:05 1801

Chenx