cc_jjj's blog

Original: stratascratch5 Popularity Percentage

Find the popularity percentage for each user on Facebook. The popularity percentage is defined as the total number of friends the user has divided by the total number of users on the platform, then converted into a percentage by mul.

2021-12-14 15:37:13 1239 1
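A minimal sketch of the Popularity Percentage calculation described above, run through Python's built-in sqlite3 module. The table name `facebook_friends`, its columns `user1` and `user2`, and the sample rows are assumptions made for illustration; the excerpt does not show the actual schema. Because a friendship counts for both users, each pair is mirrored before counting.

```python
import sqlite3

# Hypothetical schema and data: each row is one friendship between two users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facebook_friends (user1 INTEGER, user2 INTEGER)")
conn.executemany(
    "INSERT INTO facebook_friends VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 3), (3, 4)],
)

query = """
WITH pairs AS (
    -- Mirror every friendship so that it is counted for both users.
    SELECT user1 AS user_id, user2 AS friend_id FROM facebook_friends
    UNION
    SELECT user2 AS user_id, user1 AS friend_id FROM facebook_friends
)
SELECT user_id,
       ROUND(100.0 * COUNT(*) / (SELECT COUNT(DISTINCT user_id) FROM pairs), 2)
           AS popularity_pct
FROM pairs
GROUP BY user_id
ORDER BY user_id
"""
rows = conn.execute(query).fetchall()
for user_id, pct in rows:
    print(user_id, pct)
```

With this sample data there are four users, so a user with two friends gets 50 percent.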

Original: stratascratch 4 Finding User Purchases

Write a query that'll identify returning active users. A returning active user is a user that has made a second purchase within 7 days of any other of their purchases. Output a list of user_ids of these returning active users. -- Create table: use strata; drop tab.

2021-12-13 14:39:19 1459
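One possible way to find the returning active users described above, sketched with Python's built-in sqlite3 module. The table name `purchases`, its columns, and the sample rows are hypothetical, since the excerpt is cut off before the table definition. The sketch compares each purchase with the same user's previous purchase, and it assumes that two purchases on the same day also count as "within 7 days".

```python
import sqlite3

# Hypothetical table: one row per purchase.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [
        (1, "2020-01-01"), (1, "2020-01-05"),                     # 4 days apart
        (2, "2020-01-01"), (2, "2020-02-01"),                     # 31 days apart
        (3, "2020-01-10"),                                        # single purchase
        (4, "2020-03-01"), (4, "2020-03-20"), (4, "2020-03-25"),  # last gap is 5 days
    ],
)

query = """
SELECT DISTINCT user_id
FROM (
    SELECT user_id,
           created_at,
           LAG(created_at) OVER (
               PARTITION BY user_id ORDER BY created_at
           ) AS prev_at
    FROM purchases
)
WHERE prev_at IS NOT NULL
  AND julianday(created_at) - julianday(prev_at) <= 7
ORDER BY user_id
"""
returning = [row[0] for row in conn.execute(query)]
print(returning)
```

Checking only adjacent purchases is enough here: if any two purchases of a user are within 7 days of each other, some pair of consecutive purchases is too.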

Original: stratascratch2-Acceptance Rate By Date

What is the overall friend acceptance rate by date? Your output should have the rate of acceptances by the date the request was sent. Order by the earliest date to latest. Assume that each friend request starts by a user sending (i.e., user_id_sender) a f.

2021-12-13 13:50:32 348

Original: stratascratch1-Salaries Differences

stratascratch-Salaries Differences

2021-12-13 11:08:43 1778

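Original: Hive interview questions -- continuity problems

Background: in interviews and at work you often need SQL/Hive to find records such as users who logged in on N consecutive days, or sales that exceeded 10,000 on N consecutive days. Data preparation -- create the table and load the data: use test; create table if not exists log_info(uid string,log_date string) row format delimited fields terminated by ','; insert into table log_info values('A','20210901'),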

2021-11-02 16:22:12 1589
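A sketch of one common approach to the consecutive-days problem above, using Python's built-in sqlite3 module rather than Hive. It is not necessarily the method used in the post. The `log_info(uid, log_date)` table comes from the excerpt, but only the row `('A','20210901')` is from the source; every other row is made up for illustration. The idea is that within one streak, the day number minus the row number is constant, so that difference can serve as a grouping key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_info (uid TEXT, log_date TEXT)")
conn.executemany(
    "INSERT INTO log_info VALUES (?, ?)",
    [
        ("A", "20210901"), ("A", "20210902"), ("A", "20210903"), ("A", "20210905"),
        ("B", "20210901"), ("B", "20210903"), ("B", "20210904"),
        ("C", "20210930"), ("C", "20211001"), ("C", "20211002"),
    ],
)

n_days = 3  # minimum streak length (the N in the problem statement)

query = """
WITH days AS (
    -- One row per user and day, with the yyyymmdd string turned into a day number.
    SELECT DISTINCT uid, log_date,
           CAST(julianday(substr(log_date, 1, 4) || '-' ||
                          substr(log_date, 5, 2) || '-' ||
                          substr(log_date, 7, 2)) AS INTEGER) AS day_no
    FROM log_info
),
ranked AS (
    -- day_no - row_number stays constant inside a run of consecutive days.
    SELECT uid, log_date,
           day_no - ROW_NUMBER() OVER (PARTITION BY uid ORDER BY day_no) AS grp
    FROM days
)
SELECT uid,
       MIN(log_date) AS start_date,
       MAX(log_date) AS end_date,
       COUNT(*)      AS streak
FROM ranked
GROUP BY uid, grp
HAVING COUNT(*) >= ?
ORDER BY uid, start_date
"""
streaks = conn.execute(query, (n_days,)).fetchall()
for row in streaks:
    print(row)
```

User C is included to show that the streak is detected across a month boundary, which a plain integer subtraction on the yyyymmdd value would miss.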

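Repost: Hive SQL written-test question

Source: an experienced-hire written test at a company the author refers to only as 某多. Original post: https://blog.youkuaiyun.com/qq_24206673/article/details/108282465 Problem: a table records the scoring in a basketball game, with the following fields: team name (team), player name (name), player number (num), points scored (score), and score time in seconds (score_time). Using SQL/Hive, find the players who scored three consecutive times for their team, and the players whose score put their team ahead, together with the moment the lead changed. Approach: break the problem down. (1) Three consecutive scores becomes: sort the score table in ascending order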

2021-11-01 16:55:45 1012 3
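A small sketch of the first sub-question (three consecutive scores by the same player), following the LAG-based idea that a commenter on this post suggests, run here with Python's built-in sqlite3 module. The column names come from the problem statement and the table name from that comment; all rows are invented for illustration. The sketch orders all scores by time without partitioning by team and compares on `name` only, so it assumes player names are unique across teams.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE basketball_game_score_detail "
    "(team TEXT, name TEXT, num INTEGER, score INTEGER, score_time INTEGER)"
)
conn.executemany(
    "INSERT INTO basketball_game_score_detail VALUES (?, ?, ?, ?, ?)",
    [
        ("A", "Li", 7, 2, 10),
        ("A", "Li", 7, 3, 25),
        ("A", "Li", 7, 2, 40),    # third score in a row by Li
        ("B", "Wang", 5, 2, 55),
        ("A", "Li", 7, 2, 70),
        ("B", "Wang", 5, 3, 80),
        ("B", "Zhao", 9, 2, 95),
    ],
)

query = """
SELECT DISTINCT name
FROM (
    SELECT name,
           LAG(name, 1) OVER (ORDER BY score_time) AS name1,
           LAG(name, 2) OVER (ORDER BY score_time) AS name2
    FROM basketball_game_score_detail
)
WHERE name = name1 AND name = name2
"""
streak_players = [row[0] for row in conn.execute(query)]
print(streak_players)
```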

Original: pandas groupby: how to take the first few rows of each group

# For the data below, group by country and take the top 2 rows by the Age field: df = pd.DataFrame({'Country':['China','China', 'India', 'India', 'America', 'Japan', 'China', 'India'], 'Income':[10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000], 'Age':[5000, 4321,...

2020-09-09 17:36:20 2364
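One way to take the top 2 rows per group by a column, sketched with a small made-up frame; the post's own DataFrame is truncated in the excerpt, so the values below are assumptions and not the author's data. Sorting first and then calling `groupby(...).head(2)` keeps the two largest `Age` values in each country. This may or may not be the method the post itself uses.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {
        "Country": ["China", "China", "India", "India", "China", "Japan"],
        "Age": [30, 45, 22, 51, 38, 27],
    }
)

top2 = (
    df.sort_values("Age", ascending=False)   # largest Age first
      .groupby("Country")
      .head(2)                               # first two rows of each group
      .sort_values(["Country", "Age"], ascending=[True, False])
      .reset_index(drop=True)
)
print(top2)
```

Groups with fewer than two rows (Japan here) simply return what they have.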

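Original: pandas.DataFrame.any and pandas.DataFrame.all

As a general rule, "any" returns True if at least one element in a row or column is true (here usually meaning nonzero), and returns False only if every element in that row or column is false (usually meaning 0); "all" returns True only if every element in a row or column is true (none equal to 0), and returns False if any element is false. pandas.DataFrame.any: DataFrame.any(axis=None,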

2017-12-23 14:36:11 7585
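A short illustration of the rule stated above, using a small made-up frame of integers in which 0 is treated as false. The column names and values are assumptions chosen so that each case appears once.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [0, 0, 0],  # all zeros
        "b": [0, 1, 2],  # mixed
        "c": [1, 2, 3],  # no zeros
    }
)

any_cols = df.any(axis=0)  # per column: is at least one value nonzero?
all_cols = df.all(axis=0)  # per column: are all values nonzero?
any_rows = df.any(axis=1)  # per row: is at least one value nonzero?
all_rows = df.all(axis=1)  # per row: are all values nonzero?

print(any_cols.tolist(), all_cols.tolist())
print(any_rows.tolist(), all_rows.tolist())
```

Column `a` is false under both reductions, column `b` is true only under `any`, and column `c` is true under both; every row fails `all` because column `a` is always 0.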

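Original: pandas.cut and pandas.qcut: usage and differences

pandas.cut: pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False) Parameters: x, an array-like object, which must be one-dimensional. bins: an integer, a sequence of scalars, or an IntervalIndex. If bins is an integer, it defines the number of equal-width bins in the range of x; in that case the range of x is extended by 0.1% on each side, to ensure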

2017-12-23 11:19:19 28321
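A brief sketch of the difference between the two functions, using a made-up series with one outlier. `pd.cut` with an integer `bins` splits the value range into equal-width intervals, while `pd.qcut` splits at quantiles so that each bin holds roughly the same number of values. The data is an assumption for illustration only.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

equal_width = pd.cut(values, bins=2)  # two intervals of equal width over [1, 100]
equal_freq = pd.qcut(values, q=2)     # two intervals split at the median

# Count the values per interval, ordered from the lowest interval to the highest.
cut_counts = equal_width.value_counts().sort_index().tolist()
qcut_counts = equal_freq.value_counts().sort_index().tolist()

print(cut_counts)
print(qcut_counts)
```

With the outlier at 100, the equal-width split puts nine values in the lower interval and one in the upper, whereas the quantile split puts five in each.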

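Latest comments

Hive SQL written-test question
胖琪小飞侠: Yes, this part indeed should not be partitioned.
Hive SQL written-test question
胖琪小飞侠: For the first question, in SQL Server you can do this: --- Question 1: select a.* from (select * ,lag(name,1)over(order by score_time asc) as name1 ,lag(name,2)over(order by score_time asc) as name2 from basketball_game_score_detail )a where name=name1 and name=name2
Hive SQL written-test question
Mp努力(●'◡'●): This looks wrong! ,sum(a_score)over(partition by team order by score_time asc) as a_score_sum ,sum(b_score)over(partition by team order by score_time asc) as b_score_sum This part must not be partitioned: once it is partitioned, on the rows where the other team scores, our team's running total becomes 0, which produces many opposite signs. Please check whether that is the case, and reply once you have looked.
stratascratch5 Popularity Percentage
隔壁寝室老吴: The catch in this problem is that friendships are mutual.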
