Original: stratascratch 5: Popularity Percentage
Popularity Percentage: Find the popularity percentage for each user on Facebook. The popularity percentage is defined as the total number of friends the user has divided by the total number of users on the platform, then converted into a percentage by mul.
2021-12-14 15:37:13
1200
1
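The definition above (friends per user divided by total users, times 100) can be sketched as follows. This is a minimal illustration run through Python's built-in `sqlite3`, not the post's own solution: the `friends` table, its `user1`/`user2` columns, and the sample rows are hypothetical, and "total users" is assumed to mean every user that appears in at least one friendship.

```python
import sqlite3

# Hypothetical schema: each row is one undirected friendship pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (user1 INTEGER, user2 INTEGER)")
conn.executemany(
    "INSERT INTO friends VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 3), (1, 4)],  # made-up sample data
)

query = """
WITH edges AS (
    -- List every friendship from both sides so each user gets a row per friend.
    SELECT user1 AS user_id, user2 AS friend_id FROM friends
    UNION
    SELECT user2 AS user_id, user1 AS friend_id FROM friends
)
SELECT user_id,
       100.0 * COUNT(friend_id)
             / (SELECT COUNT(DISTINCT user_id) FROM edges) AS popularity_pct
FROM edges
GROUP BY user_id
ORDER BY user_id
"""
popularity = conn.execute(query).fetchall()
print(popularity)
```

The `UNION` (rather than `UNION ALL`) also removes duplicate pairs, which matters if the same friendship could be stored in both directions.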
Original: stratascratch 4: Finding User Purchases
Write a query that'll identify returning active users. A returning active user is a user that has made a second purchase within 7 days of any other of their purchases. Output a list of user_ids of these returning active users. -- Create table: use strata; drop tab.
2021-12-13 14:39:19
1429
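One common way to express "a second purchase within 7 days of any other purchase" is a self-join on the user. The sketch below uses Python's built-in `sqlite3`; the `purchases` table, its columns, and the rows are hypothetical, and two purchases on the same day are assumed to count as "within 7 days".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data, not taken from the post.
conn.execute("CREATE TABLE purchases (id INTEGER, user_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, 100, "2021-03-01"),
        (2, 100, "2021-03-05"),  # 4 days after purchase 1: returning
        (3, 200, "2021-03-01"),
        (4, 200, "2021-03-20"),  # 19 days later: not returning
        (5, 300, "2021-03-02"),  # only one purchase
    ],
)

query = """
SELECT DISTINCT a.user_id
FROM purchases a
JOIN purchases b
  ON a.user_id = b.user_id
 AND a.id <> b.id                       -- pair each purchase with a different one
 AND julianday(b.created_at) - julianday(a.created_at) BETWEEN 0 AND 7
ORDER BY a.user_id
"""
returning_users = [row[0] for row in conn.execute(query)]
print(returning_users)
```

In Hive the date difference would typically be written with `datediff` instead of `julianday`; the join logic stays the same.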
Original: stratascratch 2: Acceptance Rate By Date
What is the overall friend acceptance rate by date? Your output should have the rate of acceptances by the date the request was sent. Order by the earliest date to latest. Assume that each friend request starts by a user sending (i.e., user_id_sender) a f.
2021-12-13 13:50:32
317
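Original: Hive interview questions: consecutive-day problems
Background: in interviews and day-to-day work you are often asked to use SQL/Hive to find records such as users who logged in on N consecutive days, or runs of N consecutive days with sales above 10,000. Data preparation -- create the table and load the data: use test; create table if not exists log_info(uid string, log_date string) row format delimited fields terminated by ','; insert into table log_info values('A','20210901'),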
2021-11-02 16:22:12
1546
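A widely used technique for this kind of question is to subtract each row's within-user rank from its date: rows in the same consecutive run end up with the same result and can be grouped together. The sketch below shows that idea in pandas rather than Hive. The column names `uid` and `log_date` follow the post's table; the rows and the threshold `N = 3` are made up for illustration, and this is not necessarily the approach the post takes.

```python
import pandas as pd

# Made-up login data using the post's column names.
logs = pd.DataFrame({
    "uid": ["A", "A", "A", "A", "B", "B", "B"],
    "log_date": ["20210901", "20210902", "20210903", "20210905",
                 "20210901", "20210903", "20210904"],
})
logs["log_date"] = pd.to_datetime(logs["log_date"], format="%Y%m%d")
logs = logs.drop_duplicates().sort_values(["uid", "log_date"])

# Rank each login within its user (0, 1, 2, ...), then shift the date back
# by that many days. Consecutive days collapse onto the same anchor date.
logs["rn"] = logs.groupby("uid").cumcount()
logs["anchor"] = logs["log_date"] - pd.to_timedelta(logs["rn"], unit="D")

streaks = (
    logs.groupby(["uid", "anchor"])
        .agg(start=("log_date", "min"), end=("log_date", "max"), days=("log_date", "size"))
        .reset_index()
)

N = 3  # hypothetical threshold: at least N consecutive days
result = streaks[streaks["days"] >= N]
print(result[["uid", "start", "end", "days"]])
```

The SQL equivalent would use `row_number() over (partition by uid order by log_date)` and `date_sub` to build the same anchor column before grouping.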
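Repost: Hive SQL written-test questions
Source: an experienced-hire recruitment test at a company identified only as "某多". Original post: https://blog.youkuaiyun.com/qq_24206673/article/details/108282465 Problem: a table records the scoring of a basketball game, with these main fields: team name (team), player name (name), player number (num), points scored (score), and scoring time in seconds (score_time). Use SQL/Hive to find: players who scored three consecutive times for their team; players whose score put their team into the lead, together with the moment the lead changed. Approach: break the problem down. (1) Three consecutive scores becomes: sort the scoring table in ascending order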
2021-11-01 16:55:45
943
3
Original: pandas groupby: taking the first few rows of each group
# For the data below, group by country and take the top 2 by the Age field: df = pd.DataFrame({'Country':['China','China', 'India', 'India', 'America', 'Japan', 'China', 'India'], 'Income':[10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000], 'Age':[5000, 4321,...
2020-09-09 17:36:20
2314
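One way to take the top rows per group is to sort first and then call `groupby(...).head(n)`. The sketch below uses a small made-up frame (the post's own `Age` values are cut off in the summary, so they are not reproduced here) and may differ from the method the post settles on.

```python
import pandas as pd

# Made-up data; only the column names echo the post.
df = pd.DataFrame({
    "Country": ["China", "China", "China", "India", "India", "Japan"],
    "Age": [30, 45, 25, 50, 35, 40],
})

top2 = (
    df.sort_values("Age", ascending=False)  # largest Age first
      .groupby("Country")
      .head(2)                               # keep the first 2 rows of each group
      .sort_values(["Country", "Age"], ascending=[True, False])
      .reset_index(drop=True)
)
print(top2)
```

Groups with fewer than two rows (Japan here) simply return what they have.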
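Original: pandas.DataFrame.any and pandas.DataFrame.all
The general rule: "any" returns True if at least one element in a row or column is truthy (here, generally nonzero), and returns False only if every element in that row or column is falsy (generally 0). "all" returns True only if every element in a row or column is truthy (all nonzero), and returns False if any element is falsy. pandas.DataFrame.any:DataFrame.all(axis=None,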
2017-12-23 14:36:11
7508
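The rule above is easy to check on a small frame. This is an illustrative sketch with made-up data; `axis=0` reduces down each column and `axis=1` reduces across each row.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [0, 0, 0],  # all zeros
    "b": [1, 0, 2],  # mixed
    "c": [1, 1, 1],  # all nonzero
})

any_by_column = df.any(axis=0)  # True where a column has at least one nonzero value
all_by_column = df.all(axis=0)  # True where every value in the column is nonzero
any_by_row = df.any(axis=1)     # True where a row has at least one nonzero value
all_by_row = df.all(axis=1)     # True where every value in the row is nonzero

print(any_by_column, all_by_column, any_by_row, all_by_row, sep="\n\n")
```

Column `a` fails both checks, column `b` passes `any` but not `all`, and column `c` passes both; every row passes `any` (because of `c`) and fails `all` (because of `a`).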
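Original: pandas.cut and pandas.qcut: usage and differences
pandas.cut: pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False) Parameters: x: array-like, must be one-dimensional. bins: an integer, a sequence of scalars, or an IntervalIndex. If bins is an integer, it defines the number of equal-width bins over the range of x; in that case, however, the range of x is extended by 0.1% on each side to ensure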
2017-12-23 11:19:19
28254
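The practical difference between the two functions shows up on skewed data: `pd.cut` splits the value range into equal-width intervals, while `pd.qcut` splits by quantiles so each bin holds roughly the same number of points. A minimal sketch with made-up values and hypothetical labels:

```python
import pandas as pd

# Nine small values and one outlier.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# Equal-width bins: the range 1..100 is split into three intervals of equal width,
# so the outlier sits alone in the top bin.
equal_width = pd.cut(values, bins=3, labels=["low", "mid", "high"])

# Quantile bins: the split point is the median, so each bin gets half the points.
equal_count = pd.qcut(values, q=2, labels=["lower half", "upper half"])

width_counts = equal_width.value_counts().to_dict()
count_counts = equal_count.value_counts().to_dict()
print(width_counts)
print(count_counts)
```

Here `pd.cut` puts nine values in "low" and only the outlier in "high", whereas `pd.qcut` gives five values to each half.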