# 第一阶段模块二 **Repository Path**: old_he/bigdata_1_2 ## Basic Information - **Project Name**: 第一阶段模块二 - **Description**: No description available - **Primary Language**: Java - **License**: Apache-2.0 - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2020-10-18 - **Last Updated**: 2020-12-19 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # 第一阶段模块二 ## 第一题 思路跟讲义中SQL面试题7天连续登录用户一样 select team,count(*) cnt from (select team,year, (year- row_number() over(partition by team order by year)) gid from t1) tmp group by team,gid having cnt >= 3; ## 第二题 题目中数据时间格式有点问题,把9:40改成了09:40。 首先按id分组,求出分组中每一行跟上一行的差值last及下一行的差值next。 使用条件判断语句:last和next都为正是波峰,last和next都为负是波谷。 with tmp_2 as (select id,time,price,case when (last >=0 and next >=0) then '波峰' when (last <=0 and next <=0) then '波谷' else null end feature from (select id,time,price,nvl((price - lag(price) over(partition by id order by time asc)),0) last, nvl((price - lead(price) over(partition by id order by time asc)),0) next from t2) tmp) select * from tmp_2 where feature is not null; ## 第三题 ### 1 简单的groupby语句,再使用计数和最值函数辅助日期计算即可。 select id,count(*) cnt, (max(unix_timestamp(dt,'yyyy/MM/dd HH:mm')) - min(unix_timestamp(dt,'yyyy/MM/dd HH:mm')))/60 time from t3 group by id; ### 2 首先按id分组,每一组用lag函数求出当前行与上一行的时间差,新建一列fz,时间差大于30分钟的,fz为1,否则为0; 对上面的结果进行按id分组后对fz列求和(sum),因为添加orderby后求和函数的值是当到当前列的累加值,所以sum的值可以用来分段; 利用sum进行groupby操作,求出最终结果。 select id,(max(unix_timestamp(dt,'yyyy/MM/dd HH:mm')) - min(unix_timestamp(dt,'yyyy/MM/dd HH:mm')))/60,count(*) from (with tmp as (select id, dt, if(nvl(unix_timestamp(dt,'yyyy/MM/dd HH:mm') - lag(unix_timestamp(dt,'yyyy/MM/dd HH:mm')) over(partition by id order by unix_timestamp(dt,'yyyy/MM/dd HH:mm') asc),0)/60 > 30, 1, 0) fz from t3) select id,dt,sum(fz) over(partition by id order by unix_timestamp(dt,'yyyy/MM/dd HH:mm') asc) m from tmp order by id,dt asc) tmp_2 group by id,m;