# 第一阶段模块二

**Repository Path**: old_he/bigdata_1_2

## Basic Information

- **Project Name**: 第一阶段模块二
- **Description**: No description available
- **Primary Language**: Java
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-10-18
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# 第一阶段模块二

## 第一题
思路跟讲义中SQL面试题7天连续登录用户一样  
select team,count(*) cnt from (select team,year, (year- row_number() over(partition by team order by year)) gid from t1) tmp group by team,gid having cnt >= 3;  
## 第二题
题目中数据时间格式有点问题，把9:40改成了09:40。  
首先按id分组，求出分组中每一行跟上一行的差值last及下一行的差值next。  
使用条件判断语句：last和next都为正是波峰，last和next都为负是波谷。  
with tmp_2 as (select id,time,price,case when (last >=0 and next >=0) then '波峰' when (last <=0 and next <=0) then '波谷' else null end feature from (select id,time,price,nvl((price - lag(price) over(partition by id order by time asc)),0) last, nvl((price - lead(price) over(partition by id order by time asc)),0) next from t2) tmp) select * from tmp_2 where feature is not null;  
## 第三题
### 1
简单的groupby语句，再使用计数和最值函数辅助日期计算即可。  
select id,count(*) cnt, (max(unix_timestamp(dt,'yyyy/MM/dd HH:mm')) - min(unix_timestamp(dt,'yyyy/MM/dd HH:mm')))/60 time from t3 group by id;
### 2
首先按id分组，每一组用lag函数求出当前行与上一行的时间差，新建一列fz，时间差大于30分钟的，fz为1，否则为0；  
对上面的结果进行按id分组后对fz列求和(sum)，因为添加orderby后求和函数的值是当到当前列的累加值，所以sum的值可以用来分段；  
利用sum进行groupby操作，求出最终结果。  
select id,(max(unix_timestamp(dt,'yyyy/MM/dd HH:mm')) - min(unix_timestamp(dt,'yyyy/MM/dd HH:mm')))/60,count(*) from (with tmp as (select id, dt, if(nvl(unix_timestamp(dt,'yyyy/MM/dd HH:mm') - lag(unix_timestamp(dt,'yyyy/MM/dd HH:mm')) over(partition by id order by unix_timestamp(dt,'yyyy/MM/dd HH:mm') asc),0)/60 > 30, 1, 0) fz from t3) 
select id,dt,sum(fz) over(partition by id order by unix_timestamp(dt,'yyyy/MM/dd HH:mm') asc) m from tmp order by id,dt asc) tmp_2 group by id,m;