笔试中较难SQL考题总结

笔试常见SQL难题

1. 用户连续登录

思路

  1. 以用户为组,日期为序,每一行记录加上一个行号

    注意,一个用户一天可以登录多次,如果全部排号,这个方法就失效了,所以加一个group by,但是不聚合,这相当于选择了distinct(user_id sales_date)的组合

    1
    2
    3
    4
    5
    6
    select 
    user_id,
    sales_date,
    row_number() over(partition by user_id order by sales_date) rn
    from sales_tb
    group by user_id,date()
  2. 将上表作为cte,用user_id, date_sub(current_date, interval rn day) group by, 这样相当于把连续的天数聚合到了同一个天(1/3 -1, 1/4 -2, 1/6 - 3 中断)

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    with cte as (
    select
    user_id,
    sales_date,
    row_number() over(partition by user_id order by sales_date) rn
    from sales_tb
    group by user_id,sales_date
    )
    select
    count(*)
    from cte
    group by user_id, date_sub(sales_date,interval rn day)

变体:连续签到领取金币

思路:

  1. 与上题一样,将uid和登录日期combine-distinct后,加上一列行号
1
2
3
4
5
6
7
8
9
10
SELECT
uid,
date(in_time) as login_date,
row_number() over(partition by uid order by date(in_time)) as rn
from
tb_user_log
where date(in_time) >= '2021-07-07' and date(in_time) < '2021-11-01'
and sign_in = 1
and artical_id = 0
GROUP BY uid, date(in_time)
  1. 将上表作为cte,同样通过减去行号的方式,将连续的签到天数聚合到一天(past day)
1
2
3
4
5
6
7
8
select 
uid,
date_sub(login_date, interval rn DAY) ad,
count(*) as days,
FLOOR(count(*)/7) as fac,
count(*)%7 as res
from cte
GROUP by uid, date_sub(login_date, interval rn DAY)
  1. 将上表作为cte2, 增加两列
    1. 将past day + lag(day counts)+ 1, 这样相当于找到了同uid下上一次连续登录了几天,由于当前行的past day 是由其所代表的连续签到天数的一天减掉其行号所得到的,而这个行号比上一行的最大连续签到天数再大1,所以是past + lag + 1
    2. 根据天数计算及金币数量
1
2
3
4
5
6
7
8
select 
uid,
date_add(ad, interval (ifnull(lag(days) over(partition by uid order by ad),0)+1) DAY) cd,
case
when res < 3 then fac*15 + res
when res >= 3 then fac*15 + res+2
end coin
from cte2
  1. 以上表为cte3,根据uid,月份聚合,得到结果,注意时间格式抽取函数的运用
1
2
3
4
5
6
select 
uid,
date_format(cd,"%Y%m"),
sum(coin)
from cte3
group by uid, date_format(cd,"%Y%m")

技巧:MySQL 时间格式化函数

1
2
3
4
select 
date_format(target_datetime, '%Y-%m-...')
from
tb_time

格式具体代码如下

2. 同一时间最大观看人数

思路:

  1. 将进入和登出union起来,并把进入记为1,退出记为-1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
select 
artical_id,
uid,
in_time as tn,
1 person
from tb_user_log
where artical_id != 0
UNION
select
artical_id,
uid,
out_time as tn,
-1 person
from tb_user_log
where artical_id != 0
  1. 以上表为cte,以文章为组,根据文章,时间点,人数排序,用sum记录当前实时人数 running total
1
2
3
4
5
select
*,
sum(person) over(partition by artical_id order by artical_id, tn asc, person DESC rows BETWEEN UNBOUNDED PRECEDING and CURRENT ROW ) as p_count
from
cte1
  1. 以上表为cte,选取各文章组别组别中实时人数最大的时间节点
1
2
3
4
select artical_id, max(p_count) as max_read
from cte2
group by artical_id
order by max_read desc

3. 新用户的次日留存率

  1. 由于跨天登入登出算都活跃,所以将in和out union起来,然后用group by uid,date(time) 来去除一天内重复的活跃记录(相当于只记录每天有没有活跃)
1
2
3
4
5
6
7
8
select 
uid,date(in_time) as tn
from tb_user_log
union
select
uid, date(out_time) as tn
from tb_user_log
group by uid,date(tn)
  1. 以上图为cte
    1. 增加一列判断该用户本次登录时是否为新用户(以用户为组,时间为序,给每一次登录一个row number, 新用户row_numbe为1)
    2. left join自身 on (1.day+1 = 2.day),增加一列判断该用户在后一天是否活跃(left join自身,如果后一天未活跃,则会得到null)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
select a1.*, 
case
when row_number() over(partition by a1.uid order by a1.tn) = 1 then 'NEW'
else 'OLD'
end user_status,
case
when a2.uid is not null then 1
else 0
end next_day
from active a1
left join active a2 ON
date_add(a1.tn,interval 1 day) = a2.tn
and
a1.uid = a2.uid
  1. 以上表为cte,在where语句中加入只计算新用户,月份为11等条件,group by聚合,计算每天的的今日新用户次日留存数/今日新用户总数
1
2
3
4
5
6
7
select 
tn,
round(sum(next_day)/count(*),2) as uv_left_rate
from cte where user_status = 'NEW'
and month(tn) = 11
group by tn
order by tn

4.行列置换

要对每一列的值进行聚合,在把其作为一行输出

1
2
3
4
5
select 'column_1',sum(column_1), count(column_1) 
from orders
union
select 'column_2',sum(column_2), count(column_2)
from orders

技巧:从csv导入文件

标准的导入语法为

1
2
3
4
5
6
LOAD DATA INFILE 'path'
INTO TABLE tb
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES

注意:

  1. 文件必须放在MYSQL指定的安全文件夹中,才能导入,可以使用语句

    1
    SHOW VARIABLES LIKE 'secure_file_priv'

    来查看

  2. 文件路径中的反斜杠/

  3. 若在windows文件夹下,行截止符应为 ''

    1
    LINES TERMINATED BY '\r\n'
  4. IGNORE 用于跳过表头行

5. 选取组内最小的另一种思路

思路:

这里不使用窗口函数和子语句,因为那样不方便查找min(date)所对应的device

  1. 以所有该用户的event_date为子查询,选取组内小于等于任何event_date的 event date,得到的就是组内最小值,且可以简单的得到最小值对应行的其他数据
1
2
SELECT player_id, device_id from Activity a1
WHERE event_date <= ALL(SELECT event_date FROM Activity a2 WHERE a1.player_id = a2.player_id );

6. 选取第二高的薪水

思路:

可以使用dense_rank进行排序,但这里使用另一种方式

  1. 选出distinct的salary,降序排列,然后使用limit offset进行筛选
1
2
3
4
5
6
SELECT DISTINCT
Salary AS SecondHighestSalary
FROM
Employee
ORDER BY Salary DESC
LIMIT 1 OFFSET 1
  1. 加上ifnull
1
2
3
4
5
6
7
SELECT
IFNULL(
(SELECT DISTINCT Salary
FROM Employee
ORDER BY Salary DESC
LIMIT 1 OFFSET 1),
NULL) AS SecondHighestSalary

技巧:有关limit和offset的使用

  1. 当limit后跟一个参数时,表示从头开始选取若干行

    select * from tb limit 3

  2. 当limit后跟两个参数时,表示从头开始跳过a行,再选取之后的b行

    select * from tb limit 2,3 --跳过1-2行,选取3-5行

  3. limit offset 组合使用时,limit后只跟一个参数表示选取多少行,offset后一个参数,表示跳过多少行

    select * from tb limit 3 offset 2 --效果同 2.

7.技巧:从json导入数据

似乎有专门的导入工具,暂时可以使用此网址,输入json格式的数据(不用输入表头),将其转化为csv文件,在参考从csv导入文件

8. 选择连续且为空的座位

思路:

此题应该与用户连续登录区分开来,此问题需要输出所有连续的行号,而不需要聚合,所以用row number判断并不恰当

  1. 将表格和自身join起来,on
    1. abs(1.seat - 2.seat)= 1
    2. 1.free = 2.free = 1
  2. 这样出现在表格中的每一行都一定有一个neighbor, 且neighbour也为空座,最后再选择distinct
1
2
3
4
5
select distinct a.seat_id
from cinema a join cinema b
on abs(a.seat_id - b.seat_id) = 1
and a.free = true and b.free = true
order by a.seat_id

9.技巧:使用concat()

concat()的作用是把两个列连接成一个列,当要使用类似如下语句时

1
select * from table where concat(a,b) not in (select concat(c,d) from cte)

此语句逻辑为:只要a,b不同时等于某些特定组合,就可以选择

10. 累积求和/差

如图,若干位乘客要登上大巴,每位乘客有一登车次序,乘客的体重列在表中,大巴车能容纳1000KG的重量,要求返回最后一名能够上车的乘客的姓名

思路:按照登车顺序,计算登车乘客体重的running total:

1
2
3
4
with cte as (
select *,
sum(weight) over(order by turn rows BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as roll_sum
from Quene order by turn

筛选体重running total小于等于1000的乘客,并获取最后一位乘客的姓名

1
2
3
4
5
6
7
8
with allow as (
select *
from cte
where roll_sum <=1000
)
select person_name
from allow
where turn = (select max(turn) from allow)

笔试中较难SQL考题总结
http://example.com/2022/09/30/SQL-problems/
Author
Zhengyuan Yang
Posted on
September 30, 2022
Licensed under