# Write your MySQL query statement below SELECT gender, day, @total :=@total* (@pre= (@pre := gender)) + score_points AS total FROM scores, ( SELECT @total :=0, @pre :='F' ) init ORDERBY gender, day;
in this query, we have 2 variables @total and pre. we compare pre and current gender, if it is same, then we accumulate, otherwise, score_points would be my new total.
sol 2, windows function
1 2 3 4 5 6
SELECT gender, day, SUM(score_points) OVER (PARTITIONBY gender ORDERBYday) AS total FROM scores
this is the best solution. so concise.
we also can rewrite like this
1 2 3
select gender, day, sum(score_points) over w total from scores window w as (partitionby gender orderbyday)
sol 3, using join
intitially, I write the query as such
1 2 3 4 5
SELECT s1.gender, s1.day, SUM(s1.score_points) AS total FROM Scores AS s1 JOIN Scores AS s2 ON s1.gender = s2.gender AND s1.day >= s2.day GROUPBY s1.gender, s1.day ORDERBY s1.gender, s1.day;
but it gives me result
1 2 3 4 5 6 7 8 9 10 11
| gender |day| total | |------ | ---------- | ----- | | F |2019-12-30|17| | F |2019-12-31|46| | F |2020-01-01|51| | F |2020-01-07|92| | M |2019-12-18|2| | M |2019-12-25|22| | M |2019-12-30|39| | M |2019-12-31|12| | M |2020-01-07|35|
this is wrong. this error likes at on syntax: ON s1.gender = s2.gender AND s1.day >= s2.day
how about change it as such:
1 2 3 4 5
SELECT s1.gender, s1.day, SUM(s1.score_points) AS total FROM Scores AS s1 JOIN Scores AS s2 ON s1.gender = s2.gender AND s1.day <= s2.day GROUPBY s1.gender, s1.day ORDERBY s1.gender, s1.day;
still wrong
1 2 3 4 5 6 7 8 9 10 11
| gender |day| total | |------ | ---------- | ----- | | F |2019-12-30|68| | F |2019-12-31|69| | F |2020-01-01|34| | F |2020-01-07|23| | M |2019-12-18|10| | M |2019-12-25|44| | M |2019-12-30|39| | M |2019-12-31|6| | M |2020-01-07|7|
in order to make it right, we need to understand how join works.
let’s check a left join example.
1 2 3 4 5
SELECT CustomerName, OrderDate FROM Customers LEFTJOIN Orders ON Customers.CustomerID = Orders.CustomerID;
if I re-write it as python code, it looks like this:
1 2 3 4 5 6
for customer in customers: order = orders.filter(orders.CustomerID == customer.CustomerID).first() if order isnotNone: print(customer.CustomerName, order.OrderDate) else: print(customer.CustomerName, None)
so table on the left side of join would be outer loop, table on the right side would be inner join.
# Iterate through the lists for customer in customers: order = next((order for order in orders if order['CustomerID'] == customer['CustomerID']), None) if order isnotNone: print(customer['CustomerName'], order['OrderDate']) else: print(customer['CustomerName'], None)
let’s check exactly what the table looks like after we do join
| gender | day | s1p | s2p | | ------ | ---------- | --- | --- | | F | 2019-12-30 | 17 | 17 | | F | 2019-12-31 | 23 | 17 | | F | 2019-12-31 | 23 | 23 | | F | 2020-01-01 | 17 | 17 | | F | 2020-01-01 | 17 | 17 | | F | 2020-01-01 | 17 | 23 | | F | 2020-01-07 | 23 | 17 | | F | 2020-01-07 | 23 | 17 | | F | 2020-01-07 | 23 | 23 | | F | 2020-01-07 | 23 | 23 | | M | 2019-12-18 | 2 | 2 | | M | 2019-12-25 | 11 | 2 | | M | 2019-12-25 | 11 | 11 | | M | 2019-12-30 | 13 | 2 | | M | 2019-12-30 | 13 | 11 | | M | 2019-12-30 | 13 | 13 | | M | 2019-12-31 | 3 | 2 | | M | 2019-12-31 | 3 | 3 | | M | 2019-12-31 | 3 | 11 | | M | 2019-12-31 | 3 | 13 | | M | 2020-01-07 | 7 | 3 | | M | 2020-01-07 | 7 | 2 | | M | 2020-01-07 | 7 | 13 | | M | 2020-01-07 | 7 | 7 | | M | 2020-01-07 | 7 | 11 |
raw data is as such:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Scores table: +-------------+--------+------------+--------------+ | player_name | gender | day | score_points | +-------------+--------+------------+--------------+ | Aron | F | 2020-01-01 | 17 | | Alice | F | 2020-01-07 | 23 | | Bajrang | M | 2020-01-07 | 7 | | Khali | M | 2019-12-25 | 11 | | Slaman | M | 2019-12-30 | 13 | | Joe | M | 2019-12-31 | 3 | | Jose | M | 2019-12-18 | 2 | | Priya | F | 2019-12-31 | 23 | | Priyanka | F | 2019-12-30 | 17 | +-------------+--------+------------+--------------+
s1 table is outer loop, s2 table is inner loop
first row is ‘Aron’, then loop s2, s2’s first row is ‘Aron’, so s1p = 17, s2p=17 continue loop s2, but we have restriction s1.day >= s2.day, hence for 2019-12-30, we only get
1
| F | 2019-12-30 | 17 | 17 |
go back to outer loop, second row is Priya, for inner loop s2, it has ‘2019-12-30’ and ‘2019-12-31’, so s2p would be 17 and 23, but s1p would be 23 and 23, contiue this process, then we would get the result table.
so for sum(s1p), for 12-30, it is 17, for 12-31 , it is 23 +23 = 46, for 1-1, it is 17+17+17=51, etc. totally wrong but if we observe sum(s2p), then it would be correct. for 12-30: 17, 12-31: 17+23, 1-1: 17+23+17, etc.
hence the correct answer would be:
1 2 3 4 5
SELECT s1.gender, s1.day, SUM(s2.score_points) AS total FROM Scores AS s1 JOIN Scores AS s2 ON s1.gender = s2.gender AND s1.day >= s2.day GROUPBY s1.gender, s1.day ORDERBY s1.gender, s1.day;
2 caveats:
in sum, we need to use inner loop table column
for join, we need to gurantee s1.day >= s2.day. otherwise, for s1.day=12-30, then s2 could choose 12-30, 12-31,1-1 etc, this is wrong.