Advanced SQL Challenge
The Challenge
Here is a challenge for SQL enthusiasts.
I'll solve it here using PostgreSQL 10.10 on Kubuntu 18.04; but feel free to give it a try in your favorite RDBMS.
We have a simple table with two columns : Column dt is a date, and column rt is an exchange rate, showing real values of USD/TRY from July/August 2018. That's a period where this rate hits a peak.
I'll solve it here using PostgreSQL 10.10 on Kubuntu 18.04; but feel free to give it a try in your favorite RDBMS.
We have a simple table with two columns : Column dt is a date, and column rt is an exchange rate, showing real values of USD/TRY from July/August 2018. That's a period where this rate hits a peak.
The table contains data between 16 July and 17 August 2018. But it's far from being complete. Actually it contains only the Tuesdays and Thursdays, plus the first day (16 July) and the last day (17 August).
Our goal is to calculate substitute values for each of these missing days. The result dataset shall contain all dates between 16 July and 17 August 2018.
Our goal is to calculate substitute values for each of these missing days. The result dataset shall contain all dates between 16 July and 17 August 2018.
usd_tr_rates table contents
dt | rt |
07/16/18 | 4.8516 |
07/17/18 | 4.8525 |
07/19/18 | 4.8389 |
07/24/18 | 4.7933 |
07/26/18 | 4.8342 |
07/31/18 | 4.9161 |
08/02/18 | 5.0671 |
08/07/18 | 5.2808 |
08/09/18 | 5.4167 |
08/14/18 | 6.5681 |
08/16/18 | 5.8174 |
08/17/18 | 6.0142 |
Load Source Table Data
Hereunder you can find DDL/DML statements to prepare source data in PostgreSQL. At least the insert statements should work for other databases as well.
CREATE DATABASE olric;
\c olric;
CREATE TABLE usd_tr_rates (dt date, rt decimal(18,8));
insert into usd_tr_rates (dt, rt) values ('16/7/2018', 4.8516);
insert into usd_tr_rates (dt, rt) values ('17/7/2018', 4.8525);
insert into usd_tr_rates (dt, rt) values ('19/7/2018', 4.8389);
insert into usd_tr_rates (dt, rt) values ('24/7/2018', 4.7933);
insert into usd_tr_rates (dt, rt) values ('26/7/2018', 4.8342);
insert into usd_tr_rates (dt, rt) values ('31/7/2018', 4.9161);
insert into usd_tr_rates (dt, rt) values ('2/8/2018', 5.0671);
insert into usd_tr_rates (dt, rt) values ('7/8/2018', 5.2808);
insert into usd_tr_rates (dt, rt) values ('9/8/2018', 5.4167);
insert into usd_tr_rates (dt, rt) values ('14/8/2018', 6.5681);
insert into usd_tr_rates (dt, rt) values ('16/8/2018', 5.8174);
insert into usd_tr_rates (dt, rt) values ('17/8/2018', 6.0142);
\c olric;
CREATE TABLE usd_tr_rates (dt date, rt decimal(18,8));
insert into usd_tr_rates (dt, rt) values ('16/7/2018', 4.8516);
insert into usd_tr_rates (dt, rt) values ('17/7/2018', 4.8525);
insert into usd_tr_rates (dt, rt) values ('19/7/2018', 4.8389);
insert into usd_tr_rates (dt, rt) values ('24/7/2018', 4.7933);
insert into usd_tr_rates (dt, rt) values ('26/7/2018', 4.8342);
insert into usd_tr_rates (dt, rt) values ('31/7/2018', 4.9161);
insert into usd_tr_rates (dt, rt) values ('2/8/2018', 5.0671);
insert into usd_tr_rates (dt, rt) values ('7/8/2018', 5.2808);
insert into usd_tr_rates (dt, rt) values ('9/8/2018', 5.4167);
insert into usd_tr_rates (dt, rt) values ('14/8/2018', 6.5681);
insert into usd_tr_rates (dt, rt) values ('16/8/2018', 5.8174);
insert into usd_tr_rates (dt, rt) values ('17/8/2018', 6.0142);
Required Output
Your SQL shall generate the following output.
dt_new | rt_new |
2018-07-16 | 4.85160000000000000000 |
2018-07-17 | 4.85250000000000000000 |
2018-07-18 | 4.84570000000000000000 |
2018-07-19 | 4.83890000000000000000 |
2018-07-20 | 4.82978000000000000000 |
2018-07-21 | 4.82066000000000000000 |
2018-07-22 | 4.81154000000000000000 |
2018-07-23 | 4.80242000000000000000 |
2018-07-24 | 4.79330000000000000000 |
2018-07-25 | 4.81375000000000000000 |
2018-07-26 | 4.83420000000000000000 |
2018-07-27 | 4.85058000000000000000 |
2018-07-28 | 4.86696000000000000000 |
2018-07-29 | 4.88334000000000000000 |
2018-07-30 | 4.89972000000000000000 |
2018-07-31 | 4.91610000000000000000 |
2018-08-01 | 4.99160000000000000000 |
2018-08-02 | 5.06710000000000000000 |
2018-08-03 | 5.10984000000000000000 |
2018-08-04 | 5.15258000000000000000 |
2018-08-05 | 5.19532000000000000000 |
2018-08-06 | 5.23806000000000000000 |
2018-08-07 | 5.28080000000000000000 |
2018-08-08 | 5.34875000000000000000 |
2018-08-09 | 5.41670000000000000000 |
2018-08-10 | 5.64698000000000000000 |
2018-08-11 | 5.87726000000000000000 |
2018-08-12 | 6.10754000000000000000 |
2018-08-13 | 6.33782000000000000000 |
2018-08-14 | 6.56810000000000000000 |
2018-08-15 | 6.19275000000000000000 |
2018-08-16 | 5.81740000000000000000 |
Solution (PostgreSQL)
My solution in PostgreSQL is as follows.
with rates1 as
(
select dt, rt,
max(dt) over (partition by 1 order by dt
rows between 1 following and 1 following) next_dt,
max(rt) over (partition by 1 order by dt
rows between 1 following and 1 following) next_rt
from usd_tr_rates
),
rates2 as
(
select dt, rt, next_dt, next_rt,
next_dt - dt multip
from rates1
),
series as
(
select * from generate_series(1,10)
)
select
rates2.dt+series.generate_series-1 dt_new,
rates2.rt+((series.generate_series-1) * ((next_rt - rt)/multip)) rt_new
from
rates2
join
series
on
series.generate_series<=rates2.multip
order by
dt, generate_series;
(
select dt, rt,
max(dt) over (partition by 1 order by dt
rows between 1 following and 1 following) next_dt,
max(rt) over (partition by 1 order by dt
rows between 1 following and 1 following) next_rt
from usd_tr_rates
),
rates2 as
(
select dt, rt, next_dt, next_rt,
next_dt - dt multip
from rates1
),
series as
(
select * from generate_series(1,10)
)
select
rates2.dt+series.generate_series-1 dt_new,
rates2.rt+((series.generate_series-1) * ((next_rt - rt)/multip)) rt_new
from
rates2
join
series
on
series.generate_series<=rates2.multip
order by
dt, generate_series;
Comments
Post a Comment