Home > Articles > Linear interpolation in Postgres using generate_series

Linear interpolation in Postgres using generate_series

I like to keep track of how many miles I'm driving in my car. One conceivable way of doing this is to create a table in a Postgres database in which I can track this information.

CREATE TABLE mileage (
    observed_date DATE,
    observed_mileage INTEGER
);

Unfortunately, I'm not always the most regular data collector. I often collect this data with gaps of days or months between each reading.

INSERT INTO mileage(observed_date, observed_mileage) VALUES
    ('2018-05-21', 84088),
    ('2018-05-26', 84201),
    ('2018-06-13', 84910);

I want to get some sense for how much I'm driving each day, and one reasonable way I might do that is to linearly interpolate the mileage between readings. For instance, if I see a reading of 10,000 on August 1st and a reading of 11,000 on August 10th, I want to see that on average I drove 100 miles each day 1-10 August.

How can we do this in Postgres? First, we pair the data up:

SELECT LAG(observed_date) OVER (ORDER BY observed_date) AS lag_date
  , LAG(observed_mileage) OVER (ORDER BY observed_mileage) AS lag_mi
  , observed_date AS obs_date
  , observed_mileage AS obs_mi
FROM mileage;

which yields result:

  lag_date  | lag_mi |  obs_date  | obs_mi
------------+--------+------------+--------
   |        | 2018-05-21 |  84088
 2018-05-21 |  84088 | 2018-05-26 |  84201
 2018-05-26 |  84201 | 2018-06-13 |  84910

Then, we generate a series between each pair of dates:

WITH paired_dates AS (
  SELECT LAG(observed_date) OVER (ORDER BY observed_date) AS lag_date
    , LAG(observed_mileage) OVER (ORDER BY observed_mileage) AS lag_mi
    , observed_date AS obs_date
    , observed_mileage AS obs_mi
  FROM mileage
)
SELECT *
FROM paired_dates
  , generate_series(lag_date, obs_date, INTERVAL '1 day') days(driven_date)
LIMIT 10;

which yields result:

  lag_date  | lag_mi |  obs_date  | obs_mi |      driven_date
------------+--------+------------+--------+------------------------
 2018-05-21 |  84088 | 2018-05-26 |  84201 | 2018-05-21 00:00:00+00
 2018-05-21 |  84088 | 2018-05-26 |  84201 | 2018-05-22 00:00:00+00
 2018-05-21 |  84088 | 2018-05-26 |  84201 | 2018-05-23 00:00:00+00
 2018-05-21 |  84088 | 2018-05-26 |  84201 | 2018-05-24 00:00:00+00
 2018-05-21 |  84088 | 2018-05-26 |  84201 | 2018-05-25 00:00:00+00
 2018-05-21 |  84088 | 2018-05-26 |  84201 | 2018-05-26 00:00:00+00
 2018-05-26 |  84201 | 2018-06-13 |  84910 | 2018-05-26 00:00:00+00
 2018-05-26 |  84201 | 2018-06-13 |  84910 | 2018-05-27 00:00:00+00
 2018-05-26 |  84201 | 2018-06-13 |  84910 | 2018-05-28 00:00:00+00
 2018-05-26 |  84201 | 2018-06-13 |  84910 | 2018-05-29 00:00:00+00

Note that 2018-05-26 occurs twice in the driven_date column. We can fix that by stopping our series just before getting to the later date:

WITH paired_dates AS (
  SELECT LAG(observed_date) OVER (ORDER BY observed_date) AS lag_date
    , LAG(observed_mileage) OVER (ORDER BY observed_mileage) AS lag_mi
    , observed_date AS obs_date
    , observed_mileage AS obs_mi
  FROM mileage
)
SELECT *
FROM paired_dates
  , generate_series(lag_date, obs_date - INTERVAL '1 minute', INTERVAL '1 day') days(driven_date)
LIMIT 10;

Anyway, now we need to calculate the actual number of miles driven on the driven_date.

WITH paired_dates AS (
  SELECT LAG(observed_date) OVER (ORDER BY observed_date) AS lag_date
    , LAG(observed_mileage) OVER (ORDER BY observed_mileage) AS lag_mi
    , observed_date AS obs_date
    , observed_mileage AS obs_mi
  FROM mileage
)
SELECT driven_date
  , (obs_mi - lag_mi)::NUMERIC / (obs_date - lag_date) AS miles_driven
FROM paired_dates
  , generate_series(lag_date, obs_date - INTERVAL '1 minute', INTERVAL '1 day') days(driven_date)
LIMIT 10;

which yields result:

     driven_date     |    miles_driven
---------------------+---------------------
 2018-05-21 00:00:00 | 22.6000000000000000
 2018-05-22 00:00:00 | 22.6000000000000000
 2018-05-23 00:00:00 | 22.6000000000000000
 2018-05-24 00:00:00 | 22.6000000000000000
 2018-05-25 00:00:00 | 22.6000000000000000
 2018-05-26 00:00:00 | 39.3888888888888889
 2018-05-27 00:00:00 | 39.3888888888888889
 2018-05-28 00:00:00 | 39.3888888888888889
 2018-05-29 00:00:00 | 39.3888888888888889
 2018-05-30 00:00:00 | 39.3888888888888889

And thus I have achieved the desired result.