Calculate daily averages across multiple years

By February 23, 2018Data science, Machine Learning

When working with time series data, especially data which is seasonal in nature, it is often useful to calculate the average of the same date and/or time across multiple years.

For example, in electricity markets such as the Western Australian energy market, balance price is reported on 3o minute intervals.  It may be useful to work out the average interval price over many years.

The following figure illustrates the balance price from 2012 to 2017.

Using pandas the interval average across multiple years can be calculated.

Assuming the data is in a dataframe

df

, and the index contains the interval timestamp e.g. 2012-09-01 04:30:00, the average across all the data for each interval can be calculated by:


dfEst = df.groupby([df.index.month,df.index.day,df.index.hour,df.index.minute]).mean()

This produces a result like:

These averages can then be mapped back across the previous dates using:


df['est'] = df['Trading Interval'].apply(lambda x: dfEst.loc[(x.month, x.day, x.hour, x.minute), 'Final Price ($/MWh)'])

Replotting this over the original data obviously shows that the average doesn’t fit the actual, as the actual data is more variable year on year in this case.  But, it may be useful to fill in gaps in the data, or to provide a very rough estimate of future prices.