Setting up a Data Pipeline for Energy Data Extraction βοΈ
Setting up the Extraction Logic
import io
from datetime import datetime
import requests
import pandas as pd
from dateutil.rrule import rrule, MONTHLY
import numpy as np
### Data Eng. Functions ###
def month_iter(start_month, start_year, end_month, end_year):
"""
Returns a generator that yields a tuple like (month, year). Generator
has monthly frequency between start and end dates.
"""
start = datetime(start_year, start_month, 1)
end = datetime(end_year, end_month, 1)
return ((d.month, d.year) for d in rrule(MONTHLY, dtstart=start, until=end))
def download_aemo_data_nem(state, start_month, start_year, end_month, end_year):
"""
Downloads price and demand data from AEMO from 1st day of the starting
month in the starting year to the last day of the ending month in the
ending year. Returns pandas DataFrame.
"""
aemo_df = pd.DataFrame()
headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) \
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
for date in month_iter(start_month, start_year, end_month, end_year):
month = str(date[0]).zfill(2)
year = str(date[1])
reference = year + month
print(f'Adding Month/Year: {month}/{year}')
url = f'https://www.aemo.com.au/aemo/data/nem/priceanddemand/PRICE_AND_DEMAND_{reference}_{state}1.csv'
url_data = requests.get(url, headers=headers).content
raw_data = pd.read_csv(io.StringIO(url_data.decode('utf-8')))
aemo_df = aemo_df.append(raw_data, ignore_index=True)
aemo_df['datetime'] = pd.to_datetime(aemo_df['SETTLEMENTDATE'])
aemo_df = aemo_df.set_index('datetime')
aemo_df.drop(['SETTLEMENTDATE', 'PERIODTYPE'],
axis=1, inplace=True)
return aemo_df
This is for for the State: NSW
Adding Month/Year: 06/2021
Adding Month/Year: 07/2021
Adding Month/Year: 08/2021
Adding Month/Year: 09/2021
Adding Month/Year: 10/2021
Adding Month/Year: 11/2021
Adding Month/Year: 12/2021
Adding Month/Year: 01/2022
Adding Month/Year: 02/2022
Adding Month/Year: 03/2022
Adding Month/Year: 04/2022
Adding Month/Year: 05/2022
Adding Month/Year: 06/2022
This is for for the State: QLD
Adding Month/Year: 06/2021
Adding Month/Year: 07/2021
Adding Month/Year: 08/2021
Adding Month/Year: 09/2021
Adding Month/Year: 10/2021
Adding Month/Year: 11/2021
Adding Month/Year: 12/2021
Adding Month/Year: 01/2022
Adding Month/Year: 02/2022
Adding Month/Year: 03/2022
Adding Month/Year: 04/2022
Adding Month/Year: 05/2022
Adding Month/Year: 06/2022
This is for for the State: VIC
Adding Month/Year: 06/2021
Adding Month/Year: 07/2021
Adding Month/Year: 08/2021
Adding Month/Year: 09/2021
Adding Month/Year: 10/2021
Adding Month/Year: 11/2021
Adding Month/Year: 12/2021
Adding Month/Year: 01/2022
Adding Month/Year: 02/2022
Adding Month/Year: 03/2022
Adding Month/Year: 04/2022
Adding Month/Year: 05/2022
Adding Month/Year: 06/2022
This is for for the State: TAS
Adding Month/Year: 06/2021
Adding Month/Year: 07/2021
Adding Month/Year: 08/2021
Adding Month/Year: 09/2021
Adding Month/Year: 10/2021
Adding Month/Year: 11/2021
Adding Month/Year: 12/2021
Adding Month/Year: 01/2022
Adding Month/Year: 02/2022
Adding Month/Year: 03/2022
Adding Month/Year: 04/2022
Adding Month/Year: 05/2022
Adding Month/Year: 06/2022
This is for for the State: SA
Adding Month/Year: 06/2021
Adding Month/Year: 07/2021
Adding Month/Year: 08/2021
Adding Month/Year: 09/2021
Adding Month/Year: 10/2021
Adding Month/Year: 11/2021
Adding Month/Year: 12/2021
Adding Month/Year: 01/2022
Adding Month/Year: 02/2022
Adding Month/Year: 03/2022
Adding Month/Year: 04/2022
Adding Month/Year: 05/2022
Adding Month/Year: 06/2022
Visualisation of the AEMO Extract Data using Tableau
So what have we observed from a quick visual inspection?
So how do we pick up anomalous energy prices for each region? π€·
Technique One: IQR
Firstly, let's visualise the data as-is without any segmentation or outlier (anomalous values) removal.
The Proportion of Outliers Using the IQR Method is: 7.13 percent ( 30109 )
The Average Price $/MWh for Anomalous Values is $622.58 /MWh
The Average Price $/MWh for Normal Values is $92.12/MWh
Secondly, let's add the Regions to our Analysis. π
==========================================================
This is for the Region of NSW1
The Proportion of Outliers Using the IQR Method is: 4.76 percent ( 4019 )
The Average Price $/MWh for Anomalous Values is $686.15 /MWh
The Average Price $/MWh for Normal Values is $120.1/MWh
The Price Ratio between Anomalous/Normal Prices is 5.71x
==========================================================
This is for the Region of QLD1
The Proportion of Outliers Using the IQR Method is: 5.13 percent ( 4334 )
The Average Price $/MWh for Anomalous Values is $1266.83 /MWh
The Average Price $/MWh for Normal Values is $125.54/MWh
The Price Ratio between Anomalous/Normal Prices is 10.09x
==========================================================
This is for the Region of VIC1
The Proportion of Outliers Using the IQR Method is: 5.78 percent ( 4887 )
The Average Price $/MWh for Anomalous Values is $458.04 /MWh
The Average Price $/MWh for Normal Values is $78.26/MWh
The Price Ratio between Anomalous/Normal Prices is 5.85x
==========================================================
This is for the Region of TAS1
The Proportion of Outliers Using the IQR Method is: 6.65 percent ( 5618 )
The Average Price $/MWh for Anomalous Values is $456.98 /MWh
The Average Price $/MWh for Normal Values is $74.73/MWh
The Price Ratio between Anomalous/Normal Prices is 6.12x
==========================================================
This is for the Region of SA1
The Proportion of Outliers Using the IQR Method is: 4.86 percent ( 4108 )
The Average Price $/MWh for Anomalous Values is $758.08 /MWh
The Average Price $/MWh for Normal Values is $85.43/MWh
The Price Ratio between Anomalous/Normal Prices is 8.87x
==========================================================
This is for the Region of NSW1
==========================================================
The Proportion of Outliers Using the IQR Method is: 4.76 percent ( 4019 )
The Average Price $/MWh for Anomalous Values is $686.15 /MWh
The Average Price $/MWh for Normal Values is $120.1/MWh
The Price Ratio between Anomalous/Normal Prices is 5.71x
The Lowest Quarterly Stdev for the Region of NSW1 is Quarter 1 at $113.8 per MWh
The Highest Quarterly Stdev for the Region of NSW1 is Quarter 2 at $501.97 per MWh
The Stdev Price Ratio between Quarterly Prices is 4.41x
==========================================================
This is for the Region of QLD1
==========================================================
The Proportion of Outliers Using the IQR Method is: 5.13 percent ( 4334 )
The Average Price $/MWh for Anomalous Values is $1266.83 /MWh
The Average Price $/MWh for Normal Values is $125.54/MWh
The Price Ratio between Anomalous/Normal Prices is 10.09x
The Lowest Quarterly Stdev for the Region of QLD1 is Quarter 3 at $228.79 per MWh
The Highest Quarterly Stdev for the Region of QLD1 is Quarter 1 at $764.42 per MWh
The Stdev Price Ratio between Quarterly Prices is 3.34x
==========================================================
This is for the Region of VIC1
==========================================================
The Proportion of Outliers Using the IQR Method is: 5.78 percent ( 4887 )
The Average Price $/MWh for Anomalous Values is $458.04 /MWh
The Average Price $/MWh for Normal Values is $78.26/MWh
The Price Ratio between Anomalous/Normal Prices is 5.85x
The Lowest Quarterly Stdev for the Region of VIC1 is Quarter 4 at $46.73 per MWh
The Highest Quarterly Stdev for the Region of VIC1 is Quarter 2 at $286.95 per MWh
The Stdev Price Ratio between Quarterly Prices is 6.14x
==========================================================
This is for the Region of TAS1
==========================================================
The Proportion of Outliers Using the IQR Method is: 6.65 percent ( 5618 )
The Average Price $/MWh for Anomalous Values is $456.98 /MWh
The Average Price $/MWh for Normal Values is $74.73/MWh
The Price Ratio between Anomalous/Normal Prices is 6.12x
The Lowest Quarterly Stdev for the Region of TAS1 is Quarter 4 at $31.49 per MWh
The Highest Quarterly Stdev for the Region of TAS1 is Quarter 2 at $549.27 per MWh
The Stdev Price Ratio between Quarterly Prices is 17.44x
==========================================================
This is for the Region of SA1
==========================================================
The Proportion of Outliers Using the IQR Method is: 4.86 percent ( 4108 )
The Average Price $/MWh for Anomalous Values is $758.08 /MWh
The Average Price $/MWh for Normal Values is $85.43/MWh
The Price Ratio between Anomalous/Normal Prices is 8.87x
The Lowest Quarterly Stdev for the Region of SA1 is Quarter 3 at $87.63 per MWh
The Highest Quarterly Stdev for the Region of SA1 is Quarter 2 at $612.25 per MWh
The Stdev Price Ratio between Quarterly Prices is 6.99x