Pre-Requisites to Understanding this Article
The Standard Operating Workflow as a Data Analytics Professional ⚙️
What is an Integration? 🤷
import time
import pandas as pd

# Time how long pandas takes to read the whole CSV into memory.
start_time = time.time()
print("Reading Data Via Pandas")
citibike_df = pd.read_csv("/work/201306-citibike-tripdata.csv")
end_time = time.time()
time_diff = end_time - start_time
print("Time to Load Data is: " + str(time_diff))
print(len(citibike_df))  # number of rows (trips) loaded
Reading Data Via Pandas
Time to Load Data is: 1.514803409576416
577703
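The start/stop timing pattern above is repeated for every loader in this article, so it can be wrapped in a small helper. The sketch below is one possible way to do that with only the standard library; the names `timed` and `timings` are hypothetical and not part of the original notebook.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print how long the enclosed block takes; optionally store it in `results`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"Time to {label} is: {elapsed:.4f} seconds")


# Stand-in workload so the sketch runs without the CSV file.
timings = {}
with timed("sum a million integers", timings):
    total = sum(range(1_000_000))
```

In the notebook, the same helper could wrap the load step, for example `with timed("load data via pandas"): citibike_df = pd.read_csv(...)`.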
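#!pip install dask
import time
import dask.dataframe as dd

# dd.read_csv is lazy: it samples the start of the file to infer dtypes and
# builds a task graph, so this timing does not include reading the full file.
start_time = time.time()
print("Reading Data Via Dask")
dask_df = dd.read_csv('/work/201306-citibike-tripdata.csv')
end_time = time.time()
time_diff = end_time - start_time
print("Time to Load Data is: " + str(time_diff) + " seconds")
# len() triggers the actual read of the file, outside the timed region.
print(len(dask_df))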
Reading Data Via Dask
Time to Load Data is: 0.01353144645690918 seconds
577703
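#!pip install dask
import time
import dask.dataframe as dd

# As before, only the lazy setup of the Dask DataFrame is timed here.
start_time = time.time()
print("Reading Data Via Dask")
dask_df = dd.read_csv('/work/201306-citibike-tripdata.csv')
end_time = time.time()
time_diff = end_time - start_time
print("Time to Load Data is: " + str(time_diff) + " seconds")
print(len(dask_df))  # forces the file to be read in order to count rows
# Per-column memory usage in bytes; this is lazy too, so call .compute() to get the numbers.
dask_df.memory_usage()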
Reading Data Via Dask
Time to Load Data is: 0.020000934600830078 seconds
577703
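The very small Dask load time above reflects lazy evaluation: the timed call only sets up the work, and the file is read later, when `len()` is called. The sketch below illustrates that idea with a plain Python generator. It is an analogy only, not how Dask is implemented, and the synthetic `csv_text` and the `lazy_rows` helper are hypothetical stand-ins for the Citi Bike file.

```python
import csv
import io
import time

# Synthetic stand-in for a trip file (made-up data, not the Citi Bike CSV).
csv_text = "tripduration,usertype\n" + "".join(
    f"{i},Subscriber\n" for i in range(200_000)
)


def lazy_rows(text):
    """Generator: no parsing happens until something iterates over it."""
    yield from csv.DictReader(io.StringIO(text))


start = time.perf_counter()
rows = lazy_rows(csv_text)  # only creates the generator object
setup_seconds = time.perf_counter() - start

start = time.perf_counter()
row_count = sum(1 for _ in rows)  # this is where the parsing actually runs
count_seconds = time.perf_counter() - start

print(f"Set up lazy reader: {setup_seconds:.6f} seconds")
print(f"Count rows:         {count_seconds:.6f} seconds")
print(row_count)
```

For a like-for-like comparison with pandas, the timed region would need to include the step that forces the read, such as the `len(dask_df)` call.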
Visualising your SQL on-the-go with SQL Cells, Plotly and Viz Cells 📈
SQL Cells on-the-go 🏃
-- Number of trips per hour of day, split by user type
SELECT COUNT(*) AS "Number_Of_Trips",
       "usertype" AS "Customer",
       HOUR(starttime) AS "Hour"
FROM '/work/201306-citibike-tripdata.csv'
GROUP BY "Hour", "Customer"
ORDER BY "Customer", "Number_Of_Trips"
-- Number of trips per day of the week, split by user type
SELECT COUNT(*) AS "Number_Of_Trips",
       "usertype" AS "Customer",
       DAYNAME(starttime) AS "Day"
FROM '/work/201306-citibike-tripdata.csv'
GROUP BY "Day", "Customer"
ORDER BY "Customer", "Number_Of_Trips"
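To try the hourly aggregation outside a SQL cell, the sketch below runs a similar query with Python's built-in `sqlite3` module on a handful of made-up rows. The `trips` table and `sample_trips` data are hypothetical, and because SQLite has no `HOUR()` function the hour is extracted with `strftime` instead, so this is an approximation of the SQL cell rather than the same query.

```python
import sqlite3

# Hypothetical sample rows in the shape (starttime, usertype); not real trip data.
sample_trips = [
    ("2013-06-01 08:05:00", "Subscriber"),
    ("2013-06-01 08:40:00", "Subscriber"),
    ("2013-06-01 17:15:00", "Subscriber"),
    ("2013-06-02 08:10:00", "Customer"),
    ("2013-06-02 14:30:00", "Customer"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (starttime TEXT, usertype TEXT)")
conn.executemany("INSERT INTO trips VALUES (?, ?)", sample_trips)

# SQLite has no HOUR(); strftime('%H', ...) returns the hour as text,
# which is cast to an integer here.
hourly = conn.execute(
    """
    SELECT COUNT(*) AS Number_Of_Trips,
           usertype AS Customer,
           CAST(strftime('%H', starttime) AS INTEGER) AS Hour
    FROM trips
    GROUP BY Hour, Customer
    ORDER BY Customer, Number_Of_Trips
    """
).fetchall()

for number_of_trips, customer, hour in hourly:
    print(customer, hour, number_of_trips)

conn.close()
```

The day-of-week version would follow the same pattern, with `strftime('%w', starttime)` returning a weekday number (0 for Sunday) in place of `DAYNAME`.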