!pip install covidcast
Successfully installed covidcast-0.1.5 delphi-epidata-0.3.1 descartes-1.1.0 epiweeks-2.1.3 geopandas-0.10.2 imageio-2.13.0 imageio-ffmpeg-0.4.5 pyproj-3.2.1 shapely-1.8.0
from datetime import date
import covidcast
import pandas as pd
import numpy as np
ca_counties = covidcast.fips_to_name("^06.*", ties_method="all")
ca_counties = list(ca_counties[0].values())
counties_string = []
for i in ca_counties:
    # each entry is a list of name strings for one FIPS code; concatenate them
    string = ""
    for element in i:
        string += element
    counties_string.append(string)
counties_string = counties_string[1:59] # removing 'california'
ca_counties_fips = covidcast.name_to_fips(counties_string)
ca_counties_fips
/root/venv/lib/python3.7/site-packages/covidcast/geography.py:314: UserWarning: Some inputs were not uniquely matched; returning only the first match in each case. To return all matches, set `ties_method='all'`
warnings.warn("Some inputs were not uniquely matched; returning only the first match "
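The `UserWarning` above indicates that some county names matched more than one FIPS code, so converting codes to names and back may pick a county outside California for names that other states share. A minimal sketch of an alternative, assuming the first element returned by `fips_to_name` with `ties_method="all"` is a dict keyed by FIPS code, reads the codes straight from the keys. The `sample` dict and the `county_fips` helper are hypothetical and exist only for illustration.

```python
# Hypothetical sample shaped like ca_counties[0] is assumed to be:
# a dict mapping each FIPS code to a list of matching names.
sample = {
    "06000": ["California"],
    "06001": ["Alameda County"],
    "06037": ["Los Angeles County"],
    "06059": ["Orange County"],
}


def county_fips(fips_to_names):
    """Return county-level FIPS codes, skipping state-level codes that end in 000."""
    return sorted(code for code in fips_to_names if not code.endswith("000"))


ca_fips_sketch = county_fips(sample)
print(ca_fips_sketch)
```

Reading the keys avoids the name lookup entirely, so no tie-breaking is involved.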
# google_sum_fips is assumed to be a list of county FIPS codes defined in a cell not shown here
data = covidcast.signal("indicator-combination", "confirmed_incidence_num",
                        geo_values=google_sum_fips)
data.head()
data.tail()
labels = data['value']
# number of observations
labels.size
# count NA values in each column
data.isna().sum()
# many dates are missing in November
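`isna()` only finds rows that exist but hold NA; dates that are absent altogether do not show up in that count. One way to list absent dates per county is sketched below on a small made-up frame. The frame `df_demo` and its values are hypothetical; only the column names `geo_value` and `time_value` come from the notebook.

```python
import pandas as pd

# Hypothetical toy data: two counties with gaps in their daily series.
df_demo = pd.DataFrame({
    "geo_value": ["06001", "06001", "06001", "06037", "06037"],
    "time_value": pd.to_datetime(
        ["2021-11-01", "2021-11-02", "2021-11-04", "2021-11-01", "2021-11-04"]
    ),
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Every calendar day between the first and last observation.
full_range = pd.date_range(
    df_demo["time_value"].min(), df_demo["time_value"].max(), freq="D"
)

# For each county, the days in the full range that have no row at all.
missing_dates = {
    geo: full_range.difference(pd.DatetimeIndex(grp["time_value"]))
    for geo, grp in df_demo.groupby("geo_value")
}
missing_counts = {geo: len(days) for geo, days in missing_dates.items()}
print(missing_counts)
```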
chng = covidcast.signal("chng", "smoothed_outpatient_cli",
                        geo_values=google_sum_fips,
                        start_day=date(2020, 2, 20), end_day=date(2021, 11, 12))
chng.head()
/root/venv/lib/python3.7/site-packages/covidcast/covidcast.py:425: NoDataWarning: No chng smoothed_outpatient_cli data found on 20211003 for geography 'county'
NoDataWarning)
[... the same NoDataWarning is repeated for each day from 20211004 through 20211111 ...]
/root/venv/lib/python3.7/site-packages/covidcast/covidcast.py:425: NoDataWarning: No chng smoothed_outpatient_cli data found on 20211112 for geography 'county'
NoDataWarning)
chng.tail()
chng.shape
chng.isna().sum()
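The per-day warnings emitted by the `chng` request can be collected and summarized instead of printed one by one. The sketch below uses only the standard `warnings` module; the `NoDataWarning` class and the `fetch_day` function are hypothetical stand-ins defined here so the example runs without network access, and they are not the covidcast implementations.

```python
import warnings


class NoDataWarning(UserWarning):
    """Hypothetical stand-in for the warning category seen in the output above."""


def fetch_day(day):
    """Hypothetical fetch that always warns, mimicking a day with no data."""
    warnings.warn(f"No data found on {day}", NoDataWarning)
    return None


# Record warnings rather than printing each one.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    results = [fetch_day(d) for d in ("20211003", "20211004", "20211005")]

# Pull the day (the last word of each message) out of the recorded warnings.
missing_days = [
    str(w.message).split()[-1]
    for w in caught
    if issubclass(w.category, NoDataWarning)
]
print(f"{len(missing_days)} days without data: {missing_days[0]} to {missing_days[-1]}")
```

The same `catch_warnings` block could wrap a real request, in which case the recorded messages would need to be parsed according to their actual wording.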
hosp = covidcast.signal("hospital-admissions", "smoothed_covid19_from_claims",
                        geo_values=google_sum_fips,
                        start_day=date(2020, 2, 20), end_day=date(2021, 11, 12))
hosp.head()
hosp.tail()
hosp.shape
google_sum = covidcast.signal("google-symptoms", "sum_anosmia_ageusia_raw_search",
                              geo_values=google_sum_fips,
                              start_day=date(2020, 2, 20), end_day=date(2021, 11, 12))
google_sum.head()
google_ageusia = covidcast.signal("google-symptoms", "ageusia_raw_search",
                                  geo_values=google_sum_fips,
                                  start_day=date(2020, 2, 20), end_day=date(2021, 11, 12))
google_ageusia.head()
google_anosmia = covidcast.signal("google-symptoms", "anosmia_raw_search",
                                  geo_values=google_sum_fips,
                                  start_day=date(2020, 2, 20), end_day=date(2021, 11, 12))
google_anosmia.head()
/root/venv/lib/python3.7/site-packages/covidcast/covidcast.py:429: RuntimeWarning: Problem obtaining google-symptoms anosmia_raw_search data on 20210528 for geography 'county': error: Expecting value: line 1 column 1 (char 0)
RuntimeWarning)
/root/venv/lib/python3.7/site-packages/covidcast/covidcast.py:429: RuntimeWarning: Problem obtaining google-symptoms anosmia_raw_search data on 20210530 for geography 'county': error: Expecting value: line 1 column 1 (char 0)
RuntimeWarning)
/root/venv/lib/python3.7/site-packages/covidcast/covidcast.py:429: RuntimeWarning: Problem obtaining google-symptoms anosmia_raw_search data on 20210531 for geography 'county': error: Expecting value: line 1 column 1 (char 0)
RuntimeWarning)
/root/venv/lib/python3.7/site-packages/covidcast/covidcast.py:429: RuntimeWarning: Problem obtaining google-symptoms anosmia_raw_search data on 20210607 for geography 'county': error: Expecting value: line 1 column 1 (char 0)
RuntimeWarning)
/root/venv/lib/python3.7/site-packages/covidcast/covidcast.py:429: RuntimeWarning: Problem obtaining google-symptoms anosmia_raw_search data on 20210614 for geography 'county': error: Expecting value: line 1 column 1 (char 0)
RuntimeWarning)
/root/venv/lib/python3.7/site-packages/covidcast/covidcast.py:429: RuntimeWarning: Problem obtaining google-symptoms anosmia_raw_search data on 20210616 for geography 'county': error: Expecting value: line 1 column 1 (char 0)
RuntimeWarning)
/root/venv/lib/python3.7/site-packages/covidcast/covidcast.py:429: RuntimeWarning: Problem obtaining google-symptoms anosmia_raw_search data on 20210618 for geography 'county': error: Expecting value: line 1 column 1 (char 0)
RuntimeWarning)
# checking to see if counties match up in different signals
fips = []
for i in ageusia_fips:
    if i in hosp_fips:
        fips.append(i)
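The matching loop above keeps the codes that appear in both lists. A compact equivalent is sketched below with made-up lists; `ageusia_fips_demo` and `hosp_fips_demo` are hypothetical and stand in for the notebook's `ageusia_fips` and `hosp_fips`, whose contents are not shown.

```python
# Hypothetical FIPS lists for two signals.
ageusia_fips_demo = ["06001", "06037", "06059"]
hosp_fips_demo = ["06037", "06059", "06073"]

# A set makes each membership test constant time on average.
hosp_set = set(hosp_fips_demo)

# Keep the order of the first list, as the original loop does.
common_fips = [f for f in ageusia_fips_demo if f in hosp_set]

# Codes present in one signal but not the other.
only_ageusia = sorted(set(ageusia_fips_demo) - hosp_set)
print(common_fips, only_ageusia)
```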
doc = covidcast.signal("doctor-visits", "smoothed_cli",
                       geo_values=google_sum_fips,
                       start_day=date(2020, 2, 20), end_day=date(2021, 11, 12))
doc.head()
doc.tail()
doc.shape
doc.isna().sum()
merged = covidcast.aggregate_signals([hosp, chng, doc, google_sum, google_anosmia, google_ageusia, data])
merged
merged.columns
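The NA counts that appear after merging are consistent with an outer join on `geo_value` and `time_value`, although the notebook does not confirm how `aggregate_signals` joins its inputs. The sketch below shows, under that assumption and with plain pandas on made-up frames, how a day present in one signal but not another becomes NA in the merged table. All frame names and column names ending in `_demo` or prefixed `hosp_`/`doc_` are hypothetical.

```python
import pandas as pd

keys = ["geo_value", "time_value"]

# Hypothetical signal with two days of data.
hosp_demo = pd.DataFrame({
    "geo_value": ["06001", "06001"],
    "time_value": pd.to_datetime(["2021-01-01", "2021-01-02"]),
    "hosp_0_value": [1.0, 2.0],
})

# Hypothetical signal with three days of data.
doc_demo = pd.DataFrame({
    "geo_value": ["06001", "06001", "06001"],
    "time_value": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "doc_1_value": [0.5, 0.6, 0.7],
})

# An outer join keeps every (county, day) pair seen in either frame.
merged_demo = hosp_demo.merge(doc_demo, on=keys, how="outer", sort=True)
print(merged_demo)
print(merged_demo.isna().sum())
```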
x = merged.drop(columns = [
'hospital-admissions_smoothed_covid19_from_claims_0_issue',
'hospital-admissions_smoothed_covid19_from_claims_0_lag',
'hospital-admissions_smoothed_covid19_from_claims_0_missing_value',
'hospital-admissions_smoothed_covid19_from_claims_0_missing_stderr',
'hospital-admissions_smoothed_covid19_from_claims_0_missing_sample_size',
'hospital-admissions_smoothed_covid19_from_claims_0_stderr',
'hospital-admissions_smoothed_covid19_from_claims_0_sample_size',
'chng_smoothed_outpatient_cli_1_issue',
'chng_smoothed_outpatient_cli_1_lag',
'chng_smoothed_outpatient_cli_1_missing_value',
'chng_smoothed_outpatient_cli_1_missing_stderr',
'chng_smoothed_outpatient_cli_1_missing_sample_size',
'chng_smoothed_outpatient_cli_1_stderr',
'chng_smoothed_outpatient_cli_1_sample_size',
'doctor-visits_smoothed_cli_2_issue',
'doctor-visits_smoothed_cli_2_lag',
'doctor-visits_smoothed_cli_2_missing_value',
'doctor-visits_smoothed_cli_2_missing_stderr',
'doctor-visits_smoothed_cli_2_missing_sample_size',
'doctor-visits_smoothed_cli_2_stderr',
'doctor-visits_smoothed_cli_2_sample_size',
'google-symptoms_sum_anosmia_ageusia_raw_search_3_issue',
'google-symptoms_sum_anosmia_ageusia_raw_search_3_lag',
'google-symptoms_sum_anosmia_ageusia_raw_search_3_missing_value',
'google-symptoms_sum_anosmia_ageusia_raw_search_3_missing_stderr',
'google-symptoms_sum_anosmia_ageusia_raw_search_3_missing_sample_size',
'google-symptoms_sum_anosmia_ageusia_raw_search_3_stderr',
'google-symptoms_sum_anosmia_ageusia_raw_search_3_sample_size',
'google-symptoms_anosmia_raw_search_4_issue',
'google-symptoms_anosmia_raw_search_4_lag',
'google-symptoms_anosmia_raw_search_4_missing_value',
'google-symptoms_anosmia_raw_search_4_missing_stderr',
'google-symptoms_anosmia_raw_search_4_missing_sample_size',
'google-symptoms_anosmia_raw_search_4_stderr',
'google-symptoms_anosmia_raw_search_4_sample_size',
'google-symptoms_ageusia_raw_search_5_issue',
'google-symptoms_ageusia_raw_search_5_lag',
'google-symptoms_ageusia_raw_search_5_missing_value',
'google-symptoms_ageusia_raw_search_5_missing_stderr',
'google-symptoms_ageusia_raw_search_5_missing_sample_size',
'google-symptoms_ageusia_raw_search_5_stderr',
'google-symptoms_ageusia_raw_search_5_sample_size',
'indicator-combination_confirmed_incidence_num_6_issue',
'indicator-combination_confirmed_incidence_num_6_lag',
'indicator-combination_confirmed_incidence_num_6_missing_value',
'indicator-combination_confirmed_incidence_num_6_missing_stderr',
'indicator-combination_confirmed_incidence_num_6_missing_sample_size',
'indicator-combination_confirmed_incidence_num_6_stderr',
'indicator-combination_confirmed_incidence_num_6_sample_size',
'geo_type'])
x
x.columns
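The long `drop` call above removes every metadata column and keeps the identifiers plus the `_value` column of each signal. The same selection can be expressed by suffix, as in the sketch below. The frame `wide_demo` is a hypothetical miniature of `merged`, using the naming pattern visible in the column list above.

```python
import pandas as pd

# Hypothetical miniature of the merged frame.
wide_demo = pd.DataFrame({
    "geo_value": ["06001"],
    "time_value": pd.to_datetime(["2021-01-01"]),
    "geo_type": ["county"],
    "doctor-visits_smoothed_cli_0_value": [1.5],
    "doctor-visits_smoothed_cli_0_stderr": [0.1],
    "doctor-visits_smoothed_cli_0_missing_value": [0],
    "doctor-visits_smoothed_cli_0_lag": [4],
})

id_cols = ["geo_value", "time_value"]

# Keep signal values only; "_missing_value" also ends in "_value", so exclude it.
signal_cols = [
    c for c in wide_demo.columns
    if c.endswith("_value")
    and not c.endswith("_missing_value")
    and c not in id_cols
]

x_demo = wide_demo[id_cols + signal_cols]
print(list(x_demo.columns))
```

Selecting by suffix means the code does not need to change when a signal is added to or removed from the merge.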
x.isna().sum()
np.unique(x[x['google-symptoms_ageusia_raw_search_5_value'].isna()]['geo_value'])
# unique counties with NA values in the ageusia column of the merged dataset
x.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 9480 entries, 0 to 9479
Data columns (total 9 columns):
 #   Column                                                    Non-Null Count  Dtype
---  ------                                                    --------------  -----
 0   geo_value                                                 9480 non-null   object
 1   time_value                                                9480 non-null   datetime64[ns]
 2   hospital-admissions_smoothed_covid19_from_claims_0_value  9117 non-null   float64
 3   chng_smoothed_outpatient_cli_1_value                      8865 non-null   float64
 4   doctor-visits_smoothed_cli_2_value                        9480 non-null   float64
 5   google-symptoms_sum_anosmia_ageusia_raw_search_3_value    8770 non-null   float64
 6   google-symptoms_anosmia_raw_search_4_value                8494 non-null   float64
 7   google-symptoms_ageusia_raw_search_5_value                7203 non-null   float64
 8   indicator-combination_confirmed_incidence_num_6_value     9480 non-null   float64
dtypes: datetime64[ns](1), float64(7), object(1)
memory usage: 998.7+ KB
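The non-null counts in the `info()` output translate directly into missing counts and shares. The short calculation below uses only the numbers printed above; the variable names are chosen here for illustration.

```python
# Row total and non-null counts copied from the info() output above.
total_rows = 9480
non_null = {
    "hospital-admissions_smoothed_covid19_from_claims_0_value": 9117,
    "chng_smoothed_outpatient_cli_1_value": 8865,
    "google-symptoms_sum_anosmia_ageusia_raw_search_3_value": 8770,
    "google-symptoms_anosmia_raw_search_4_value": 8494,
    "google-symptoms_ageusia_raw_search_5_value": 7203,
}

# Missing rows per column, and the same as a percentage of all rows.
missing = {col: total_rows - n for col, n in non_null.items()}
missing_share = {col: round(100 * m / total_rows, 1) for col, m in missing.items()}

for col, m in missing.items():
    print(f"{col}: {m} missing ({missing_share[col]}%)")
```

On the real frame, `x.isna().mean()` gives the same shares as fractions without copying any numbers by hand.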
min(x['indicator-combination_confirmed_incidence_num_6_value'])
# the labels contain negative values
x[x['indicator-combination_confirmed_incidence_num_6_value'] < 0]
# replace negative labels with their absolute values
x['indicator-combination_confirmed_incidence_num_6_value'] = abs(x['indicator-combination_confirmed_incidence_num_6_value'])
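Negative daily counts can appear when a reporting source revises its cumulative totals downward, in which case taking the absolute value may overstate cases on those days. As a hedged comparison, not the notebook's method, the sketch below applies both `abs` and clipping at zero to a small made-up Series; the values in `labels_demo` are hypothetical.

```python
import pandas as pd

# Hypothetical daily incidence with one downward correction.
labels_demo = pd.Series([5.0, -3.0, 0.0, 12.0])

# What the notebook does: flip the sign of negative values.
labels_abs = labels_demo.abs()

# Alternative: treat negative values as zero new cases.
labels_clipped = labels_demo.clip(lower=0)

print(labels_abs.tolist())
print(labels_clipped.tolist())
```

Which treatment is appropriate depends on how the downstream model should interpret corrections, so this is a choice to make deliberately rather than a fix.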
# drop the ageusia column because it has no data for at least one county
x = x.drop(columns = "google-symptoms_ageusia_raw_search_5_value")
x
x.isna().sum()
# impute by forward-filling from the previous observation within each county
# (assumes rows are ordered by time_value within each geo_value)
updated_x = x.copy()  # copy so that x itself is not modified
ffill_cols = [
    'hospital-admissions_smoothed_covid19_from_claims_0_value',
    'chng_smoothed_outpatient_cli_1_value',
    'google-symptoms_sum_anosmia_ageusia_raw_search_3_value',
    'google-symptoms_anosmia_raw_search_4_value',
]
updated_x[ffill_cols] = x.groupby('geo_value')[ffill_cols].ffill()
updated_x
updated_x.isna().sum()
# drop remaining NA values
updated_x = updated_x.dropna(axis = 0)
updated_x.isna().sum()
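Forward-filling within a group cannot fill a gap at the start of a county's series, because there is no earlier observation to carry forward. That is why some NA values remain and are dropped afterwards. The sketch below illustrates this on a made-up frame; `ffill_demo` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: county "a" starts with a gap, county "b" ends with one.
ffill_demo = pd.DataFrame({
    "geo_value": ["a", "a", "a", "b", "b", "b"],
    "value": [np.nan, 1.0, np.nan, 2.0, np.nan, np.nan],
})

# Fill forward within each county only; values never cross county boundaries.
ffill_demo["filled"] = ffill_demo.groupby("geo_value")["value"].ffill()

# The leading gap in county "a" is still NA.
remaining_na = int(ffill_demo["filled"].isna().sum())

# Dropping the remaining NA rows mirrors the dropna step above.
cleaned_demo = ffill_demo.dropna(subset=["filled"])
print(remaining_na, len(cleaned_demo))
```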
updated_x
updated_x.to_csv(r'data_preparation.csv')
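When the saved file is read back, FIPS codes such as `06001` are parsed as integers by default and lose their leading zero. The sketch below shows the effect and one way to avoid it, using an in-memory buffer instead of a file so it runs anywhere. The frame `csv_demo` is hypothetical, and `index=False` is an optional choice that is not in the notebook's call.

```python
import io

import pandas as pd

# Hypothetical frame with California FIPS codes stored as strings.
csv_demo = pd.DataFrame({
    "geo_value": ["06001", "06037"],
    "time_value": pd.to_datetime(["2021-01-01", "2021-01-01"]),
    "value": [1.0, 2.0],
})

# Write without the row index so no unnamed column appears on reload.
buf = io.StringIO()
csv_demo.to_csv(buf, index=False)

# Default parsing turns the codes into integers.
buf.seek(0)
naive = pd.read_csv(buf)

# Declaring the dtype keeps the codes as zero-padded strings.
buf.seek(0)
restored = pd.read_csv(buf, dtype={"geo_value": str}, parse_dates=["time_value"])

print(naive["geo_value"].tolist())
print(restored["geo_value"].tolist())
```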