Building a Dataset
Hi! My name is Phoebe Young, and I am a neuroscience and cognitive studies double major at Vanderbilt University. I am currently a sophomore working in the Winder lab at the Vanderbilt Center for Addiction Research (VCAR) and participating in a micro-internship with Open Avenues Foundation and Tamr. My specialty is data analysis, which I have practiced in a cognitive studies lab and now with a software company.
This project began with the goal of creating a dataset to manage and ask questions about doctors involved in clinical trials research. We were tasked with cleaning multiple datasets and merging them into a single collection of information that could be used to identify the ideal doctor for a given task. Using Deepnote's servers and pandas, we cleaned the data, edited and dropped columns, merged the full datasets, and converted the result into charts that are visually appealing and easier to interpret. From this, we built an interactive notebook so that anyone can enter their criteria and find the best match for their clinical trial.
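The clean-then-merge workflow described above can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual notebook: the toy rows, the column names (`doctor_name`, `notes`, `gender`, `state`), and the `clean_names` helper are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical toy data; column names and doctor names are assumptions.
trials = pd.DataFrame({
    "nct_id": ["NCT03724162", "NCT03723954", "NCT03723954"],
    "doctor_name": [" Ann Lee ", "Bo Chen", "Bo Chen"],
    "notes": [None, None, None],
})
medicare = pd.DataFrame({
    "doctor_name": ["ANN LEE", "BO CHEN"],
    "gender": ["F", "M"],
    "state": ["TN", "CA"],
})


def clean_names(df, column="doctor_name"):
    """Return a copy with whitespace trimmed and names upper-cased."""
    out = df.copy()
    out[column] = out[column].str.strip().str.upper()
    return out


trials_clean = (
    clean_names(trials)
    .drop(columns=["notes"])  # drop a column that carries no usable data
    .drop_duplicates()        # remove exact duplicate rows
)

# Inner merge keeps only doctors that appear in both datasets.
merged = trials_clean.merge(clean_names(medicare), on="doctor_name", how="inner")
print(merged)
```

Normalizing the join key before merging matters here: without it, " Ann Lee " and "ANN LEE" would be treated as different doctors and the inner merge would drop the row.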
Merged dataset for clinical trial information:
Contains more clearly organized and succinct information to distinguish doctors by certain qualifications.
0    8790702    NCT03724162
1    8790703    NCT03724162
2    8790704    NCT03723954
3    8790705    NCT03723668
4    8790706    NCT03723668
5    8790707    NCT03723538
6    8790708    NCT03723486
7    8790709    NCT03723369
8    8790710    NCT03723278
9    8790711    NCT03723200
Final merged dataset of clinical trial and Medicare information: contains a combination of Medicare information and information from the original dataset.
0    1215321872    3870667793
1    1215195664    6204988652
2    1215188925    5193982478
3    1215138714    8022188820
4    1215127840    8527950369
5    1215349154    2365764834
6    1215186309    2163654658
7    1215254925    7911022066
8    1215284070    2860645439
9    1215161161    6608197421
Find doctors of a specific gender:
0    1215321872    3870667793
1    1215195664    6204988652
4    1215127840    8527950369
5    1215349154    2365764834
6    1215186309    2163654658
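A filter like the one above can be written as a boolean mask in pandas. The sketch below is an assumption about how such a lookup might work, not the notebook's own code; `doctor_id`, `gender`, and `filter_by_gender` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions.
doctors = pd.DataFrame({
    "doctor_id": [101, 102, 103, 104],
    "gender": ["F", "F", "M", "F"],
})


def filter_by_gender(df, gender):
    """Return the rows whose gender matches, ignoring letter case."""
    mask = df["gender"].str.upper() == gender.upper()
    return df[mask]


female_doctors = filter_by_gender(doctors, "f")
print(female_doctors)
```

Boolean masking keeps the original row labels, which is why the filtered output above skips from index 1 to index 4: rows 2 and 3 did not match and were left out.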
Gender Distribution of Doctors
Top 10 Most Popular Primary Specialties in the Dataset
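The counts behind two charts like these can be computed with `value_counts`. This is a sketch under assumed column names (`gender`, `primary_specialty`) and made-up rows, not the project's data.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions.
doctors = pd.DataFrame({
    "gender": ["F", "M", "M", "F", "M"],
    "primary_specialty": [
        "CARDIOLOGY", "NEUROLOGY", "CARDIOLOGY", "ONCOLOGY", "CARDIOLOGY",
    ],
})

# Number of doctors per gender, most common first.
gender_counts = doctors["gender"].value_counts()

# The ten most frequent specialties (fewer if the data has fewer).
top_specialties = doctors["primary_specialty"].value_counts().head(10)

print(gender_counts)
print(top_specialties)
```

With matplotlib installed, `gender_counts.plot(kind="pie")` and `top_specialties.plot(kind="barh")` would draw each result as a chart.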
Find doctors in 10 states with the most clinical trials:
12    1215161914    4486080215
15    1215195029    2769456896
17    1215144167    9234321423
27    1215138391    5991895575
34    1215138391    5991895575
37    1215191846    6204090988
41    1215149240    2264618925
46    1215248273    244479954
59    1215262498    7315108651
74    1215307954    1951695923
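One way to select doctors in the most active states is to rank states by row count and keep only the rows in the top group. The sketch below assumes each row represents one doctor-trial pairing, so that row counts stand in for trial counts; `state`, `doctor_id`, and `doctors_in_top_states` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy data; one row per doctor-trial pairing (an assumption).
doctors = pd.DataFrame({
    "doctor_id": [1, 2, 3, 4, 5, 6],
    "state": ["CA", "TN", "CA", "NY", "TN", "CA"],
})


def doctors_in_top_states(df, n=10):
    """Keep the rows whose state is among the n most frequent states."""
    top_states = df["state"].value_counts().head(n).index
    return df[df["state"].isin(top_states)]


# n=2 here only because the toy data has three states.
result = doctors_in_top_states(doctors, n=2)
print(result)
```

As with the gender filter, the surviving rows keep their original index labels, which explains the non-consecutive index values in the output above.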