Creating a Clinical Trials Dataset
About Me
Hi, my name is Neel Pai, and I’m an undergraduate math student at Vanderbilt University. I’m hoping to expand my experience with data science, particularly with Python’s statistical packages. I’m excited to share the work that I’ve accomplished with you!
Project Summary
Clinical trials are crucial for determining the efficacy of experimental treatments at the frontier of healthcare, but they require significant planning to execute well. Matching individuals to trials is instrumental both in filling trials with eligible participants and in connecting people in need with potentially life-saving drugs. However, there is no publicly available database that links clinical trials to information about the clinical providers who conducted them. The goal of this notebook is to create a comprehensive dataset of clinical trials and the clinicians associated with them. In this project, I completed an exploratory data analysis, cleaned and merged datasets from multiple sources, and visualized the key insights the dataset provides. The final output is a queryable dataset of over 62,000 clinical trials that can be used to identify doctors for future trials.
36  1928894  NCT03921281
38  1143508  NCT03703635
39  1143518  NCT03703635
40  1200303  NCT02951169
41  1205675  NCT02833103
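Rows like those above pair a record ID with a ClinicalTrials.gov identifier (the NCT number). As a rough sketch of how such pairs could be queried, the snippet below groups them by trial. It assumes the seven-digit value identifies a clinician record, which the output itself does not state; the names `pairs` and `clinicians_by_trial` are made up for illustration and are not taken from the notebook.

```python
from collections import defaultdict

# (clinician_id, nct_id) pairs copied from the sample rows above.
# Assumption: the first value identifies a clinician record.
pairs = [
    (1928894, "NCT03921281"),
    (1143508, "NCT03703635"),
    (1143518, "NCT03703635"),
    (1200303, "NCT02951169"),
    (1205675, "NCT02833103"),
]

# Build a lookup from trial ID to every clinician ID linked to it.
clinicians_by_trial = defaultdict(list)
for clinician_id, nct_id in pairs:
    clinicians_by_trial[nct_id].append(clinician_id)

# A trial can map to more than one clinician record.
print(clinicians_by_trial["NCT03703635"])
```

Keying the lookup on the trial ID keeps one entry per trial even when several records point to the same trial, as with NCT03703635 in the sample.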
Number of Matched Trials: 64244
Total Number of Trials: 221916
Percent Matched: 29%
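The percentage above follows directly from the two counts. A minimal sketch of that arithmetic is shown here; the function name `match_rate` is hypothetical and not taken from the notebook.

```python
def match_rate(matched: int, total: int) -> float:
    """Return the share of matched trials as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * matched / total


matched_trials = 64244   # matched trials, from the output above
total_trials = 221916    # all trials, from the output above

percent = match_rate(matched_trials, total_trials)
print(f"Percent Matched: {percent:.0f}%")
```

Rounding to a whole number reproduces the reported 29%.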
Which states conduct the most clinical trials?
What organizations most commonly conduct clinical trials?
What are the most common specializations that conduct clinical trials?
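Questions like these come down to counting trials per category. The sketch below shows one way to do that with the standard library. The records, the field names (`nct_id`, `state`, `organization`, `specialty`), and the helper `top_values` are all illustrative assumptions, not the notebook's actual schema or data.

```python
from collections import Counter

# Toy records; IDs, field names, and values are placeholders, not real data.
records = [
    {"nct_id": "T1", "state": "TN", "organization": "Org A", "specialty": "Oncology"},
    {"nct_id": "T1", "state": "TN", "organization": "Org A", "specialty": "Cardiology"},
    {"nct_id": "T2", "state": "TN", "organization": "Org B", "specialty": "Oncology"},
    {"nct_id": "T3", "state": "CA", "organization": "Org A", "specialty": "Oncology"},
]


def top_values(rows, field, n=3):
    """Return the n most frequent values of `field`, counting each trial once per value."""
    # Deduplicate (trial, value) pairs so a trial with several clinicians
    # in the same category is not counted more than once.
    unique_pairs = {(row["nct_id"], row[field]) for row in rows if row.get(field)}
    return Counter(value for _, value in unique_pairs).most_common(n)


for field in ("state", "organization", "specialty"):
    print(field, top_values(records, field))
```

Deduplicating on the trial ID matters because a single trial can appear in several rows, one per associated clinician, which would otherwise inflate the counts.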