Creating a Clinical Trials Dataset

About Me

Hi, my name is Neel Pai, and I’m an undergraduate math student at Vanderbilt University. I’m hoping to expand my experience with data science, particularly with Python’s statistical packages. I’m excited to share my work with you!

Project Summary

Clinical trials are crucial for determining the efficacy of experimental treatments at the frontier of healthcare, but they require significant planning to execute well. Matching individuals to the right trials is instrumental both in filling trials with eligible participants and in connecting people in need with potentially life-saving drugs. However, there is no publicly available database that links clinical trials to the clinical providers who conducted them. The goal of this notebook is to create a comprehensive dataset of clinical trials and the clinicians who conducted them. In this project, I completed an exploratory data analysis, cleaned and merged datasets from multiple sources, and created visualizations of the key insights the dataset provides. The final output is a queryable dataset of over 62,000 clinical trials that can be used to identify doctors for future trials.
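To give a feel for the "clean and merge" step, here is a minimal sketch on toy data. It is not the notebook's actual pipeline: the tables, the column names (`nct_id`, `investigator`, `clinician_name`, `name_key`), and the choice to match on a normalized name are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy tables; the real sources and column names may differ.
trials = pd.DataFrame({
    "nct_id": ["NCT001", "NCT002", "NCT003"],
    "investigator": ["Jane Doe ", "JOHN SMITH", "Ana Ruiz"],
})
clinicians = pd.DataFrame({
    "clinician_name": ["jane doe", "John Smith"],
    "st": ["TN", "CA"],
})


def normalize_name(names: pd.Series) -> pd.Series:
    """Trim and lowercase names so trivial formatting differences do not block a match."""
    return names.str.strip().str.lower()


# Build a shared key on both tables (an assumed matching rule, not the author's).
trials["name_key"] = normalize_name(trials["investigator"])
clinicians["name_key"] = normalize_name(clinicians["clinician_name"])

# An inner join keeps only trials whose investigator appears in the clinician table.
merged = trials.merge(clinicians, on="name_key", how="inner")

match_rate = len(merged) / len(trials)
print(f"Matched {len(merged)} of {len(trials)} trials ({match_rate:.0%})")
```

Matching on names alone can pair up different people who share a name, so a real pipeline would likely need extra fields (for example, location or organization) to disambiguate.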

df_all.head(n=25)  # preview the merged dataset

matched = df_all.shape[0]   # trials matched to a clinician
trials = trialdata.shape[0]  # all trials in the source data
print("Number of Matched Trials: ", matched)
print("Total Number of Trials: ", trials)
percentage = "{:.0%}".format(matched / trials)
print("Percent Matched: ", percentage)

Which states conduct the most clinical trials?

# Set the figure size before plotting so it applies to this chart
sns.set(rc={'figure.figsize': (15, 8)})
# Count trials per state, showing the ten most frequent states
top_states = sns.countplot(x=' st', data=df_all, order=df_all[' st'].value_counts()[:10].index)
top_states.set_xlabel("States")
top_states.set_ylabel("Frequency")
top_states.set_title("States with Most Clinical Trials")

What organizations most commonly conduct clinical trials?

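# Set the figure size before plotting so it applies to this chart
sns.set(rc={'figure.figsize': (15, 8)})
# Count trials per organization, showing the three most frequent organizations
top_orgs = sns.countplot(x=' org_nm', data=df_all, order=df_all[' org_nm'].value_counts()[:3].index)
top_orgs.set_xlabel("Organizations")
top_orgs.set_ylabel("Frequency")
top_orgs.set_title("Organizations with Most Clinical Trials")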

What are the most common specializations that conduct clinical trials?

# Ten most common primary specializations, shown as shares of a pie chart
specs_df = pd.DataFrame(df_all[' pri_spec'].value_counts().nlargest(10))
spec_plot = specs_df.plot.pie(subplots=True, autopct='%1.1f%%', figsize=(7, 7))
