a) Use the seaborn library with default parameters to visualize the distribution of those values using Kernel Density Estimation. Based on the visualization, at which location(s) do you think the underlying distribution has a mode (peak)?
The peak of the distribution is between 1.8 and 2.
b) Produce alternative KDE plots by adjusting the bandwidth to higher and lower values. Briefly describe in your own words how this changes the shape of the estimated distribution. Visualize the same data in a different way to help you decide which setting most faithfully reflects the distribution which generated the data. In particular, at which location(s) would you assume it has modes? (3P)
With bandwidth 0.1, the curve shows strong spikes and looks like a combination of several individual peaks, here 7. A small bandwidth leads to under-smoothing.
When the bandwidth increases to 0.25, the curve becomes smoother and now looks like a combination of 4 peaks. The peaks start to merge here, and the highest peak is around 2.
When the bandwidth increases to 0.5, the second and third peaks are almost merged.
When the bandwidth increases to 0.75, the peaks merge further and look like parts of a single distribution. The middle peak, around 2, is the highest.
When the bandwidth increases to 0.9, the curve becomes smoother still and looks like a combination of three peaks that are even more merged and look like parts of a single distribution. Here, too, the peak is around 2.
At bandwidth 1, there are no signs of multimodality; we have a wide and smooth unimodal distribution.
At bandwidth 1.5, the curve is perfectly unimodal. A large bandwidth can lead to over-smoothing: the density plot looks unimodal and hides any multimodal structure in the data.
We think the bandwidth of 0.75 is the optimal one as it avoids over-smoothing and under-smoothing and reflects underlying properties better. The modes are at around 0, 2 and 3.
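The effect of the bandwidth on the number of peaks can also be checked numerically. The sketch below is an illustration under our own assumptions, not the code used for the plots: it evaluates a plain Gaussian KDE with an absolute bandwidth h using NumPy and counts local maxima. The sample is a hypothetical stand-in, and the bandwidth values used in the text may have been a seaborn parameter on a different scale.

```python
import numpy as np

def gaussian_kde_curve(samples, grid, bandwidth):
    """Evaluate a Gaussian KDE with absolute bandwidth h on a grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (len(samples) * bandwidth)

def count_modes(density):
    """Count strict local maxima of a sampled curve."""
    interior = density[1:-1]
    return int(np.sum((interior > density[:-2]) & (interior > density[2:])))

# Hypothetical stand-in data: the real exercise values are not shown in this text.
rng = np.random.default_rng(0)
samples = np.concatenate([
    rng.normal(0.0, 0.4, 40),
    rng.normal(2.0, 0.4, 80),
    rng.normal(3.2, 0.4, 40),
])
grid = np.linspace(samples.min() - 3.0, samples.max() + 3.0, 1000)

bandwidths = (0.1, 0.25, 0.5, 0.75, 1.0, 1.5)
modes_by_bandwidth = {}
for h in bandwidths:
    modes_by_bandwidth[h] = count_modes(gaussian_kde_curve(samples, grid, h))
    print(f"h = {h}: {modes_by_bandwidth[h]} local maxima")
```

With a Gaussian kernel, the number of local maxima should not increase as h grows, which matches the progression from a spiky curve to a single smooth bump described above.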
d) Read the dataset
Use pandas.melt to transform the data from wide to long format
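As a small illustration of the wide-to-long step, the sketch below melts a tiny data frame. The dataset of the exercise is not shown here, so the column names `group_a` and `group_b` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical wide-format data: one column per group, one row per observation.
wide = pd.DataFrame({
    "group_a": [1.2, 2.4, 3.1],
    "group_b": [0.8, 2.9, 3.5],
})

# Long format: one row per (group, value) pair.
long = pd.melt(wide, var_name="group", value_name="value")
print(long)
```

The long format has one categorical column and one numeric column, which is the layout that seaborn's categorical plots work with most directly.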
e) Create two boxplots side-by-side
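One possible reading of this step is one box per group on a shared axis. The sketch below assumes long-format data with hypothetical columns `group` and `value`; the real dataset and its column names are not given in this text.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical long-format data, as pandas.melt would produce it.
long = pd.DataFrame({
    "group": ["group_a"] * 4 + ["group_b"] * 4,
    "value": [1.2, 2.4, 3.1, 2.0, 0.8, 2.9, 3.5, 4.1],
})

fig, ax = plt.subplots()
# One box per category of "group", drawn next to each other on the same axis.
sns.boxplot(data=long, x="group", y="value", ax=ax)
```

If two separate panels are wanted instead, `plt.subplots(1, 2)` with one `sns.boxplot` call per axis would be an alternative.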