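# Number of heads in three coin flips and how many of the 8 outcomes give each count
coin3 <- data.frame(
  N_Heads = c(0, 1, 2, 3),
  C_Heads = c(1, 3, 3, 1)
)
# Convert counts to probabilities; columns are referenced explicitly instead of via attach()
coin3$P_Heads <- coin3$C_Heads / sum(coin3$C_Heads)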
options(repr.plot.width=7, repr.plot.height=4)
barplot(coin3$P_Heads, names.arg = coin3$N_Heads,
        main = "Discrete Probability Distribution of flipping three coins")
# Optional for advanced learners
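# Outcomes are listed with order ignored, so 'HHT' stands for any outcome with two heads.
# table() sorts the labels alphabetically, so the bars run from 3 heads (HHH) down to 0 heads (TTT).
barplot(prop.table(table(c('HHH', 'HHT', 'HHT', 'HTT', 'HHT', 'HTT', 'HTT', 'TTT'))),
        main = "Discrete Probability Distribution of tossing three coins",
        xlab = "x = Number of Heads", ylab = "P(x)", col = "#6633FF")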
# Expected number of heads: sum over x of x * P(x)
E_X <- sum(coin3$N_Heads * coin3$P_Heads)
E_X
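As a cross-check, the same distribution can be derived from a binomial model instead of being typed in by hand. The sketch below assumes three independent flips of a fair coin (p = 0.5), which is what the counts 1, 3, 3, 1 imply; the names `n_flips`, `p_head`, `heads`, `p_binom`, and `e_x` are illustrative and not part of the original notebook.

```r
# Assumption: three independent flips of a fair coin
n_flips <- 3
p_head <- 0.5
heads <- 0:n_flips

# Binomial probabilities P(X = 0), ..., P(X = 3)
p_binom <- dbinom(heads, size = n_flips, prob = p_head)
p_binom

# Expected value from the definition sum(x * P(x)) ...
e_x <- sum(heads * p_binom)
e_x

# ... and from the binomial shortcut n * p
n_flips * p_head
```

Both routes should agree with the value of `E_X` computed from the hand-entered table.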
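# Plot a density histogram of x with b breaks and overlay the fitted normal curve
normal_dist <- function(x, b) {
  mu <- mean(x)
  sigma <- sd(x)
  # Grid spanning six standard deviations on either side of the mean
  x1 <- seq(mu - 6 * sigma, mu + 6 * sigma, length.out = 100)
  # Normal density using the sample mean and standard deviation
  fun <- dnorm(x1, mean = mu, sd = sigma)
  # Bar heights computed without plotting, so ylim can cover both the bars and the curve
  h <- hist(x, breaks = b, plot = FALSE)
  options(repr.plot.width = 7, repr.plot.height = 4)
  hist(x, prob = TRUE, col = "white", breaks = b,
       xlim = c(mu - 6 * sigma, mu + 6 * sigma),
       ylim = c(0, max(fun, h$density)),
       main = "Histogram overlaid with normal curve")
  lines(x1, fun, col = 2, lwd = 2)
}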
data(chickwts)
attach(chickwts)
normal_dist(chickwts$weight,10)
shapiro.test(chickwts$weight)
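# Is a normal approximation reasonable? The Shapiro-Wilk test does not reject normality when p > 0.05.
p_value <- shapiro.test(chickwts$weight)$p.value
cat("Shapiro-Wilk p-value:", round(p_value, 2),
    if (p_value > 0.05) "(normality is not rejected at the 0.05 level)\n" else "(normality is rejected at the 0.05 level)\n")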
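# Z-score of each weight: distance from the mean in units of standard deviations
chickwts$zscore <- (chickwts$weight - mean(chickwts$weight)) / sd(chickwts$weight)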
tail(chickwts)
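The manual Z-score formula can be checked against base R's `scale()`, which centers by the mean and divides by the sample standard deviation by default. This is a small sketch for verification only; the names `w`, `z_manual`, and `z_scale` are illustrative and not part of the original notebook.

```r
data(chickwts)
w <- chickwts$weight

# Z-scores from the formula (x - mean) / sd
z_manual <- (w - mean(w)) / sd(w)

# scale() returns a one-column matrix, so flatten it to a plain vector
z_scale <- as.vector(scale(w))

# The two versions should agree up to floating-point error
all.equal(z_manual, z_scale)

# Standardized values have mean 0 and standard deviation 1
mean(z_manual)
sd(z_manual)
```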
GRE scores, Part I. Sophia, who took the Graduate Record Examination (GRE), scored 160 on the Verbal Reasoning
section and 157 on the Quantitative Reasoning section. The mean score on the Verbal Reasoning section for all
test takers was 151 with a standard deviation of 7, and the mean score on the Quantitative Reasoning section was
153 with a standard deviation of 7.67. Suppose that both distributions are nearly normal. Parts (a) to (h) follow.
(a) Write down the short-hand for these two normal distributions.
Solution
Verbal Reasoning: N(μ = 151, σ = 7); Quantitative Reasoning: N(μ = 153, σ = 7.67)
(b) What is Sophia's Z-score on the Verbal Reasoning section? On the Quantitative Reasoning section?
# (b) Z-score = (score - mean) / standard deviation
z_sophiaV <- (160 - 151) / 7
z_sophiaQ <- (157 - 153) / 7.67
cat("Z Score for Verbal Reasoning =", round(z_sophiaV, 2), "\nZ Score for Quantitative Reasoning =", round(z_sophiaQ, 2), "\n")
(c) What do these Z-scores tell you?
She scored 1.29 standard deviations above the mean on the Verbal Reasoning section and 0.52 standard
deviations above the mean on the Quantitative Reasoning section.
(d) Relative to others, which section did she do better on?
She did better on the Verbal Reasoning section, since her Z score on that section was higher.
(e) Find her percentile scores for the two exams.
(f) What percent of the test takers did better than her on the Verbal Reasoning section? On the Quantitative
Reasoning section?
# (e) Percentiles, using the Z-scores rounded to two decimals as in a Z table
cat("The Verbal Reasoning percentile score is", round(pnorm(1.29, lower.tail = TRUE), 4) * 100, "%\n")
cat("The Quantitative Reasoning percentile score is", round(pnorm(0.52, lower.tail = TRUE), 4) * 100, "%\n")
# (f) Percent of test takers above her score: 100 minus the percentile
cat(100 - round(pnorm(1.29, lower.tail = TRUE), 4) * 100, "% did better than Sophia on Verbal Reasoning\n")
cat(100 - round(pnorm(0.52, lower.tail = TRUE), 4) * 100, "% did better than Sophia on Quantitative Reasoning\n")
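The percentiles can also be computed without standardizing by hand, because `pnorm()` accepts `mean` and `sd` arguments. The sketch below uses the means and standard deviations given in the exercise; the variable names are illustrative. Since it works from the unrounded Z-scores, the results may differ slightly in the second decimal place from those based on 1.29 and 0.52.

```r
# Percentile = P(X <= score) under the normal model stated in the exercise
pct_verbal <- pnorm(160, mean = 151, sd = 7)
pct_quant <- pnorm(157, mean = 153, sd = 7.67)
round(100 * c(verbal = pct_verbal, quantitative = pct_quant), 2)

# Share of test takers who scored higher: the upper tail of the same distribution
above_verbal <- pnorm(160, mean = 151, sd = 7, lower.tail = FALSE)
above_quant <- pnorm(157, mean = 153, sd = 7.67, lower.tail = FALSE)
round(100 * c(verbal = above_verbal, quantitative = above_quant), 2)
```

Using `lower.tail = FALSE` avoids subtracting from 100 and gives the upper-tail probability directly.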
(g) Explain why simply comparing raw scores from the two sections could lead to an incorrect conclusion as
to which section a student did better on.
We cannot compare the raw scores directly since they are on different scales: the two sections have
different means and standard deviations. How well she did depends on how her score compares with those of
the other test takers on each exam, so it is helpful to consider the Z score. Comparing her percentile
scores is a more appropriate way of determining how well she did compared to others taking the exams.
(h) If the distributions of the scores on these exams are not nearly normal, would your answers to parts
(b) to (f) change? Explain your reasoning.
The answer to part (b) would not change, as Z scores can be calculated for distributions that are
not normal. However, we could not answer parts (c)-(f), since we cannot use the Z table to
calculate probabilities and percentiles without a normal model.
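When raw data are available, a percentile can be estimated without any normal model by using the empirical distribution function. The sketch below illustrates this on the `chickwts` weights from earlier in the notebook, since the exercise gives no raw GRE scores; the cutoff of 300 grams and the names `w`, `cutoff`, `emp_pct`, and `norm_pct` are hypothetical choices made for illustration.

```r
data(chickwts)
w <- chickwts$weight

# Hypothetical weight whose percentile we want
cutoff <- 300

# Empirical percentile: the share of observed weights at or below the cutoff
emp_pct <- ecdf(w)(cutoff)

# Normal-model percentile using the sample mean and standard deviation
norm_pct <- pnorm(cutoff, mean = mean(w), sd = sd(w))

round(100 * c(empirical = emp_pct, normal_model = norm_pct), 2)
```

The two figures are close when the data are nearly normal; a large gap would suggest that Z-table percentiles are unreliable for that data set.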