C02_Random_Variable

coin3 <- data.frame(
  N_Heads = c(0, 1, 2, 3),
  C_Heads = c(1, 3, 3, 1)
)
# Convert counts of outcomes into probabilities
coin3$P_Heads <- coin3$C_Heads / sum(coin3$C_Heads)

options(repr.plot.width = 7, repr.plot.height = 4)
barplot(coin3$P_Heads, names.arg = coin3$N_Heads,
        main = "Discrete Probability Distribution of flipping three coins")

# Optional for advanced learners
# Bars are labelled by outcome pattern: HHH = 3 heads, HHT = 2, HTT = 1, TTT = 0
barplot(prop.table(table(c('HHH', 'HHT', 'HHT', 'HTT', 'HHT', 'HTT', 'HTT', 'TTT'))),
        main = "Discrete Probability Distribution of tossing three coins",
        xlab = "x = Number of Heads", ylab = "P(x)", col = "#6633FF")

# Expected value: sum of x * P(x)
E_X <- with(coin3, sum(N_Heads * P_Heads))
E_X
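As a cross-check, the same distribution can be obtained from the binomial model. This is a minimal sketch that assumes three independent flips of a fair coin (the counts 1, 3, 3, 1 are consistent with that, but the notebook does not state it); the names `n_heads`, `p_heads`, `total_prob` and `e_x` are illustrative only.

```r
# Assumption: three independent flips of a fair coin (prob = 0.5)
n_heads <- 0:3
p_heads <- dbinom(n_heads, size = 3, prob = 0.5)

# A valid probability distribution sums to 1
total_prob <- sum(p_heads)

# Expected value: sum of x * P(x)
e_x <- sum(n_heads * p_heads)

cat("Total probability:", total_prob, "\n")
cat("Expected number of heads:", e_x, "\n")
```

Under these assumptions the expected value should agree with E_X computed from the table above.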

normal_dist <- function(x, b) {
  mu <- mean(x)
  sigma <- sd(x)
  x1 <- seq(mu - 6*sigma, mu + 6*sigma, length.out = 100)
  # Normal curve
  fun <- dnorm(x1, mean = mu, sd = sigma)
  # Histogram
  options(repr.plot.width = 7, repr.plot.height = 4)
  hist(x, prob = TRUE, col = "white", breaks = b,
       xlim = c(mu - 6*sigma, mu + 6*sigma), ylim = c(0, max(fun)),
       main = "Histogram overlaid with normal curve")
  lines(x1, fun, col = 2, lwd = 2)
}

data(chickwts)
attach(chickwts)
normal_dist(chickwts$weight, 10)

shapiro.test(chickwts$weight)

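# Is the approximation normal? (0.05 is the conventional significance level)
p_value <- shapiro.test(chickwts$weight)$p.value
cat("Shapiro-Wilk p-value =", round(p_value, 2),
    ifelse(p_value > 0.05,
           "> 0.05: no evidence against normality",
           "<= 0.05: evidence against normality"), "\n")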

# Z-score: distance from the mean in units of standard deviation
chickwts$zscore <- (chickwts$weight - mean(chickwts$weight)) / sd(chickwts$weight)
tail(chickwts)
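The same standardization is available through base R's scale() function. The sketch below compares it with the manual formula and checks that standardized values have mean 0 and standard deviation 1; the names `w`, `z_manual` and `z_scale` are illustrative and not part of the notebook.

```r
# chickwts ships with base R's datasets package
w <- chickwts$weight

# Manual formula, as in the cell above
z_manual <- (w - mean(w)) / sd(w)

# scale() centers by the mean and divides by the standard deviation;
# it returns a one-column matrix, so convert it back to a plain vector
z_scale <- as.numeric(scale(w))

# Both approaches should agree up to floating-point error
print(all.equal(z_manual, z_scale))

# Standardized values have mean 0 and standard deviation 1
cat("mean:", round(mean(z_scale), 10), " sd:", sd(z_scale), "\n")
```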

GRE scores, Part I. Sophia, who took the Graduate Record Examination (GRE), scored 160 on the Verbal Reasoning section and 157 on the Quantitative Reasoning section. The mean score on the Verbal Reasoning section for all test takers was 151 with a standard deviation of 7, and the mean score on the Quantitative Reasoning section was 153 with a standard deviation of 7.67. Suppose that both distributions are nearly normal. (Parts (a)-(h) follow.)

(a) Write down the short-hand for these two normal distributions.

Solution (a). Verbal Reasoning: N(μ = 151, σ = 7). Quantitative Reasoning: N(μ = 153, σ = 7.67).

(b) What is Sophia's Z-score on the Verbal Reasoning section? On the Quantitative Reasoning section?

# (b)
z_sophiaV <- (160 - 151)/7
z_sophiaQ <- (157 - 153)/7.67
cat("Z Score for Verbal= ", round(z_sophiaV, 2),
    "\nZ Score for Quantitative Reasoning= ", round(z_sophiaQ, 2))

(c) What do these Z-scores tell you? She scored 1.29 standard deviations above the mean on the Verbal Reasoning section and 0.52 standard deviations above the mean on the Quantitative Reasoning section.

(d) Relative to others, which section did she do better on? She did better on the Verbal Reasoning section since her Z score on that section was higher.

(e) Find her percentile scores for the two exams.

(f) What percent of the test takers did better than her on the Verbal Reasoning section? On the Quantitative Reasoning section?

# (e)
cat("The verbal ability percentile score is ", round(pnorm(1.29, lower.tail = TRUE), 4)*100, "%\n")
cat("The quantitative reasoning percentile score is ", round(pnorm(0.52, lower.tail = TRUE), 4)*100, "%\n")
# (f)
cat(100 - round(pnorm(1.29, lower.tail = TRUE), 4)*100, "%", "did better than Sophia in verbal ability\n")
cat(100 - round(pnorm(0.52, lower.tail = TRUE), 4)*100, "%", "did better than Sophia in quantitative reasoning")
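Parts (e) and (f) can also be computed directly from the raw scores by passing the mean and standard deviation to pnorm(), which avoids rounding the Z-scores first. This is a sketch using the values given in the exercise; the variable names are illustrative.

```r
# Means and standard deviations as given in the exercise
pct_verbal <- pnorm(160, mean = 151, sd = 7) * 100
pct_quant  <- pnorm(157, mean = 153, sd = 7.67) * 100

# Share of test takers who scored higher (upper tail of the normal model)
above_verbal <- pnorm(160, mean = 151, sd = 7, lower.tail = FALSE) * 100
above_quant  <- pnorm(157, mean = 153, sd = 7.67, lower.tail = FALSE) * 100

cat("Verbal percentile:", round(pct_verbal, 2), "%;",
    round(above_verbal, 2), "% scored higher\n")
cat("Quantitative percentile:", round(pct_quant, 2), "%;",
    round(above_quant, 2), "% scored higher\n")
```

The results differ slightly from the cell above because the Z-scores are not rounded to two decimals before the lookup.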

(g) Explain why simply comparing raw scores from the two sections could lead to an incorrect conclusion as to which section a student did better on. We cannot compare the raw scores directly because the two sections have different means and standard deviations. What matters is how she performed relative to the other test takers on each exam, so the Z score is the useful quantity. Comparing her percentile scores is a more appropriate way of determining how well she did compared to others taking the exams.

(h) If the distributions of the scores on these exams are not nearly normal, would your answers to parts (b) to (f) change? Explain your reasoning. The answer to part (b) would not change, as Z scores can be calculated for distributions that are not normal. However, we could not answer parts (c)-(f), since we cannot use the Z table to calculate probabilities and percentiles without a normal model.
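The point in (h) can be illustrated with simulated data. The sketch below uses a hypothetical right-skewed sample drawn from an exponential distribution (not GRE data; the seed, sample size and raw score are arbitrary assumptions). The Z-score is still computable, but the percentile implied by the normal model no longer matches the percentile observed in the data.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Hypothetical right-skewed scores, NOT GRE data
scores <- rexp(10000, rate = 1)
x <- 0.5      # hypothetical raw score

# The Z-score is well defined for any distribution with a mean and sd
z <- (x - mean(scores)) / sd(scores)

# Percentile if we (wrongly) assume a normal model
normal_pct <- pnorm(z) * 100

# Percentile taken directly from the data
empirical_pct <- mean(scores <= x) * 100

cat("Z-score:", round(z, 2), "\n")
cat("Percentile under a normal model:", round(normal_pct, 1), "%\n")
cat("Empirical percentile:", round(empirical_pct, 1), "%\n")
```

When the data are nearly normal the two percentiles should be close; a large gap is a sign that the Z table should not be used.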