6 One Sample Hypothesis Testing

Concepts

The Testing

Typically, α is set to 0.05 or 0.01. With α = 0.05, the test erroneously rejects H₀ 5% of the time when H₀ is in fact true (a Type I error).
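
As a sanity check, a short simulation (a minimal sketch; the sample size, known σ, and seed below are arbitrary choices) shows a level-0.05 two-sided Z-test rejecting a true H₀ about 5% of the time:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha, n, trials = 0.05, 30, 100_000

    # Draw samples from N(0, 1), so H0: mu = 0 is true by construction.
    samples = rng.normal(loc=0.0, scale=1.0, size=(trials, n))
    z = samples.mean(axis=1) / (1.0 / np.sqrt(n))  # sigma = 1 is known here

    # Two-sided test: reject when |z| exceeds the critical value.
    z_crit = stats.norm.ppf(1 - alpha / 2)
    print("false rejection rate:", np.mean(np.abs(z) > z_crit))  # ~0.05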

Test Statistics

So far, we have covered the steps for hypothesis testing. This section summarizes the test statistics for different cases.

Z Test Statistic

Normal Distribution

Z-statistic can be used to test a normal distribution’s parameter μ:

    Z = (X̄ - μ₀) / (σ / sqrt(n)) ~ N(0, 1)

Note: this requires the population standard deviation σ to be known, and either X to be normally distributed or n to be large enough for the central limit theorem to apply.
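
A minimal sketch of this test in Python (the data, μ₀, and σ below are made up for illustration):

    import numpy as np
    from scipy import stats

    x = np.array([8.4, 8.9, 8.6, 8.8, 8.5, 8.7])   # hypothetical measurements
    mu0, sigma = 8.5, 0.2                           # H0 mean, known sigma

    z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))
    p_two_sided = 2 * stats.norm.sf(abs(z))         # sf(z) = 1 - cdf(z)
    print(f"z = {z:.3f}, p = {p_two_sided:.4f}")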

Binomial Distribution

Z-statistic can be used to test a binomial distribution’s parameter p:

    Z = (p̂ - p₀) / sqrt(p₀q₀ / n) ~ N(0, 1),    where p̂ = x / n and q₀ = 1 - p₀

Note: this relies on the normal approximation to the binomial, so n must be large enough (a common rule of thumb is np₀ ≥ 5 and nq₀ ≥ 5). A ± 0.5 continuity correction improves the approximation (see Example 1).
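
A minimal sketch of the proportion form (the counts are hypothetical; the manual formula keeps the statistic visible):

    import numpy as np
    from scipy import stats

    n, x = 200, 88           # hypothetical: 88 successes in 200 trials
    p0 = 0.5                 # H0: p = 0.5
    p_hat = x / n

    z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
    p_two_sided = 2 * stats.norm.sf(abs(z))
    print(f"z = {z:.3f}, p = {p_two_sided:.4f}")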

T Test Statistic

Z-statistic does not work when the variance σ² is unknown. Instead, T-statistic can be used to test a normal distribution’s parameter μ:

    T = (X̄ - μ₀) / (s / sqrt(n)) ~ t(n - 1)

Note: s² is the sample variance, and the statistic follows a t-distribution with n - 1 degrees of freedom. As n grows, t(n - 1) approaches the standard normal distribution.
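
scipy implements this test as ttest_1samp; a sketch with made-up data, comparing the manual statistic against the library:

    import numpy as np
    from scipy import stats

    x = np.array([4.9, 5.3, 5.1, 4.7, 5.4])   # hypothetical sample
    mu0 = 5.0

    # Manual T-statistic (ddof=1 gives the sample standard deviation s).
    t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))

    # Library version for comparison; both statistics should match.
    result = stats.ttest_1samp(x, popmean=mu0)
    print(t, result.statistic, result.pvalue)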

Hypothesis Testing Using Confidence Intervals

Confidence intervals use the T-transformation to estimate an interval for the population mean from the sample mean and variance at the 100(1 - α)% confidence level. Testing with a confidence interval is essentially the same as using the T-statistic:

    X̄ ± t₍α/2, n-1₎ * s / sqrt(n)

Reject H₀: μ = μ₀ at significance level α exactly when μ₀ falls outside this interval.
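
A sketch of the confidence-interval form (same hypothetical data as above; rejecting when μ₀ lies outside the interval matches the two-sided T-test):

    import numpy as np
    from scipy import stats

    x = np.array([4.9, 5.3, 5.1, 4.7, 5.4])   # hypothetical sample
    mu0, alpha = 5.0, 0.05

    t_crit = stats.t.ppf(1 - alpha / 2, df=len(x) - 1)
    half_width = t_crit * x.std(ddof=1) / np.sqrt(len(x))
    lo, hi = x.mean() - half_width, x.mean() + half_width

    # Reject H0: mu = mu0 exactly when mu0 falls outside the interval.
    print(f"CI = [{lo:.3f}, {hi:.3f}], reject H0: {not (lo <= mu0 <= hi)}")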

Pearson Chi-square Test Statistic

Z-statistic and T-statistic are parametric statistics: they test hypotheses about the parameters (μ, p) of an assumed underlying distribution.

The chi-square test, in contrast, concerns categorical data:

    | Test Type            | Null Hypothesis                                                           | Example                                                                                            |
    | -------------------- | ------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
    | Goodness-of-Fit      | The observed data follows the expected distribution                        | Roll a 6-sided die 60 times. H₀: the die is fair, each side has probability 1/6                      |
    | Test of Independence | Two variables are independent                                              | Survey people on gender and voting preference. H₀: gender and voting preference are not associated   |
    | Test of Homogeneity  | The distributions of the categorical variable are the same across groups   | Test whether 3 different cities have the same preference distribution for soda brands                |

The Pearson chi-square statistic:

    𝒳²(n) = ∑((Oᵢ - Eᵢ)²/Eᵢ)    for 1 ≤ i ≤ n

Note: Oᵢ is the observed count, Eᵢ is the expected count under H₀ for category i, and n is the number of categories.

Assumptions required for using the Pearson chi-square test:

- Observations are independent, and each observation falls into exactly one category.
- The data are raw counts, not percentages or proportions.
- Expected counts are large enough (a common rule of thumb is Eᵢ ≥ 5 for every cell).
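
scipy.stats.chisquare computes the goodness-of-fit form of this statistic directly. A minimal sketch with hypothetical counts (f_obs and f_exp must sum to the same total):

    from scipy import stats

    observed = [18, 22, 20, 25, 15]   # hypothetical category counts
    expected = [20, 20, 20, 20, 20]   # uniform H0 for 100 observations

    # Degrees of freedom default to (number of categories - 1).
    chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")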

Why Does the Pearson Chi-square Statistic Follow a Chi-square Distribution?

Transform the original statistic

𝒳²(n) = ∑((Oᵢ - Eᵢ)²/Eᵢ)    for 1 ≤ i ≤ n
      = ∑((Oᵢ - Eᵢ)/sqrt(Eᵢ))²

Based on the chi-square distribution’s defining property, if we can show Zᵢ = (Oᵢ - Eᵢ)/sqrt(Eᵢ) ~ N(0, 1), then ∑Zᵢ² follows a chi-square distribution.

Key idea: each observed count Oᵢ is a random variable that follows a binomial or multinomial
distribution under the null hypothesis.

Oᵢ is essentially the number of successes in n trials with probability of success pᵢ.
For a large enough n, the binomial variable Oᵢ is approximately:
    Oᵢ ~ N(npᵢ, npᵢqᵢ) = N(Eᵢ, Eᵢqᵢ).
So:
    (Oᵢ - Eᵢ) / sqrt(Eᵢqᵢ) = Zᵢ / sqrt(qᵢ) ~ N(0, 1)

But qᵢ is not necessarily close to 1, so how can each Zᵢ be approximately standard normal?
An extreme case is data with 2 categories where q₀ = 0.99 and q₁ = 0.01:
Z₀ is close to standard normal, but Z₁ clearly is NOT!

Answering this question requires a fairly lengthy proof, which can be found here, in course material from MIT.
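
Short of the full proof, a quick simulation can make the claim plausible: multinomial counts plugged into the statistic track a chi-square distribution with (categories - 1) degrees of freedom, even with very unbalanced pᵢ. The probabilities, n, and seed below are arbitrary choices:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    p = np.array([0.6, 0.25, 0.1, 0.05])   # deliberately unbalanced categories
    n, trials = 500, 50_000

    counts = rng.multinomial(n, p, size=trials)   # O_i under H0
    expected = n * p                              # E_i
    chi2 = ((counts - expected) ** 2 / expected).sum(axis=1)

    # The simulated upper quantile should match chi-square with df = 3.
    print("simulated 95th percentile:", np.quantile(chi2, 0.95))
    print("chi2(df=3) 95th percentile:", stats.chi2.ppf(0.95, len(p) - 1))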

Fisher’s Exact Test

When samples are small, the distribution of 𝒳² (and of other large-sample statistics) is not well approximated by the chi-squared distribution, so their p-values are not to be trusted. In such situations, we can perform inference using an exact distribution (or estimates of exact distributions), but we should keep in mind that p-values based on exact tests can be conservative (i.e., larger than they really are).

We may use an exact test if:

- the sample size is small, or
- some expected cell counts are too small (commonly below 5) for the chi-square approximation to be reliable.

Hypergeometric Distribution

Fisher’s exact test calculates probabilities using the hypergeometric distribution. The table below maps the problem in question onto the corresponding hypergeometric statement, given fixed N, m, n.

    |             | Successes | Failures          | Total |
    | ----------- | --------- | ----------------- | ----- |
    | Sampled     | x = a     | n - x = b         | n     |
    | Not Sampled | m - x = c | N - m - n + x = d | N - n |
    | Total       | m         | N - m             | N     |

The hypergeometric probability gives the chance of obtaining this exact table; rewritten in terms of the four cell counts, it is:

p(X = x) = ((a+b)! * (c+d)! * (a+c)! * (b+d)!) / (a! * b! * c! * d! * N!)
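
scipy's hypergeom matches this formula. A sketch in the table's notation (N population size, m total successes, n sampled; the numbers reuse Example 8 below):

    from math import comb
    from scipy import stats

    N, m, n, x = 20, 9, 10, 1   # values from Example 8 below

    # Direct formula: C(m, x) * C(N - m, n - x) / C(N, n)
    p_formula = comb(m, x) * comb(N - m, n - x) / comb(N, n)

    # scipy parameterization: hypergeom(M=population, n=successes, N=draws)
    p_scipy = stats.hypergeom.pmf(x, N, m, n)
    print(p_formula, p_scipy)   # both ~0.00268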

P-Value

The p-value is calculated as ∑p(X ≤ x) or ∑p(X ≥ x), depending on the definition of “extreme”. It represents the probability of observing data at least as extreme as the data at hand if the null hypothesis were true. If the alternative hypothesis is that the “Sampled” action causes more “Successes” (i.e., the variables are associated), then ∑p(X ≥ x) should be used. A small p-value means that, if the null hypothesis were true, the observed data (or more extreme) would be unlikely to occur by random chance; yet the data was observed. This leads to the rejection of the null hypothesis and suggests a statistically significant association between the variables.

Examples

Example 1 (One-Sided Z-Statistic with One Sample)

One researcher believes a coin is “fair,” the other believes the coin is biased toward heads.
The coin is tossed 20 times, yielding 15 heads. Indicate whether or not the first researcher’s
position is supported by the results. Use α = .05.

Solution:
If the coin is fair, p = 0.5.
When n is large, we can use X ~ N(np, npq) = N(10, 5)
(1) Define hypotheses:
    H₀: E(X) = 10 heads
    H₁: E(X) > 10 heads
(2) Use Z = (number of heads ± 0.5 - 10) / sqrt(5) as the test statistic.
(3) Acceptance region is p(Z ≤ z) = 0.95 = 1 - α.
    So z = 1.65
    Reject H₀ if z > 1.65
(4) z = (15 - 0.5 - 10) / sqrt(5) = 2.01. H₀ is rejected.

Note: ± 0.5 in step 2 is a correction for continuity since X is not continuous.
To do this, add 0.5 to x when x < Np, and subtract 0.5 from x when x > Np.
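
A sketch verifying step (4) numerically (scipy is used only for the critical value):

    import numpy as np
    from scipy import stats

    n, heads, p0, alpha = 20, 15, 0.5, 0.05
    mean, var = n * p0, n * p0 * (1 - p0)    # the N(10, 5) approximation

    z = (heads - 0.5 - mean) / np.sqrt(var)  # continuity correction
    z_crit = stats.norm.ppf(1 - alpha)       # one-sided critical value
    print(f"z = {z:.2f}, critical = {z_crit:.2f}, reject: {z > z_crit}")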

Example 2 (Two-Sided Z-Statistic with One Sample)

Design a decision rule to test the hypothesis that a coin is fair if a sample of 64 tosses of
the coin is taken with a level of significance of 0.05.

Solution:
If the coin is fair, p = 0.5.
When n is large, we can use X ~ N(np, npq) = N(32, 16)
(1) Define hypotheses:
    H₀: E(X) = 32 heads
    H₁: E(X) ≠ 32 heads
(2) Use Z = (number of heads ± 0.5 - 32) / sqrt(16) as the test statistic.
(3) Acceptance region is p(-z ≤ Z ≤ z) = 0.95 = 1 - α.
    So z = 1.96
    Reject H₀ if z < -1.96 or z > 1.96

Note how the examples above use the Z-statistic without sqrt(n) in the denominator. When the original n Bernoulli trials are converted into a binomial distribution, all tosses are combined into a single binomial observation of x successes. With one observation, sqrt(1) = 1 can be ignored, and the formula used is still a Z-statistic.

Example 3 (Z-Statistic with Multiple Samples)

A manufacturer of steel rods considers the manufacturing process to be working properly if
the mean length of the rods is 8.6 inches. The standard deviation of these rods always runs about
0.3 inches. Suppose a random sample of size n = 36 yields an average length of 8.7 inches.
Should the manufacturer conclude the process is working properly or improperly?

Solution:
Sample size n is large enough to consider X̄ as a normal distribution.
Since the hypothesized mean μ₀ is given and the population σ² is known, the Z-statistic can be used.
(1) Define hypotheses:
    H₀: E(X) = μ₀ = 8.6
    H₁: E(X) ≠ 8.6
(2) Use Z = (X̄ - μ₀) / (σ / sqrt(n)) ~ N(0, 1) as the test statistic
(3) Acceptance region is p(-z ≤ Z ≤ z) = 0.95 = 1 - α.
    So z = 1.96
    Reject H₀ if z < -1.96 or z > 1.96
(4) z = (8.7 - 8.6) / (0.3 / sqrt(36)) = 2
    The null hypothesis H₀ should be rejected with level of significance 0.05.

Example 4 (T-Statistic)

A bowler claims that she has a 215 average. In her latest performance, she scores 188,
214, and 204. Would you conclude the bowler is “off her game?”

Solution:
Sample size n = 3.
Population variance is unknown. So T-statistic can be used.
x̄ = (188 + 214 + 204) / 3 = 202
s² = ((188 - 202)² + (214 -202)² + (204 - 202)²) / (3 - 1)
   = 172
(1) Define hypotheses:
    H₀: E(X) = μ₀ = 215
    H₁: E(X) ≠ 215
(2) Use T = (X̄ - μ₀) / (s / sqrt(n)) as the test statistic; it follows a t-distribution with n - 1 = 2 degrees of freedom.
(3) Acceptance region is p(-t ≤ T ≤ t) = 0.95 = 1 - α.
    So t = 4.303
    Reject H₀ if t < -4.303 or t > 4.303
(4) t = (202 - 215) / sqrt(172 / 3) = -1.717
    The null hypothesis H₀ cannot be rejected with level of significance 0.05.
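
The same conclusion from scipy (ttest_1samp reports the two-sided p-value):

    import numpy as np
    from scipy import stats

    scores = np.array([188, 214, 204])
    result = stats.ttest_1samp(scores, popmean=215)
    print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
    # t = -1.717 and p > 0.05, so H0 is not rejected.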

Example 5 (Chi-square-Statistic for Goodness-of-Fit Test)

Below are the results of rolling a die 60 times. Do you consider the die fair?

    |  Face | Observed Frequency |
    | ----- | ------------------ |
    | 1     | 8                  |
    | 2     | 9                  |
    | 3     | 10                 |
    | 4     | 11                 |
    | 5     | 12                 |
    | 6     | 10                 |
    | Total | 60                 |

Solution:
H₀: the die is fair, each face has probability 1/6
H₁: the die is not fair (at least one face has a different probability)

Under H₀ each face should occur 60 / 6 = 10 times, i.e. Eᵢ = 10

Compute chi-square statistic:
𝒳²(n) = ∑((Oᵢ - Eᵢ)²/Eᵢ)
      = (8 - 10)²/10 + (9 - 10)²/10 ... + (10 - 10)²/10
      = 1.0

Degrees of freedom (df) = number of categories - 1
                        = 6 - 1
                        = 5

For 𝛼 = 0.05 and df = 5, the critical value is 𝑥² = 11.070
Since 𝒳²(n) ≪ 𝑥², H₀ can NOT be rejected.
We conclude the die is fair.
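
Verifying with scipy.stats.chisquare (expected counts default to the uniform distribution):

    from scipy import stats

    observed = [8, 9, 10, 11, 12, 10]
    chi2, p = stats.chisquare(observed)        # uniform expected: 10 per face
    print(f"chi2 = {chi2:.1f}, p = {p:.3f}")   # chi2 = 1.0, p well above 0.05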

Example 6 (Chi-square-Statistic for Test of Independence)

Here is a survey of 100 people about their gender and voting preference (Party A or Party B):

    |              | Party A | Party B | Row Total |
    | ------------ | ------- | ------- | --------- |
    | Male         | 20      | 30      | 50        |
    | Female       | 30      | 20      | 50        |
    | Column Total | 50      | 50      | 100       |

Do you consider gender and voting preference are independent?

Solution:
H₀: gender and voting preference are independent.
H₁: gender and voting are associated.

Compute expected counts for each table cell:
    Eᵢⱼ = (Row Totalᵢ) * (Column Totalⱼ) / Total

    |        |    Party A   |    Party B   |
    | ------ | ------------ | ------------ |
    | Male   | 50*50/100=25 | 50*50/100=25 |
    | Female | 50*50/100=25 | 50*50/100=25 |

Compute chi-square statistic:
𝒳² = ∑((Oᵢⱼ - Eᵢⱼ)²/Eᵢⱼ)
   = (20 - 25)²/25 + (30 - 25)²/25 ... + (20 - 25)²/25
   = 4.0

Degrees of freedom (df) = (rows - 1) * (columns - 1) = 1

For 𝛼 = 0.05 and df = 1, the critical value is 𝑥² = 3.841
Since 𝒳² > 𝑥², we can reject H₀ and consider gender and voting preference are NOT independent.
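
The same test via scipy.stats.chi2_contingency, which derives the expected counts and df itself (correction=False disables Yates' continuity correction so the result matches the hand computation above):

    import numpy as np
    from scipy import stats

    table = np.array([[20, 30],
                      [30, 20]])
    chi2, p, df, expected = stats.chi2_contingency(table, correction=False)
    print(f"chi2 = {chi2:.1f}, df = {df}, p = {p:.4f}")   # chi2 = 4.0, df = 1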

Example 7 (Chi-square-Statistic for Test of Homogeneity)

Here is a survey of voters in 3 cities about their preferred political party: A, B, or C.

    | City          | Party A | Party B | Party C | Row Total |
    | ------------- | ------- | ------- | ------- | --------- |
    | City 1        | 30      | 10      | 10      | 50        |
    | City 2        | 20      | 20      | 10      | 50        |
    | City 3        | 10      | 30      | 10      | 50        |
    | Column Totals | 60      | 60      | 30      | 150       |

Do you consider these 3 cities have the same party preference distribution?

Solution:
H₀: all cities have the same party preference distribution.
H₁: at least one city's party preference distribution is different.

Compute expected counts for each table cell:
    Eᵢⱼ = (Row Totalᵢ) * (Column Totalⱼ) / Total

    | City   |    Party A   |    Party B   |    Party C   |
    | ------ | ------------ | ------------ | ------------ |
    | City 1 | 50*60/150=20 | 50*60/150=20 | 50*30/150=10 |
    | City 2 | 50*60/150=20 | 50*60/150=20 | 50*30/150=10 |
    | City 3 | 50*60/150=20 | 50*60/150=20 | 50*30/150=10 |

Compute chi-square statistic:
𝒳² = ∑((Oᵢⱼ - Eᵢⱼ)²/Eᵢⱼ)
   = (30 - 20)²/20 + (10 - 20)²/20 + (10 - 10)²/10 + ... + (10 - 10)²/10
   = 20.0

Degrees of freedom (df) = (rows - 1) * (columns - 1) = 4

For 𝛼 = 0.05 and df = 4, the critical value is 𝑥² = 9.488
Since 𝒳² > 𝑥², we can reject H₀ and conclude that not all cities share the same distribution
of party preference.

Example 8 (Fisher’s Exact Test)

Here is a clinical trial testing a vaccine on a small group.

    |         | Infected | Not Infected | Total |
    | ------- | -------- | ------------ | ----- |
    | Vaccine | 1        | 9            | 10    |
    | Placebo | 8        | 2            | 10    |
    | Total   | 9        | 11           | 20    |

Do you consider infection status is associated with receiving the vaccine?

Solution:
H₀: Vaccination and infection are independent (vaccine does not reduce infection).
H₁: Vaccination and infection are associated.

p-value = p(X = 1) + p(X = 0)
        = 0.00268 + 0.00006
        = 0.00274
Since p-value < 0.05, data this extreme would be rare if the null hypothesis were true.
But we observed it.
H₀ is rejected, i.e. the vaccine appears effective in reducing infection.
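
scipy.stats.fisher_exact reproduces this one-sided p-value (alternative="less" sums p(X ≤ x) for the top-left cell):

    from scipy import stats

    table = [[1, 9],
             [8, 2]]
    odds_ratio, p = stats.fisher_exact(table, alternative="less")
    print(f"p = {p:.5f}")   # ~0.00274, well below 0.05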