The standard procedure is to assume H₀ is true, and try to determine whether there is sufficient evidence to declare H₀ false.
Reject H₀ only when the observed data would be very unlikely if H₀ were true.
Type I error: reject the null hypothesis when the null is true. The probability of Type I error = 𝛼.
α = Probability of Type I error = p(rejecting H₀ | H₀ is true)
Type II error: accept the null hypothesis when it is not true. The probability of Type II error = β.
β = Probability of Type II error = p(accepting H₀ | H₀ is false)
α and β are not independent of each other: for a fixed sample size n, decreasing one tends to increase the other. Increasing n can decrease both.
Testing procedure:
(1) Define the hypotheses H₀ and H₁.
(2) Choose a test statistic.
(3) Determine the acceptance/rejection region for a chosen α.
(4) Compute the statistic from the sample and decide whether to reject H₀.
Typically, α is set to 0.05 or 0.01. If α = 0.05, the test will erroneously reject H₀ 5% of the time when H₀ is actually true.
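As a quick illustration, a minimal simulation (a hypothetical setup: known σ = 1, two-sided Z-test) shows that a level-0.05 test wrongly rejects a true H₀ about 5% of the time:

```python
# Simulate many experiments where H0 is true and count false rejections.
import numpy as np

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 30, 100_000
rejections = 0
for _ in range(trials):
    sample = rng.normal(0.0, 1.0, n)        # H0: mu = 0 is true by construction
    z = sample.mean() / (1.0 / np.sqrt(n))  # Z-statistic with known sigma = 1
    if abs(z) > 1.96:                       # two-sided rejection at alpha = 0.05
        rejections += 1
print(rejections / trials)                  # prints roughly 0.05 = alpha
```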
So far, we have covered the steps for hypothesis testing. The following summarizes the test statistic for different cases.
The Z-statistic can be used to test a normal distribution's parameter μ:
X is a random variable of a normal distribution with known variance σ² and unknown mean μ.
The null hypothesis is H₀: E(X) = μ₀
Z = (X̄ - μ₀) / (σ / sqrt(n)) ~ N(0, 1)
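A minimal sketch of this test in Python (scipy assumed available; the numbers are hypothetical and anticipate the steel-rod example below):

```python
# Two-sided Z-test for a normal mean with known sigma.
from math import sqrt
from scipy.stats import norm

xbar, mu0, sigma, n = 8.7, 8.6, 0.3, 36  # hypothetical sample summary
z = (xbar - mu0) / (sigma / sqrt(n))     # Z ~ N(0, 1) under H0
p_value = 2 * norm.sf(abs(z))            # two-sided p-value
print(z, p_value)                        # z = 2.0, p ≈ 0.0455
```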
The Z-statistic can also be used to test a binomial distribution's parameter p:
X is a random variable of a binomial distribution with unknown success rate p.
According to the properties of the sample mean,
E(X̄) = μ = p
V(X̄) = σ² / n = pq / n
where μ = p and σ² = pq are the mean and variance of the underlying Bernoulli trials.
Based on Central Limit Theorem, X̄ ~ N(μ, σ²/n) = N(μ, pq/n) for large n.
The null hypothesis is H₀: p = p₀
Z = (X̄ - p₀) / sqrt(p₀q₀ / n) ~ N(0, 1)
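A minimal sketch of this test with hypothetical counts:

```python
# Two-sided Z-test for a binomial proportion.
from math import sqrt
from scipy.stats import norm

x, n, p0 = 58, 100, 0.5                     # hypothetical: 58 successes in 100 trials
p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # Z = (X̄ - p0) / sqrt(p0*q0/n)
p_value = 2 * norm.sf(abs(z))               # two-sided p-value
print(z, p_value)                           # z = 1.6, p ≈ 0.11
```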
The Z-statistic does not work when the variance σ² is unknown. Instead, the T-statistic can be used to test a normal distribution's parameter μ:
X is a random variable of a normal distribution with unknown variance σ² and unknown mean μ.
Use the sample variance s² to replace the population variance σ².
The null hypothesis is H₀: E(X) = μ₀
T = (X̄ - μ₀) / (s / sqrt(n)) ~ t(n - 1)
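A minimal sketch using scipy's one-sample t-test (the data are hypothetical and anticipate the bowler example below):

```python
# One-sample T-test: H0 is that the population mean equals 215.
from scipy.stats import ttest_1samp

data = [188, 214, 204]                    # n = 3 hypothetical observations
t_stat, p_value = ttest_1samp(data, 215)
print(t_stat, p_value)                    # t ≈ -1.717, two-sided p ≈ 0.228
```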
Confidence intervals use the T-transformation to estimate an interval for the population mean from the sample mean and variance with 100(1 - α)% confidence. This is essentially the same as the T-statistic: the interval is X̄ ± t · s / sqrt(n), where p(-t ≤ T ≤ t) = 1 - α.
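A sketch computing such an interval (reusing the same hypothetical summary: x̄ = 202, s² = 172, n = 3):

```python
# 95% confidence interval for the mean via the T-transformation.
from math import sqrt
from scipy.stats import t

xbar, s, n, alpha = 202.0, sqrt(172.0), 3, 0.05
t_crit = t.ppf(1 - alpha / 2, df=n - 1)      # t_{alpha/2, n-1} = 4.303
half_width = t_crit * s / sqrt(n)
print(xbar - half_width, xbar + half_width)  # ≈ (169.4, 234.6)
```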
The Z-statistic and T-statistic are parametric statistics: they assume the data are sampled from a particular distribution family (here, the normal distribution).
The chi-square test, by contrast, concerns categorical data:
| Test Type | Null Hypothesis | Example |
| --------- | --------------- | ------- |
| Goodness-of-Fit | The observed data follow the expected distribution | Roll a 6-sided die 60 times. H₀: the die is fair, each side has probability 1/6 |
| Test of Independence | Two variables are independent | Survey people on gender and voting preference. H₀: gender and voting preference are not associated |
| Test of Homogeneity | The distributions of the categorical variable are the same across groups | Test whether 3 different cities have the same preference distribution for soda brands |
The Pearson chi-square statistic is 𝒳² = ∑((Oᵢ - Eᵢ)²/Eᵢ), which approximately follows a chi-square distribution under H₀.
Assumptions required for using the Pearson chi-square test: observations are independent, categories are mutually exclusive, and expected counts are sufficiently large (a common rule of thumb is Eᵢ ≥ 5).
Transform the original statistic:
𝒳²(n) = ∑((Oᵢ - Eᵢ)²/Eᵢ) for 1 ≤ i ≤ n
= ∑((Oᵢ - Eᵢ)/sqrt(Eᵢ))²
Based on chi-square’s property, if we can prove Zᵢ = (Oᵢ - Eᵢ)/sqrt(Eᵢ) ~ N(0, 1), then ∑Zᵢ² follows chi-square distribution.
Key idea: each observed count Oᵢ is a random variable that follows a binomial or multinomial distribution under the null hypothesis.
Oᵢ is essentially the number of successes in N trials with probability of success pᵢ.
For a large enough N, the binomial variable Oᵢ is approximately:
Oᵢ ~ N(Npᵢ, Npᵢqᵢ) = N(Eᵢ, Eᵢqᵢ).
So:
(Oᵢ - Eᵢ) / sqrt(Eᵢqᵢ) = Zᵢ / sqrt(qᵢ) ~ N(0, 1)
But qᵢ is not necessarily close to 1, so how could Zᵢ approximate a standard normal distribution?
An extreme case is data with 2 categories, where q₀ = 0.99 and q₁ = 0.01:
Z₀ is close to a standard normal distribution, but Z₁ clearly is NOT!
Answering this question requires a fairly lengthy proof, which can be found here, in course material from MIT.
When samples are small, the distributions of 𝒳² (and other large-sample based statistics) are not well approximated by the chi-square distribution, and their p-values are not to be trusted. In such situations, we can perform inference using an exact distribution (or estimates of exact distributions), but we should keep in mind that p-values based on exact tests can be conservative (i.e., larger than they really are).
We may use an exact test if the sample is small, e.g., some expected counts fall below 5.
Fisher's exact test calculates the probability using the hypergeometric distribution. The table below maps a problem in question to the corresponding hypergeometric statement, given fixed N, m, n.
| | Successes | Failures | Total |
| ----------- | --------- | ----------------- | ----- |
| Sampled | x = a | n - x = b | n |
| Not Sampled | m - x = c | N - m - n + x = d | N - n |
| Total | m | N - m | N |
The hypergeometric probability gives the chance of getting this exact table. With a simple transformation, it can be written in terms of the cell counts:
p(X = x) = ((a+b)! * (c+d)! * (a+c)! * (b+d)!) / (a! * b! * c! * d! * N!)
The p-value is calculated as ∑p(X ≤ x) or ∑p(X ≥ x), depending on the definition of extreme. It represents the probability of observing data at least this extreme if the null hypothesis were true. If the alternative hypothesis is that the "Sampled" action causes more "Successes" (hence they are associated), then ∑p(X ≥ x) should be used. A small p-value means that, if the null hypothesis were true, the observed data (or more extreme) would be unlikely to occur by random chance. But the table data was observed. This leads to the rejection of the null hypothesis and suggests a statistically significant association between the variables.
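A minimal sketch checking this formula against scipy's hypergeometric pmf (the cell counts are borrowed from the vaccine example at the end of this section):

```python
# Probability of one exact 2x2 table under fixed margins.
from math import factorial
from scipy.stats import hypergeom

a, b, c, d = 1, 9, 8, 2
N = a + b + c + d
p = (factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)) \
    / (factorial(a) * factorial(b) * factorial(c) * factorial(d) * factorial(N))
# scipy parameterization: pmf(x, population size N, total successes m, sample size n)
print(p, hypergeom.pmf(a, N, a + c, a + b))  # both ≈ 0.00268
```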
One researcher believes a coin is “fair,” the other believes the coin is biased toward heads.
The coin is tossed 20 times, yielding 15 heads. Indicate whether or not the first researcher’s
position is supported by the results. Use α = .05.
Solution:
If the coin is fair, p = 0.5.
When n is large, we can use X ~ N(np, npq) = N(10, 5)
(1) Define hypotheses:
H₀: E(X) = 10 heads
H₁: E(X) > 10 heads
(2) Use Z = (number of heads ± 0.5 - 10) / sqrt(5) as the test statistic.
(3) Acceptance region is p(Z ≤ z) = 0.95 = 1 - α.
So z = 1.65
Reject H₀ if z > 1.65
(4) z = (15 - 0.5 - 10) / sqrt(5) = 2.01 > 1.65. H₀ is rejected, so the first researcher's position is not supported.
Note: ± 0.5 in step 2 is a correction for continuity since X is not continuous.
To do this, add 0.5 to x when x < Np, and subtract 0.5 from x when x > Np.
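A sketch verifying this example numerically, comparing the continuity-corrected normal approximation with the exact binomial tail probability:

```python
# One-sided test that the coin is biased toward heads.
from math import sqrt
from scipy.stats import norm, binom

n, p0, heads = 20, 0.5, 15
z = (heads - 0.5 - n * p0) / sqrt(n * p0 * (1 - p0))  # continuity-corrected Z
print(z, norm.sf(z))               # z ≈ 2.01, approximate p ≈ 0.022
print(binom.sf(heads - 1, n, p0))  # exact p(X >= 15) ≈ 0.021
```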
Design a decision rule to test the hypothesis that a coin is fair if a sample of 64 tosses of
the coin is taken with a level of significance of 0.05.
Solution:
If the coin is fair, p = 0.5.
When n is large, we can use X ~ N(np, npq) = N(32, 16)
(1) Define hypotheses:
H₀: E(X) = 32 heads
H₁: E(X) ≠ 32 heads
(2) Use Z = (number of heads ± 0.5 - 32) / sqrt(16) as the test statistic.
(3) Acceptance region is p(-z ≤ Z ≤ z) = 0.95 = 1 - α.
So z = 1.96
Reject H₀ if z < -1.96 or z > 1.96
Note how the examples above use the Z-statistic without sqrt(n) in the denominator. When the original n Bernoulli trials are combined into a single binomial observation of x successes, the sample size effectively becomes 1, so sqrt(1) = 1 is dropped and the formula is still a Z-statistic.
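For the second example, the requested decision rule can be made explicit by inverting the continuity-corrected Z-statistic; a minimal sketch:

```python
# Acceptance region, in number of heads, for 64 tosses at alpha = 0.05.
from math import sqrt

n, p0, z_crit = 64, 0.5, 1.96
mu, sigma = n * p0, sqrt(n * p0 * (1 - p0))  # 32 and 4
low = mu - z_crit * sigma - 0.5              # reject H0 below this count
high = mu + z_crit * sigma + 0.5             # reject H0 above this count
print(low, high)  # ≈ 23.66 and 40.34: accept H0 for 24..40 heads
```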
A manufacturer of steel rods considers that the manufacturing process is working properly if
the mean length of the rods is 8.6 inches. The standard deviation of these rods always runs about
0.3 inches. Suppose a random sample of size n = 36 yields an average length of 8.7 inches.
Should the manufacturer conclude the process is working properly or improperly?
Solution:
Sample size n is large enough to consider X̄ as normally distributed.
Since the population σ² is known (μ₀ is hypothesized), the Z-statistic can be used.
(1) Define hypotheses:
H₀: E(X) = μ₀ = 8.6
H₁: E(X) ≠ 8.6
(2) Use Z = (X̄ - μ₀) / (σ / sqrt(n)) ~ N(0, 1) as the test statistic
(3) Acceptance region is p(-z ≤ Z ≤ z) = 0.95 = 1 - α.
So z = 1.96
Reject H₀ if z < -1.96 or z > 1.96
(4) z = (8.7 - 8.6) / (0.3 / sqrt(36)) = 2
Since 2 > 1.96, the null hypothesis H₀ is rejected with level of significance 0.05: the manufacturer should conclude the process is working improperly.
A bowler claims that she has a 215 average. In her latest performance, she scores 188,
214, and 204. Would you conclude the bowler is “off her game?”
Solution:
Sample size n = 3.
Population variance is unknown, so the T-statistic can be used.
x̄ = (188 + 214 + 204) / 3 = 202
s² = ((188 - 202)² + (214 - 202)² + (204 - 202)²) / (3 - 1)
   = 172
(1) Define hypotheses:
H₀: E(X) = μ₀ = 215
H₁: E(X) ≠ 215
(2) Use T = (X̄ - μ₀) / (s / sqrt(n)), which follows a t-distribution with n - 1 = 2 degrees of freedom.
(3) Acceptance region is p(-t ≤ T ≤ t) = 0.95 = 1 - α.
So t = 4.303
Reject H₀ if t < -4.303 or t > 4.303
(4) t = (202 - 215) / sqrt(172 / 3) = -1.717
The null hypothesis H₀ cannot be rejected with level of significance 0.05, so we cannot conclude the bowler is "off her game."
Below are the results of rolling a die 60 times. Do you consider the die fair?
| Face | Observed Frequency |
| ----- | ------------------ |
| 1 | 8 |
| 2 | 9 |
| 3 | 10 |
| 4 | 11 |
| 5 | 12 |
| 6 | 10 |
| Total | 60 |
Solution:
H₀: the die is fair, each face has probability 1/6
H₁: the die is not fair (at least one face has a different probability)
Under H₀ each face should occur 60 / 6 = 10 times, i.e. Eᵢ = 10
Compute chi-square statistic:
𝒳²(n) = ∑((Oᵢ - Eᵢ)²/Eᵢ)
= (8 - 10)²/10 + (9 - 10)²/10 ... + (10 - 10)²/10
= 1.0
Degrees of freedom (df) = number of categories - 1
= 6 - 1
= 5
For 𝛼 = 0.05 and df = 5, the critical value is 11.070.
Since 𝒳² = 1.0 ≪ 11.070, H₀ can NOT be rejected.
We consider the die to be fair.
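A sketch verifying this with scipy's goodness-of-fit function:

```python
# Chi-square goodness-of-fit test for the die counts.
from scipy.stats import chisquare

observed = [8, 9, 10, 11, 12, 10]    # expected defaults to uniform (10 per face)
stat, p_value = chisquare(observed)
print(stat, p_value)                 # statistic = 1.0, p ≈ 0.96
```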
Here is a survey of 100 people about their gender and voting preference (Party A or Party B):
| | Party A | Party B | Row Total |
| ------------ | ------- | ------- | --------- |
| Male | 20 | 30 | 50 |
| Female | 30 | 20 | 50 |
| Column Total | 50 | 50 | 100 |
Do you consider gender and voting preference are independent?
Solution:
H₀: gender and voting preference are independent.
H₁: gender and voting preference are associated.
Compute expected counts for each table cell:
Eᵢⱼ = (Row Totalᵢ) * (Column Totalⱼ) / Total
| | Party A | Party B |
| ------ | ------------ | ------------ |
| Male | 50*50/100=25 | 50*50/100=25 |
| Female | 50*50/100=25 | 50*50/100=25 |
Compute chi-square statistic:
𝒳² = ∑((Oᵢⱼ - Eᵢⱼ)²/Eᵢⱼ)
= (20 - 25)²/25 + (30 - 25)²/25 ... + (20 - 25)²/25
= 4.0
Degrees of freedom (df) = (rows - 1) * (columns - 1) = 1
For 𝛼 = 0.05 and df = 1, the critical value is 3.841.
Since 𝒳² = 4.0 > 3.841, we can reject H₀ and consider gender and voting preference NOT independent.
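A sketch verifying this with scipy; correction=False disables the Yates continuity correction so the statistic matches the hand computation. The same call also handles the homogeneity table in the next example:

```python
# Chi-square test of independence on the 2x2 gender/party table.
from scipy.stats import chi2_contingency

table = [[20, 30], [30, 20]]
stat, p_value, dof, expected = chi2_contingency(table, correction=False)
print(stat, p_value, dof)  # 4.0, p ≈ 0.0455, dof = 1
print(expected)            # every cell is 25.0, as computed above
```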
Here is a survey of voters in 3 cities about their preferred political party: A, B, or C.
| City | Party A | Party B | Party C | Row Total |
| ------------- | ------- | ------- | ------- | --------- |
| City 1 | 30 | 10 | 10 | 50 |
| City 2 | 20 | 20 | 10 | 50 |
| City 3 | 10 | 30 | 10 | 50 |
| Column Totals | 60 | 60 | 30 | 150 |
Do you consider these 3 cities have the same party preference distribution?
Solution:
H₀: all cities have the same party preference distribution.
H₁: at least one city's party preference distribution is different.
Compute expected counts for each table cell:
Eᵢⱼ = (Row Totalᵢ) * (Column Totalⱼ) / Total
| City | Party A | Party B | Party C |
| ------ | ------------ | ------------ | ------------ |
| City 1 | 50*60/150=20 | 50*60/150=20 | 50*30/150=10 |
| City 2 | 50*60/150=20 | 50*60/150=20 | 50*30/150=10 |
| City 3 | 50*60/150=20 | 50*60/150=20 | 50*30/150=10 |
Compute chi-square statistic:
𝒳² = ∑((Oᵢⱼ - Eᵢⱼ)²/Eᵢⱼ)
= (30 - 20)²/20 + (10 - 20)²/20 + (10 - 10)²/10 ... + (10 - 10)²/10
= 20.0
Degrees of freedom (df) = (rows - 1) * (columns - 1) = 4
For 𝛼 = 0.05 and df = 4, the critical value is 9.488.
Since 𝒳² = 20.0 > 9.488, we can reject H₀ and conclude that not all cities share the same
distribution of party preference.
Here is a clinical trial that tests a vaccine on a small group.
| | Infected | Not Infected | Total |
| ------- | -------- | ------------ | ----- |
| Vaccine | 1 | 9 | 10 |
| Placebo | 8 | 2 | 10 |
| Total | 9 | 11 | 20 |
Do you consider infection status to be associated with receiving the vaccine?
Solution:
H₀: Vaccination and infection are independent (vaccine does not reduce infection).
H₁: Vaccination and infection are associated.
Let X be the number of infected people in the vaccine group; the observed value is x = 1, and more extreme (fewer infections in the vaccine group) means X ≤ 1.
p-value = p(X = 1) + p(X = 0)
        = 0.00268 + 0.00006
        = 0.00274
Since the p-value < 0.05, data this extreme would be rare if H₀ were true.
But we observed it.
H₀ is rejected, i.e. the vaccine appears effective in reducing infection.
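A sketch verifying this with scipy's Fisher exact test; alternative='less' sums p(X ≤ 1) for the top-left cell given the fixed margins:

```python
# Fisher's exact test on the vaccine trial table.
from scipy.stats import fisher_exact

table = [[1, 9], [8, 2]]  # rows: vaccine, placebo; cols: infected, not infected
odds_ratio, p_value = fisher_exact(table, alternative='less')
print(p_value)            # ≈ 0.00274
```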