Estimation

Statistical inference can be divided into two parts: 1) estimation and 2) hypothesis testing. Estimation estimates the values of specific population parameters; hypothesis testing tests whether the value of a population parameter is equal to a specified value. To make statistical inferences about population parameters, samples must be taken. A random sample is a selection of some members of a population of interest such that each member is independently chosen and has a known non-zero probability of being selected. A simple random sample is a random sample in which each member of the population has the same probability of being selected. The study population is the group of interest to be studied, from which the random sample is taken.

Estimation of the Mean of a Distribution

Suppose we have a random sample \(x_1, x_2, \dots, x_n\) which we will use to estimate the population parameters \(\mu\) (mean) and \(\sigma^2\) (variance) of the underlying distribution. An estimator of a parameter, for example \(\theta\), is denoted \(\hat{\theta}\); the estimator is unbiased if the expectation of \(\hat{\theta}\) is equal to \(\theta\) (\(E(\hat{\theta}) = \theta\)). This means that the average value of \(\hat{\theta}\) over a large number of repeated samples of size \(n\) is \(\theta\), the true population parameter. We have already been working with an unbiased estimator, \(\bar{X}\), where

\[\bar{X} = \sum_{i=1}^n X_i/n\]

When the underlying distribution of the population is normal, it can be shown that the unbiased estimator of \(\mu\) with the smallest variance is \(\bar{X}\). The sampling distribution of \(\bar{X}\) is the distribution of values of \(\bar{x}\) over all possible samples of size \(n\) that could have been drawn from the study population.

The standard error of the mean (SEM), or standard error, is the standard deviation of the set of sample means obtained from repeated samples of size \(n\) from a study population with underlying variance \(\sigma^2\). Formally, let \(X_1, X_2, \dots, X_n\) be a random sample from a population with underlying mean \(\mu\) and variance \(\sigma^2\). Then the set of sample means from repeated random samples of size \(n\) from this population has variance \(\sigma^2/n\), and the standard deviation of this set of sample means is:

\[SE(\bar{X})=\sigma/\sqrt{n}\]

Note that the standard error is not the standard deviation of an individual observation, but instead the standard deviation of the sample mean \(\bar{X}\).
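
To build intuition, we can check this with a small simulation (a sketch with assumed values: a hypothetical population with \(\mu = 100\), \(\sigma = 15\), and samples of size \(n = 25\)). The standard deviation of many simulated sample means should be close to \(\sigma/\sqrt{n} = 3\):

set.seed(42)
mu <- 100; sigma <- 15; n <- 25 #hypothetical population values
#draw 10000 repeated samples of size n and compute each sample mean
xbars <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
sd(xbars) #empirical standard deviation of the sample means; close to 3
sigma/sqrt(n) #theoretical standard error: 15/sqrt(25) = 3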

A sample proportion is a mean of independent Bernoulli (0 or 1) trials. If the population proportion, \(\pi\), is known, then the standard error of the proportion is:

\[SE(\pi) = \sqrt{\frac{\pi(1-\pi)}{n}}\]

We can estimate the population proportion (\(\pi\)) by the sample proportion \(p\), giving:

\[SE(p) = \sqrt{\frac{p(1-p)}{n}}\]
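
As a quick check (a hypothetical simulation with an assumed \(\pi = 0.3\) and \(n = 50\)), the standard deviation of many simulated sample proportions should be close to the formula above:

set.seed(42)
pi0 <- 0.3; n <- 50 #assumed population proportion and sample size
#each sample proportion is the mean of n Bernoulli (0/1) draws
props <- replicate(10000, mean(rbinom(n, size = 1, prob = pi0)))
sd(props) #empirical standard deviation of the sample proportions
sqrt(pi0*(1 - pi0)/n) #theoretical SE, about 0.0648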

Central Limit Theorem

If an underlying distribution is normal, then the sample mean is normally distributed with mean \(\mu\) and variance \(\sigma^2/n\) (\(\bar{X} \sim N(\mu, \sigma^2/n)\)). If an underlying distribution is not normally distributed, the Central Limit Theorem (CLT) may be applied in certain situations to enable us to perform statistical inference based on the approximate normality of the sample mean. Let \(X_1, \dots, X_n\) be a random sample from a study population with mean \(\mu\) and variance \(\sigma^2\). Then as \(n \rightarrow \infty\), \(\bar{X} \sim N(\mu, \sigma^2/n)\) approximately, even if the underlying distribution of individual observations in the population is not normally distributed.
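
We can see the CLT at work with a short simulation (a sketch using an assumed exponential population, which is heavily right-skewed, with mean 1 and variance 1):

set.seed(42)
#sample means of n = 50 draws from a skewed exponential distribution
xbars_clt <- replicate(10000, mean(rexp(n = 50, rate = 1)))
hist(xbars_clt, freq = FALSE, main = "Sample means, exponential parent, n = 50")
#the histogram is approximately N(1, 1/50) even though
#individual observations are far from normally distributed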

Example 1

Suppose you have a series of basal body temperature measurements from a participant: 97.2, 97.8, 98.1, 96.9, 97.4, 98.3, 97.7, 97.1, 96.8, 97.3. What is the best estimate of this individual’s body temperature (\(\mu\)) and how precise is this estimate?

We start by creating a vector called temps in R by combining the 10 measurements using the concatenate (c()) function:

temps <- c(97.2, 97.8, 98.1, 96.9, 97.4, 98.3, 97.7, 97.1, 96.8, 97.3)
#how many observations?
length(temps)
## [1] 10

The best estimate of this individual’s body temperature will be the sample mean, \(\bar{x}\) which we calculate using the mean() function:

mean(temps)
## [1] 97.46

The mean temperature is 97.46 degrees Fahrenheit. The precision of this estimate is given by its standard error, \(s/\sqrt{10}\), where \(s\) is the standard deviation. We first calculate the standard deviation using the sd() function and divide it by \(\sqrt{10}\), which we can get programmatically with sqrt(length(temps)):

sd(temps)/sqrt(length(temps))
## [1] 0.158605

If we wanted to, we could now calculate a 95% confidence interval:

xbar <- mean(temps)
se <- sd(temps)/sqrt(length(temps))
lb <- xbar - 1.96*se
ub <- xbar + 1.96*se

cbind(lb, ub)
##            lb       ub
## [1,] 97.14913 97.77087

The 95% confidence interval for the mean is (97.149, 97.771).
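
As an aside, with only \(n = 10\) observations and \(\sigma\) estimated by \(s\), a \(t\) critical value arguably gives a better interval than the normal value 1.96; a sketch of that alternative:

#t-based interval: qt(0.975, df = 9) is about 2.262, so the interval is wider
tcrit <- qt(p = 0.975, df = length(temps) - 1)
cbind(xbar - tcrit*se, xbar + tcrit*se)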

Example 2

Suppose you have a study with 22000 participants - 3200 of these participants self-report as ever-smokers. Calculate the sample proportion and standard error of the proportion.

First, we calculate the sample proportion of ever-smokers and store the result into the object prop_smokers:

(prop_smokers <- 3200/22000)
## [1] 0.1454545

About 14.5 percent of respondents report being ever-smokers. Next, we calculate the standard error of the proportion:

(se_prop_smokers <- sqrt((prop_smokers*(1-prop_smokers))/22000))
## [1] 0.002376949

The standard error is approximately 0.0024. If we assume a normal sampling distribution due to the large sample size of the study, we can calculate a 95% confidence interval:

prop_lb <- prop_smokers - 1.96*se_prop_smokers
prop_ub <- prop_smokers + 1.96*se_prop_smokers

cbind(prop_lb, prop_ub)
##        prop_lb   prop_ub
## [1,] 0.1407957 0.1501134

The 95% confidence interval by normal approximation for the proportion is (0.1408, 0.1501).
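
Base R also provides prop.test(), which tests a single proportion and reports a confidence interval (a Wilson score interval with continuity correction by default, so its bounds will differ slightly from the normal-approximation interval above):

#built-in alternative to the hand calculation above
prop.test(x = 3200, n = 22000)$conf.int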

Estimation of the Variance of a Distribution

The sample variance is defined as:

\[s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i -\bar{x})^2\]

The sample variance \(S^2\) is an unbiased estimator of \(\sigma^2\) over all possible random samples of size \(n\) that could be drawn from the study population. In other words, \(E(S^2) = \sigma^2\).
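
Note that R's var() function uses the \(n-1\) divisor shown above. A quick check on a small made-up vector:

x <- c(2, 4, 4, 4, 5, 5, 7, 9) #small illustrative data set
var(x) #unbiased sample variance (n-1 divisor)
sum((x - mean(x))^2)/(length(x) - 1) #same value from the formula: 32/7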

Hypothesis Testing

The null hypothesis (\(H_0\)) is the hypothesis to be tested. The alternative hypothesis (\(H_A\)) is the hypothesis that contradicts the null. With hypothesis testing, you can reject or fail to reject \(H_0\). If we knew the ground truth, there would be four possible outcomes:

  1. Fail to reject \(H_0\) when \(H_0\) is really true.
  2. Fail to reject \(H_0\) when \(H_A\) is really true.
  3. Reject \(H_0\) when \(H_0\) is really true.
  4. Reject \(H_0\) when \(H_A\) is really true.
                                   Truth: \(H_0\)                          Truth: \(H_A\)
Decision: Fail to reject \(H_0\)   \(H_0\) is true, \(H_0\) not rejected   \(H_A\) is true, \(H_0\) not rejected
Decision: Reject \(H_0\)           \(H_0\) is true, \(H_0\) rejected       \(H_A\) is true, \(H_0\) rejected

It is important to note that the null hypothesis cannot be accepted - we only have evidence to reject or fail to reject the null hypothesis.

Note the discordant pairs in the table above: \(H_0\) is true and \(H_0\) is rejected, and \(H_A\) is true and \(H_0\) is not rejected. In these cases, an error has been made. There are two types of errors that can be made:

  1. Type I Error: rejecting the null hypothesis (\(H_0\)) when \(H_0\) is really true.
  2. Type II Error: failing to reject the null hypothesis when \(H_A\) is really true.

The probability of a Type I error is denoted by \(\alpha\) and is commonly referred to as the significance level of a test.

The probability of a Type II error is denoted as \(\beta\). The power of a test is \(1-\beta=1 - \text{probability of a type II error} = Pr(\text{rejecting } H_0 | H_A \text{ is true})\).
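
The power of a \(t\) test can be computed directly in base R with power.t.test(); a minimal sketch with assumed inputs (a true shift of delta = 5, standard deviation 20, and \(n = 100\)):

#power of a two-sided one-sample t-test to detect a true shift of
#delta = 5 with sd = 20 at n = 100 and alpha = 0.05 (assumed values)
power.t.test(n = 100, delta = 5, sd = 20, sig.level = 0.05,
             type = "one.sample", alternative = "two.sided")
#the reported power is roughly 0.70 for these inputs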

The acceptance region or non-rejection region is the range of values of \(\bar{x}\) for which \(H_0\) is not rejected. Similarly, the rejection region is the range of \(\bar{x}\) values for which \(H_0\) is rejected.

One-Tailed Tests

A one-tailed test is a test in which the values of the parameter being studied (for example, \(\mu\)) under the alternative hypothesis are allowed to be either greater than or less than the values of the parameter under the null hypothesis (for example, \(\mu_0\)), but not both (allowing both would be a two-tailed test).

We can test the following hypothesis:

\[H_0: \mu = \mu_0\] \[H_A: \mu < \mu_0\]

with \(\sigma\) unknown and significance level \(\alpha\). We first calculate a test statistic:

\[t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\]

  • if the test statistic \(t\) (above) is less than the critical value \(t_{n-1,\alpha}\), then we reject \(H_0\).
  • if the test statistic \(t\) is \(\geq t_{n-1,\alpha}\), then we fail to reject \(H_0\).

Figure 1 visualizes the critical value approach to hypothesis testing:

Figure 1: Visualization of hypothesis testing approach for one-tailed test (lower tail).

Example

Consider the following example. From a single hospital, you have a random sample of 200 consecutive births, where the mean birthweight (\(\bar{x}\)) is 112 oz and the sample standard deviation (\(s\)) is 20 oz. From a nationwide study of millions of births, it is known that the mean birthweight in the United States is 120 oz.

Using a one-sample \(t\)-test, test the hypothesis that \(H_0: \mu = 120\) vs. \(H_A: \mu < 120\).

First, we calculate the test statistic:

(tstat <- (112 - 120)/(20/sqrt(200)))
## [1] -5.656854

To find the critical value, the degrees of freedom for the t-distribution will be \(n-1\), or \(200-1=199\). We use the quantile function qt() with arguments p=0.05 (the significance level) and df=199 (the degrees of freedom):

(critval <- qt(p=0.05, df=199))
## [1] -1.652547

The test statistic (-5.6568542) is much smaller than the critical value (-1.6525467). Thus, we can reject \(H_0\) in favor of \(H_A\) at the 0.05 significance level. The birth weights at this particular hospital are lower than the mean population birth weight. Figure 2 visualizes the components of the critical value approach to hypothesis testing:

Figure 2: Visualization of hypothesis testing approach to one-tailed test for the birth weight example.

If instead of having just the summary statistics of the mean and standard deviation we had the actual data (as a vector from a data frame), we could use the t.test() function. To demonstrate the function, I will first generate 200 samples from a normal distribution with mean 112 and standard deviation 20:

set.seed(123) #set seed so you get the same random draws every time
samples <- rnorm(n = 200, mean = 112, sd=20) #generate 200 samples from our specific distribution
length(samples) #check the number of samples
## [1] 200
head(samples) #check out the first 6 observations
## [1] 100.7905 107.3965 143.1742 113.4102 114.5858 146.3013
mean(samples) #check mean of samples
## [1] 111.8286
sd(samples) # check sd of samples
## [1] 18.8632
hist(samples, freq = FALSE) #make a histogram of these samples

#finally, use t.test()

t.test(samples, mu=120, alternative = "less", conf.level = 0.95, paired=FALSE)
## 
##  One Sample t-test
## 
## data:  samples
## t = -6.1263, df = 199, p-value = 2.365e-09
## alternative hypothesis: true mean is less than 120
## 95 percent confidence interval:
##      -Inf 114.0328
## sample estimates:
## mean of x 
##  111.8286

The result here differs slightly from our calculations (t-statistic -6.1263 vs. -5.6568542). However, this is due to the variation of the random samples drawn from the distribution with mean 112 and sd 20. From the output above, we can see that though the observed sample mean of the samples is near 112 (111.8285911) and the standard deviation of the samples is near 20 (18.8631976), there is some slight variation. The final conclusion of rejecting \(H_0\), however, is consistent.

From the test above, we also have a reported p-value. The p-value is the probability of obtaining a test statistic as extreme as or more extreme than the observed test statistic, assuming \(H_0\) is true. In the results above, the p-value is \(2.365 \times 10^{-9}\), which is very small! There is very strong evidence against the null hypothesis.

Guidance on interpreting p-values and the strength of evidence against the null hypothesis can be summarized in the table below:

P-Value          Evidence Against \(H_0\)
> 0.1            No evidence
(0.05, 0.1]      Weak evidence
(0.01, 0.05]     Moderate evidence
(0.001, 0.01]    Strong evidence
< 0.001          Very strong evidence
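
If desired, this mapping can be automated with a small helper function (a hypothetical convenience, not part of base R) using cut():

#hypothetical helper that maps a p-value to the evidence labels above
evidence <- function(p) {
  cut(p, breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
      labels = c("Very strong evidence", "Strong evidence",
                 "Moderate evidence", "Weak evidence", "No evidence"))
}
evidence(2.365e-09) #falls in the "Very strong evidence" bin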

Two-Tailed Tests

With a two-tailed test, the values of the parameter being studied (for example, \(\mu\)) under the alternative hypothesis are allowed to be either greater than or less than the values of the parameter under the null hypothesis (for example, \(\mu_0\)). In other words, the alternative is that the values of the parameter are not equal to that of the values under the null.

We can test the following two-tailed hypothesis:

\[H_0: \mu = \mu_0\] \[H_A: \mu \neq \mu_0\]

with significance level \(\alpha\). We calculate the test statistic:

\[t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\]

  • if the test statistic \(|t|> t_{n-1, 1-\alpha/2}\), then \(H_0\) is rejected.
  • if \(|t| \leq t_{n-1, 1-\alpha/2}\), then we fail to reject \(H_0\).

Visualization of hypothesis testing approach for two-tailed test.

Example 1

Consider the following example: we would like to test the hypothesis that the mean cholesterol level of female immigrants from Country A is different from the mean cholesterol level of the U.S. population. Suppose that you are given that the mean cholesterol level for women in the United States is 190 mg/dL. You conduct a study where, from a random sample of 100 women from Country A, you take blood samples and find the mean level in this group to be 180.23 mg/dL with a standard deviation of 40 mg/dL.

For a two-sided one-sample test, we first calculate the test statistic:

\[t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\]

or

\[t = \frac{180.23 - 190}{40/\sqrt{100}} = -2.4425\]

We can do this in R:

(tstat <- (180.23 - 190)/(40/sqrt(100)))
## [1] -2.4425

To find the critical values, the degrees of freedom for the t-distribution will be \(n-1\), or \(100-1=99\). We use the quantile function qt() with df=99 (the degrees of freedom) and p=0.025 and p=0.975 (that is, \(\alpha/2\) and \(1-\alpha/2\) for \(\alpha=0.05\)). The two-sided critical values are \(c_1 = t_{99, 0.025}\) and \(c_2 = t_{99, 0.975}\):

(critval1 <- qt(p=0.025, df=99))
## [1] -1.984217
(critval2 <- qt(p=0.975, df=99))
## [1] 1.984217

The test statistic (-2.4425) is smaller than the lower critical value (-1.984217). Thus, we can reject \(H_0\) in favor of \(H_A\) at the 0.05 significance level: the mean cholesterol level of women from Country A differs from (and in this sample is lower than) that of U.S. women. Figure 3 visualizes the components of the critical value approach to hypothesis testing:

Figure 3: Visualization of hypothesis testing approach for two-tailed test for the mean cholesterol example.

If instead of having just the summary statistics of the mean and standard deviation we had the actual data (as a vector from a data frame), we could use the t.test() function. To demonstrate the function, I will first generate 100 samples from a normal distribution with mean 180.23 and standard deviation 40:

set.seed(123) #set seed so you get the same random draws every time
samples2 <- rnorm(n = 100, mean = 180.23, sd=40) #generate 100 samples from our specific distribution
length(samples2) #check the number of samples
## [1] 100
head(samples2) #check out the first 6 observations
## [1] 157.8110 171.0229 242.5783 183.0503 185.4015 248.8326
mean(samples2) #check mean of samples
## [1] 183.8462
sd(samples2) # check sd of samples
## [1] 36.51264
hist(samples2, freq = FALSE) #make a histogram of these samples

#finally, use t.test()

t.test(samples2, mu=190, alternative = "two.sided", conf.level = 0.95, paired=FALSE)
## 
##  One Sample t-test
## 
## data:  samples2
## t = -1.6854, df = 99, p-value = 0.09507
## alternative hypothesis: true mean is not equal to 190
## 95 percent confidence interval:
##  176.6013 191.0911
## sample estimates:
## mean of x 
##  183.8462

As before, the simulated result differs slightly from the summary-statistic calculation: here the p-value (0.095) is greater than 0.05, so with these particular simulated data we fail to reject \(H_0\). This is because the simulated sample mean (183.8462) happens to fall closer to 190 than the assumed mean of 180.23.

Two-Sample Tests

When you compare underlying parameters of two different populations (neither of whose values is known), you have a two-sample hypothesis-testing problem.

Equal Variance

We assume there are two normally-distributed samples, where the first sample is a random sample of size \(n_1\) from a \(N(\mu_1, \sigma^2_1)\) distribution and the second sample is a random sample of size \(n_2\) from a \(N(\mu_2, \sigma^2_2)\) distribution, and \(\sigma^2_1 = \sigma^2_2 = \sigma^2\).

Suppose we conduct a study where you are interested in comparing systolic blood pressure (SBP) in pre-menopausal oral contraceptive (OC) users and non-users. We can formulate the hypothesis as:

\[H_0: \mu_1 = \mu_2\] \[H_A: \mu_1 \neq \mu_2\]

If we assume that the variance of the users and non-users groups are the same (\(\sigma^2_1 = \sigma^2_2 = \sigma^2\)), then we have a two sample test for independent samples with equal variances. The test statistic is given by:

\[t = \frac{\bar{x}_1 - \bar{x}_2}{s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},\]

where \[s = \sqrt{\frac{(n_1 -1)s_1^2 + (n_2 -1)s_2^2}{n_1 +n_2 -2}}\]

The degrees of freedom will be \(n_1 + n_2 -2\). \(H_0\) will be rejected if \(t > t_{n_1 + n_2 -2, 1-\alpha/2}\) or when \(t < t_{n_1 +n_2 -2, \alpha/2}\).

Example 1

Let’s continue with the OC users vs. non-users example. Let the mean SBP of a sample of 10 OC users be 132.04 mm Hg (standard deviation of 13.42) and the mean SBP of a sample of 19 non-OC users be 127.93 mm Hg (standard deviation of 15.32). Are these two groups different?

#set parameters
n1 <- 10 #OC users
n2 <- 19 #non-OC users
SBP1 <- 132.04 #mean SBP of OC users
SBP2 <- 127.93 #mean SBP of non-OC users
sd1 <- 13.42 #sd of OC users
sd2 <- 15.32 #sd of non-OC users

#calculate critical values 
(degfreedom <- n1 + n2 -2)
## [1] 27
(critval1 <- qt(p=0.025, df=degfreedom))
## [1] -2.051831
(critval2 <- qt(p=0.975, df=degfreedom))
## [1] 2.051831
#calculate square root of pooled estimate of variance:
(s <- sqrt(((n1 -1)*sd1^2 + (n2 -1)*sd2^2)/(n1 +n2 -2)))
## [1] 14.71395
#calculate test statistic
(tstat <- (SBP1 - SBP2)/(s*sqrt(1/n1 + 1/n2)))
## [1] 0.714974

The test statistic 0.714974 is greater than \(t_{n_1 +n_2 -2, \alpha/2}\) (-2.0518305) and less than \(t_{n_1 + n_2 -2, 1-\alpha/2}\) (2.0518305). We fail to reject \(H_0\) at \(\alpha=0.05\).

We calculate the p-value as twice the upper-tail probability of the \(t\) distribution at the test statistic value (\(2 \times P(t_{n_1+n_2-2} > t) = 2 \times [1- P(t_{n_1+n_2-2} \leq t)]\)):

2*(1-pt(tstat, df=27))
## [1] 0.4807616

There is no evidence against \(H_0\).

Example 2

Let us go back to the Melanoma data set from the R package MASS, introduced in the Descriptive Statistics notes.

library(MASS)
data("Melanoma")

Recall, the Melanoma data set contains data collected on 205 patients in Denmark on malignant melanoma. The variables are defined as below:

  • time: survival time in days (possibly censored)
  • status: 1=died from melanoma; 2=alive; 3=dead from other causes
  • sex: 1=male; 0=female
  • age: age in years
  • year: year of operation
  • thickness: tumor thickness in \(mm\)
  • ulcer: 1=presence; 0=absence

Suppose we want to use a two-sided two-sample t-test to test whether thickness is the same across males and females (sex). We use the same function t.test() and add the argument var.equal=TRUE. We take the response thickness and use the ~ operator to specify that the two samples are defined by sex.

t.test(thickness~sex, data=Melanoma, var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  thickness by sex
## t = -2.6883, df = 203, p-value = 0.007777
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.9496125 -0.2998089
## sample estimates:
## mean in group 0 mean in group 1 
##        2.486429        3.611139

From the results above, we see that there is strong evidence (p = 0.0078) to reject \(H_0\) at the \(\alpha=0.05\) level. There is a difference in mean thickness between males and females.

Unequal Variance

If you have two samples where the two variances are not equal, then you can use a two-sample test for unequal variances. This is also known as Satterthwaite’s Method. We assume there are two normally-distributed samples where the first sample is a random sample of size \(n_1\) from a \(N(\mu_1, \sigma^2_1)\) distribution and the second sample is a random sample of size \(n_2\) from a \(N(\mu_2, \sigma^2_2)\) distribution, and \(\sigma^2_1 \neq \sigma^2_2\).

The hypothesis is still:

\[H_0: \mu_1 = \mu_2\] \[H_A: \mu_1 \neq \mu_2\]

The test statistic is given by:

\[t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]

The approximate degrees of freedom (\(d'\)) are given by: \[d' = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1 -1) + (s_2^2/n_2)^2/(n_2-1)}\]

The same rejection rules apply as before.

Example 1

Consider a study of cholesterol. Suppose that you have conducted a study of cholesterol levels from 100 children (aged 2-14 years old) whose biological fathers had died of heart disease (group 1). You also collect a second sample of cholesterol levels from 90 children whose biological fathers had no history or diagnosis of heart disease and were still living as controls (group 2). In the first group, the mean cholesterol level is 210.3 mg/dL (standard deviation 32.9 mg/dL). In the second group, the mean cholesterol level is 190.1 mg/dL (standard deviation 22.3 mg/dL). Perform a two-sample \(t\) test for unequal variances.

#set parameters
n1 <- 100 #children whose fathers died of heart disease (group 1)
n2 <- 90 #control children (group 2)
chol1 <- 210.3 #mean cholesterol group 1
chol2 <- 190.1 #mean cholesterol group 2
sd1 <- 32.9 #sd of group 1
sd2 <- 22.3 #sd of group 2

#calculate approximate degrees of freedom using Satterthwaite's method
(degfreedom_sat <- (sd1^2/n1 + sd2^2/n2)^2/((sd1^2/n1)^2/(n1-1) +(sd2^2/n2)^2/(n2-1)))
## [1] 175.1131
#calculate critical values 
(critval1 <- qt(p=0.025, df=degfreedom_sat))
## [1] -1.973604
(critval2 <- qt(p=0.975, df=degfreedom_sat))
## [1] 1.973604
#calculate test statistic
(tstat <- (chol1 - chol2)/sqrt(sd1^2/n1 + sd2^2/n2))
## [1] 4.995725

The test statistic 4.9957252 is greater than the upper critical value (1.9736036). We reject \(H_0\) at \(\alpha=0.05\).

We calculate the p-value as twice the upper-tail probability at the test statistic value, now using the approximate degrees of freedom \(d'\) (\(2 \times P(t_{d'} > t) = 2 \times [1- P(t_{d'} \leq t)]\)):

2*(1-pt(tstat, df=degfreedom_sat))
## [1] 1.410882e-06

The p-value is \(1.4108816 \times 10^{-6}\). There is very strong evidence against \(H_0\).

Example 2

Suppose we again want to test whether thickness is the same across males and females (sex), but this time with a two-sided \(t\) test for unequal variances. We use the same function t.test(), but with the argument var.equal=FALSE (the default).

t.test(thickness~sex, data=Melanoma, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  thickness by sex
## t = -2.6059, df = 149.09, p-value = 0.01009
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.9775560 -0.2718653
## sample estimates:
## mean in group 0 mean in group 1 
##        2.486429        3.611139

From the results above, we see that there is moderate evidence (p = 0.0101) to reject \(H_0\) at the \(\alpha=0.05\) level. There is a difference in mean thickness between males and females.

Equal vs. Unequal Variance Test

For the two-sided \(t\) test with equal variances, we assume a common variance \(\sigma^2\) and take a weighted average of the individual sample variances. To determine whether this is a reasonable approach, and hence whether to use the equal or unequal variance \(t\) test, we can test the hypothesis:

\[H_0: \sigma^2_1 = \sigma^2_2\]

\[H_A: \sigma^2_1 \neq \sigma^2_2\]

where the two samples are assumed to be independent random samples from \(N(\mu_1, \sigma^2_1)\) and \(N(\mu_2, \sigma^2_2)\). We use the sample variances \(s_1^2\) and \(s^2_2\), which are the estimators of \(\sigma^2_1\) and \(\sigma^2_2\). We use the relative magnitudes of the sample variances through their ratio, \(s^2_1/s_2^2\), where \(H_0\) is rejected if the ratio is too large or too small. A rule of thumb is that if \(s^2_1/s_2^2 > 2\) or \(s^2_1/s_2^2 < 1/2\), the variances should not be treated as equal. Formally, we can use the F test for the equality of two variances. The test statistic is:

\[F = s^2_1/s_2^2\]

If \(F< F_{n_1-1,n_2-1, \alpha/2}\) or \(F > F_{n_1-1,n_2-1, 1-\alpha/2}\), then \(H_0\) is rejected. If \(F_{n_1-1,n_2-1, \alpha/2} \leq F \leq F_{n_1-1,n_2-1, 1-\alpha/2}\), then we fail to reject \(H_0\). Here \(F_{n_1-1,n_2-1, p}\) denotes the \(p\)th quantile of the F-distribution with \(n_1-1\) and \(n_2-1\) degrees of freedom.

Example 1

In the examples above, we used both the equal and unequal variance \(t\) tests to test whether there was a difference in thickness between females and males. We will use the F-test for the equality of two variances to identify which test we should use.

table(Melanoma$sex)
## 
##   0   1 
## 126  79
nfemale <- 126 #number of females (from table above)
nmale <- 79 #number of males (from table above)
(svars <- tapply(Melanoma$thickness, Melanoma$sex, var)) #calculate sample variance for females (0) and males (1)
##        0        1 
## 7.587911 9.958592
(svars_female <- svars[1]) #sex: 0=female
##        0 
## 7.587911
(svars_male <- svars[2]) #sex: 1=male
##        1 
## 9.958592
#calculate fstat (variance of group 0 over variance of group 1, as in var.test())
(fstat <- svars_female/svars_male)
##         0 
## 0.7619462
#calculate critical values (degrees of freedom are n-1 for each group)
(critval1 <- qf(p=0.025, df1=nfemale-1, df2=nmale-1))
## [1] 0.6750644
(critval2 <- qf(p=0.975, df1=nfemale-1, df2=nmale-1))
## [1] 1.509701

The F-statistic 0.7619462 is greater than the lower critical value (0.6750644) and less than the upper critical value (1.509701). We therefore fail to reject \(H_0\). There is no evidence that the two variances are unequal, and we can proceed with the test for equal variances.

We can also use the function var.test() to more easily perform the same F-test for equal variances:

var.test(thickness ~ sex, data=Melanoma)
## 
##  F test to compare two variances
## 
## data:  thickness by sex
## F = 0.76195, num df = 125, denom df = 78, p-value = 0.1748
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.5047001 1.1287015
## sample estimates:
## ratio of variances 
##          0.7619462

Notice that we obtain the same F-statistic and reach the same conclusions.

Copyright (c) 2021 Maria Kamenetsky