What are the factors to consider in determining the appropriate sample size for a study?

Detecting a change or difference is often the aim of an experiment or set of measurements. We want to learn which vendor, process, or design provides a better result.

When we use a sample to estimate a statistic for a population, we take the risk that the sample is not representative of the population. For example, if we use a professional basketball team as a sample of men’s height, we may conclude that men in the general population are taller than they really are.

Unlike the basketball-team example, we normally take care to draw a random and, we hope, representative sample. Even so, by chance alone the sample may not provide accurate results.

Confidence and Significance

Statisticians use terms like ‘confidence’ and ‘significance’ with specific meanings related to sampling risk. The two terms are related, and both describe how convincing a result is: low significance or low confidence implies that conclusions based on the sample data are not convincing.

Confidence is the idea of being certain that the estimate based on the sample correctly represents the population. This is often used in relation to an interval or bound within which the true and unknown population value is expected to reside.

Significance is the idea that the results are not due to random chance alone. It is the notion that there is convincing evidence based on the sample data that there really is a difference. We commonly use this when accepting the alternative hypothesis with a specified level of statistical significance.

Selecting a meaningful sample size

The risks around using a sample to make conclusions about a population are only one of three considerations when determining the sample size for an experiment. The sampling risk, the population’s variance, and the precision or amount of change we wish to detect all impact the calculation of sample size.

The less risk we are willing to take that the sample misrepresents the population, the more samples are required. If you want no sampling risk at all, measure every unit in the population using a method with very little measurement error.

Consider variance

The higher the underlying population’s variation or spread, the more samples we need to reach the same conclusion than with a lower-variation population. Because there is a greater chance of selecting samples far from the mean (if we are attempting to estimate a population’s mean, for example), it takes more samples to get an accurate estimate than if the population has a very tight variance.

The more precise or smaller the difference that we want to detect, the more samples are required. Given the same risk and variance, the ability of a sample to detect a 1-millimeter height difference in two groups versus a 1-meter difference will require many more samples.

The following sample size formula, for an estimate of a normal distribution’s mean, shows the relationship between these three elements.

$n = \frac{Z^2 \sigma^2}{E^2}$

Z is the standard normal value corresponding to the Type I risk (the 1 − α confidence level). Other versions of sample size formulas include both Type I and Type II risks and use the appropriate distribution for the specific measure. As the desired risk goes down, the Z value from the normal table goes up, and thus the sample size increases.

The risk is a business decision based on how much risk the decision-maker wishes to take. It often becomes a trade-off between the cost of the samples and the experiment versus the possibility that the sample does not represent the population and we make the wrong decision.

σ is the standard deviation of the population (we often use an estimate of the variation from the sample and then use the Student t table instead of the normal table). The variance is the standard deviation squared. As the variance increases, the sample size increases.

The population variance is what it is. By working to reduce process or measurement variation, however, we may both improve the product or process and reduce the number of samples needed.

E is the difference of interest. Some authors use delta, Δ, to represent this term, as it is the amount of separation, or precision, we wish to detect that drives the sample size. The smaller this value, the more samples we need.

For this normal-distribution mean estimate, E is the difference of means, μ1 – μ. It is the amount of difference that is worth knowing about in order to make a decision.

This becomes a trade-off between the cost of the samples and the experiment and the amount of difference that matters. Asking at what amount of difference we would actually make a change is a good way to set this value.
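As a sketch of these trade-offs, the formula above can be evaluated for a few combinations of risk, variance, and precision. The function name and the Z, σ, and E values below are illustrative, not from the article:

```python
import math

def sample_size(z, sigma, e):
    """n = Z^2 * sigma^2 / E^2 for estimating a normal distribution's mean,
    rounded up since a fractional sample is not possible."""
    return math.ceil((z ** 2) * (sigma ** 2) / (e ** 2))

# Baseline: 95% confidence (Z = 1.96), sigma = 10, detect a difference of 2
print(sample_size(1.96, 10, 2))   # 97

# Lower risk (99% confidence, Z = 2.58) raises the sample size
print(sample_size(2.58, 10, 2))   # 167

# Tighter precision (detect a difference of 1) raises it much further
print(sample_size(1.96, 10, 1))   # 385
```

Note how halving E quadruples the required sample size, since E enters the formula squared.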

Summary

Most sample size formulas contain these three elements. Two parts, risk and precision, are business or technical decisions. The variance is part of the underlying data and not an option to increase or decrease easily.

The sample size formula is useful for the discussion around risk and decision points prior to conducting the experiment. It is one way to design and conduct measurements that provide value. If our results are to be useful, then considering all three elements that make up a sample size formula becomes important.

Related:

Sample size (article)

Sample Size – success testing (article)

Statistical Confidence (article)

What are the factors to consider in determining the appropriate sample size for a study?

Sample size is a research term used for defining the number of individuals included in a research study to represent a population. The sample size references the total number of respondents included in a study, and the number is often broken down into sub-groups by demographics such as age, gender, and location so that the total sample accurately represents the entire population. Determining the appropriate sample size is one of the most important factors in statistical analysis. If the sample size is too small, it will not yield valid results or adequately represent the realities of the population being studied. On the other hand, while larger sample sizes yield smaller margins of error and are more representative, a sample size that is too large may significantly increase the cost and time taken to conduct the research.

This article will discuss considerations to put in place when determining your sample size and how to calculate the sample size.

Confidence Interval and Confidence Level

As we have noted before, when selecting a sample there are multiple factors that can impact the reliability and validity of results, including sampling and non-sampling errors. When thinking about sample size, the two measures of error that are almost always synonymous with sample sizes are the confidence interval and the confidence level.

Confidence Interval (Margin of Error)

Confidence intervals measure the degree of uncertainty or certainty in a sampling method and how much uncertainty there is with any particular statistic. In simple terms, the confidence interval tells you how confident you can be that the results from a study reflect what you would expect to find if it were possible to survey the entire population being studied. The confidence interval is usually a plus or minus (±) figure. For example, if your confidence interval is ±6 and 60 percent of your sample picks an answer, you can be confident that if you had asked the entire population, between 54% (60 − 6) and 66% (60 + 6) would have picked that answer.
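The plus-or-minus arithmetic from that example can be sketched in a couple of lines; the function name is hypothetical and the figures are the ones used above:

```python
def confidence_bounds(sample_pct, margin_of_error):
    """Return the lower and upper bounds implied by a +/- margin of error."""
    return sample_pct - margin_of_error, sample_pct + margin_of_error

# 60% of the sample picked an answer, confidence interval of +/- 6
low, high = confidence_bounds(60, 6)
print(f"{low}% to {high}%")  # 54% to 66%
```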


Confidence Level

The confidence level refers to the probability, or certainty, that the confidence interval would contain the true population parameter when you draw a random sample many times. It is expressed as a percentage and represents how often the percentage of the population who would pick an answer lies within the confidence interval. For example, a 99% confidence level means that should you repeat an experiment or survey over and over again, 99 percent of the time your results will match the results you would get from the population.

The larger your sample size, the more confident you can be that their answers truly reflect the population. In other words, the larger your sample for a given confidence level, the smaller your confidence interval.

Standard Deviation

Another critical measure when determining the sample size is the standard deviation, which measures how spread out a data set is around its mean. In calculating the sample size, the standard deviation is useful in estimating how much the responses you receive will vary from each other and from the mean, and the standard deviation of a sample can be used to approximate the standard deviation of a population.

The higher the variability, the greater the standard deviation. For example, once you have sent out your survey, how much variance do you expect in your responses? That variation in responses is the standard deviation.
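As a minimal illustration, the standard deviation of a set of survey responses can be computed with Python's statistics module; the response values below are hypothetical:

```python
import statistics

# Hypothetical responses on a 1-10 satisfaction scale
tight = [7, 7, 8, 7, 8, 7]      # answers cluster near the mean
spread = [2, 9, 4, 10, 1, 8]    # answers vary widely

# The more variable data set has the larger standard deviation
print(statistics.stdev(tight))   # small
print(statistics.stdev(spread))  # much larger
```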

Population Size

The other important consideration to make when determining your sample size is the size of the entire population you want to study. A population is the entire group that you want to draw conclusions about. It is from the population that a sample is selected, using probability or non-probability sampling. The population size may be known (such as the total number of employees in a company) or unknown (such as the number of pet keepers in a country), but there is a need for a close estimate, especially when dealing with relatively small or easy-to-measure groups of people.


As demonstrated through the calculation below, a sample size of about 385 will be sufficient to draw conclusions about nearly any population size at the 95% confidence level with a 5% margin of error, which is why samples of 400 and 500 are often used in research. However, if you are looking to draw comparisons between different sub-groups, for example, provinces within a country, a larger sample size is required. GeoPoll typically recommends a sample size of 400 per country as the minimum viable sample for a research project, 800 per country for conducting a study with analysis by a second-level breakdown such as females versus males, and 1200+ per country for doing third-level breakdowns such as males aged 18-24 in Nairobi.

How to Calculate Sample Size

Now that we have defined all the necessary terms, let us briefly learn how to determine the sample size using a sample calculation formula known as Andrew Fisher’s Formula.

Confidence level    z-score
80%                 1.28
85%                 1.44
90%                 1.65
95%                 1.96
99%                 2.58
Put these figures into the sample size formula to get your sample size.

Sample size = (z-score)² × StdDev × (1 − StdDev) / (margin of error)²

Here is an example calculation:

Say you choose to work with a 95% confidence level, a standard deviation of 0.5, and a confidence interval (margin of error) of ± 5%, you just need to substitute the values in the formula:

((1.96)² × .5(.5)) / (.05)²

= (3.8416 × .25) / .0025

= .9604 / .0025

= 384.16

Your sample size should be 385.
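The worked example can be reproduced in a few lines of Python; the function name is illustrative, and the 0.5 standard deviation and 5% margin of error are the assumptions from the example above:

```python
import math

def fisher_sample_size(z, p, moe):
    """n = z^2 * p * (1 - p) / moe^2, rounded up to a whole respondent."""
    return math.ceil((z ** 2) * p * (1 - p) / (moe ** 2))

# 95% confidence level (z = 1.96), standard deviation 0.5, margin of error 5%
print(fisher_sample_size(1.96, 0.5, 0.05))  # 385
```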

Fortunately, there are several available online tools to help you with this calculation. Here’s an online sample calculator from Easy Calculation. Just put in the confidence level, population size, and confidence interval, and the appropriate sample size is calculated for you.


GeoPoll’s Sampling Techniques

With the largest mobile panel in Africa, Asia, and Latin America, and reliable mobile technologies, GeoPoll develops unique samples that accurately represent any population. See our country coverage here, or contact our team to discuss your upcoming project.