Why is 30 the minimum sample size?

One of the most common questions I get asked by people doing surveys in international development is “how big should my sample size be?”. While there are many sample size calculators and statistical guides available, those who never did statistics at university (or have forgotten it all) may find them intimidating or difficult to use.

If this sounds like you, then keep reading. This guide will explain how to choose a sample size for a basic survey without any of the complicated formulas. For more easy rules of thumb regarding sample sizes for other situations, I highly recommend Sample size: A rough guide by Ronán Conroy and The Survey Research Handbook by Pamela Alreck and Robert Settle.

This article is a short introduction to the topic. For more in-depth coverage, consider enrolling in the free online course offered by the University of Florida.

This advice is for:

  • Basic surveys such as feedback forms, needs assessments, opinion surveys, etc. conducted as part of a program.
  • Surveys that use random sampling.

This advice is NOT for:

  • Research studies conducted by universities, research firms, etc.
  • Complex or very large surveys, such as national household surveys.
  • Surveys to compare between an intervention and control group or before and after a program (for this situation see Sample size: A rough guide).
  • Surveys that use non-random sampling, or a special type of sampling such as cluster or stratified sampling (for these situations see Sample size: A rough guide and the UN guidelines on household surveys).
  • Surveys where you plan to use fancy statistics to analyse the results, such as multivariate analysis (if you know how to do such fancy statistics then you should already know how to choose a sample size).

The minimum sample size is 100

Most statisticians agree that the minimum sample size to get any kind of meaningful result is 100. If your population is less than 100 then you really need to survey all of them.

A good maximum sample size is usually 10% as long as it does not exceed 1000

A good maximum sample size is usually around 10% of the population, as long as this does not exceed 1000. For example, in a population of 5000, 10% would be 500. In a population of 200,000, 10% would be 20,000. This exceeds 1000, so in this case the maximum would be 1000.

Even in a population of 200,000, sampling 1000 people will normally give a fairly accurate result. Sampling more than 1000 people won’t add much to the accuracy given the extra time and money it would cost.
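If you prefer to see the two rules of thumb written down as a calculation, here is a minimal Python sketch. The function name `sample_size_bounds` and its parameters are made up for illustration. The rules above don't say what to do when 10% of the population is less than 100, so the sketch assumes the maximum is simply raised to the minimum in that case.

```python
def sample_size_bounds(population, minimum=100, fraction=0.10, cap=1000):
    """Return (minimum, maximum) sample sizes under the rules of thumb above."""
    if population <= minimum:
        # Population of 100 or fewer: survey everyone.
        return population, population
    # Maximum is 10% of the population, but never more than 1000.
    maximum = min(round(population * fraction), cap)
    # Assumption (not stated in the article): if 10% falls below the
    # minimum, use the minimum as the maximum too.
    maximum = max(maximum, minimum)
    return minimum, maximum


for population in (80, 5000, 200_000):
    low, high = sample_size_bounds(population)
    print(f"population {population}: sample between {low} and {high}")
```

For the two examples above, this gives a maximum of 500 for a population of 5000 and a maximum of 1000 for a population of 200,000.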

Choose a number between the minimum and maximum depending on the situation

Suppose that you want to survey students at a school which has 6000 pupils enrolled. The minimum sample would be 100. This would give you a rough, but still useful, idea about their opinions. The maximum sample would be 600, which would give you a fairly accurate idea about their opinions.

Choose a number closer to the minimum if:

  • You have limited time and money.
  • You only need a rough estimate of the results.
  • You don’t plan to divide the sample into different groups during the analysis, or you only plan to use a few large subgroups (e.g. males / females).
  • You think most people will give similar answers.
  • The decisions that will be made based on the results do not have significant consequences.

Choose a number closer to the maximum if:

  • You have the time and money to do it.
  • It is very important to get accurate results.
  • You plan to divide the sample into many different groups during the analysis (e.g. different age groups, socio-economic levels, etc).
  • You think people are likely to give very different answers.
  • The decisions that will be made based on the results of the survey are important, expensive or have serious consequences.

In practice most people normally want the results to be as accurate as possible, so the limiting factor is usually time and money. In the example above, if you had the time and money to survey all 600 students then that would give you a fairly accurate result. If you don’t have enough time or money then just choose the largest number that you can manage, as long as it’s more than 100.

If you would like to learn more about survey data collection, consider taking the free course offered by the University of Michigan and the University of Maryland.

If you want to be a bit more scientific then use this table

While the previous rules of thumb are perfectly acceptable for most basic surveys, sometimes you need to sound more “scientific” in order to be taken seriously. In that case you can use the following table. Simply choose the column that most closely matches your population size. Then choose the row that matches the level of error you’re willing to accept in the results.

[Table: sample size by population size (columns) and acceptable level of error (rows)]

You will see on this table that the smallest samples are still around 100, and the biggest sample (for a population of more than 5000) is still around 1000. The same general principles apply as before – if you plan to divide the results into lots of sub-groups, or the decisions to be made are very important, you should pick a bigger sample.

Note: This table can only be used for basic surveys to measure what proportion of the population have a particular characteristic (e.g. what proportion of farmers are using fertiliser, what proportion of women believe myths about family planning, etc). It can’t be used if you are trying to compare two groups (e.g. control versus intervention) or two points in time (e.g. baseline and endline surveys). See Sample size: A rough guide for other tables that can be used in these cases.

Relax and stop worrying about the formulas

It’s a dirty little secret among statisticians that sample size formulas often require you to have information in advance that you don’t normally have. For example, you typically need to know (in numerical terms) how much the answers in the survey are likely to vary between individuals (if you knew that in advance then you wouldn’t be doing a survey!).

So even though it’s theoretically possible to calculate a sample size using a formula, in many cases experts still end up relying on rules of thumb plus a good deal of common sense and pragmatism. That means you shouldn’t worry too much if you can’t use fancy maths to choose your sample size – you’re in good company.

Once you’ve chosen a sample size, don’t forget to write good survey questions, design the survey form properly and pre-test and pilot your questionnaire.

Photo by James Cridland

Fair enough – let’s go with Ms. Andrews and start at the beginning.

*Key points*

1. The bald statement: “You only need a sample size of X to do certain tests”, without any reference to the context of the problem (that is whatever it is that you are trying to do) is rubbish.

2. The actual sample size needed for any given effort will be driven by things such as the time needed to obtain a given sample, the cost of that sample, the effort needed to get the sample, the size of the effect you want to observe with a given sample size, the degree of certainty you wish to have with respect to any claims you might want to make concerning the observed effect size, etc.

3. What constitutes an effect size will depend on the question you are asking. For example:

a. If the focus is on some continuous population measure and you are interested in how the two populations may differ with respect to that measure then the effect size will often be expressed in terms of the minimum difference in the measurement mean values between the two populations that results in a significant difference with some degree of certainty.

b. If the focus is on a difference of proportions of the occurrence of something such as a defect count (yes/no) then the effect size will often be expressed as the minimum difference in percentage of occurrence that results in a significant difference with some degree of certainty.

c. If the focus is on determining the confidence bounds of a measurement of a mean or of a percentage then the effect size will be the degree of confidence associated with that measure and the degree of confidence you wish to have with respect to your assessment of the confidence bounds around that target mean or percentage.

**The claim that 30 samples is sufficient**

Let’s turn to your particular situation and see how this holds up.

In your situation you have the following:

1. You have millions of bills which are processed annually.

2. You have no idea of the proportion of these bills which are incorrect.

3. Your guestimates range from 100 to 10,000 incorrect billings (yes, I know, it could be more it could be less but we need to start somewhere). In other words you believe the proportions of incorrect billings are very small.

For purposes of estimation let’s assume you process exactly 1,000,000 bills annually and the error rates range from 100 to 10,000 as previously stated. This would mean your guestimate of defect rate is between .01% and 1%.

If we take a random sample size of 30 bills, the smallest non-zero defect we can detect would be a single error. This translates into 1/30 = .033 for a defect rate of 3.3%. This, in turn, means the guestimated defect rates (.01% and 1%) range from over 300 to 3.3 times SMALLER than the smallest possible non-zero defect rate you could detect with 30 samples – in other words – a sample size of 30 will provide 0 information about whether your guestimates are correct or incorrect.
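The arithmetic above is easy to check for yourself. The short Python sketch below redoes it and adds one extra number that is not in my argument above: the chance that a random sample of 30 bills contains no incorrect bill at all. That extra number assumes the draws are independent, which is reasonable when 30 bills are taken from 1,000,000. The function names are just for illustration.

```python
def smallest_observable_rate(sample_size):
    """Smallest non-zero defect rate a sample can show: one defect."""
    return 1 / sample_size


def prob_no_defects(true_rate, sample_size):
    """Chance a random sample contains zero defects (independent draws assumed)."""
    return (1 - true_rate) ** sample_size


population = 1_000_000
sample_size = 30

for defects in (100, 10_000):
    true_rate = defects / population
    ratio = smallest_observable_rate(sample_size) / true_rate
    p_zero = prob_no_defects(true_rate, sample_size)
    print(
        f"true rate {true_rate:.2%}: 1/30 is {ratio:.1f} times larger, "
        f"chance of seeing no defects = {p_zero:.1%}"
    )
```

Under these assumptions, even at the 1% guestimate roughly three samples of 30 out of four would contain no incorrect bill, and at .01% nearly every sample would come back clean.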

What the above illustrates is there are many situations where a sample size of 30 is grossly insufficient. In my earlier posts I illustrated situations where a sample size of 30 was gross overkill. Taken together they illustrate that problem context is everything. Without context claims concerning either the necessity or sufficiency of sample sizes of 30, or any other number for that matter, are of no value.

As to where the 30 sample size estimate came from, my answer as a statistician is I don’t know and I don’t care – given the work I do and have done for many years it is an estimate of no value.

**What you have done with the online sample size calculator**

I found a couple of sample size calculators which gave me the same number you generated – 384/385.

The one I used can be found here:

http://epitools.ausvet.com.au/content.php?page=1Proportion&Proportion=0.5&Precision=0.05&Conf=0.95&Population=100000

What you have done is the following:
You are assuming you have an error rate of 50%. You told the machine you wanted to be 95% certain that this estimate is good to within ±5%. As you can see this particular online site only allows a maximum population size of 100,000 and, as you can also see, the sample size estimate is in agreement with what you stated earlier.

If you want a sample size estimate to test to see if your error rate is .01% and if you want that to be precise within 5% then the numbers you would enter in the above online calculator would be an estimated true proportion of .0001 and a desired precision of .000005 (5% of .0001). For these values the sample size would be 99,354 – basically your entire population of 100,000.

For the case of 1% the values would be .01 and .0005 respectively and the sample size for a population of 100,000 would be 60,337.
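If you would rather not depend on a website, the following Python sketch shows one common way such numbers are calculated: the normal-approximation formula for a proportion, with a finite population correction. I am assuming this is the kind of calculation the online calculator performs; I have not checked how the site works internally, and the function name and arguments are my own.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, precision, confidence=0.95, population=None):
    """Sample size to estimate a proportion p to within +/- precision.

    Uses n0 = z^2 * p * (1 - p) / precision^2, then (if a population size
    is given) the finite population correction n = n0 / (1 + (n0 - 1) / N).
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z**2 * p * (1 - p) / precision**2
    if population is not None:
        n = n / (1 + (n - 1) / population)
    # Round up: you cannot sample a fraction of a bill.
    return math.ceil(n)


# 50% assumed rate, +/- 5 percentage points, no population correction.
print(sample_size_for_proportion(0.5, 0.05))
# .01% and 1% rates, each to within 5% of the rate, population 100,000.
print(sample_size_for_proportion(0.0001, 0.000005, population=100_000))
print(sample_size_for_proportion(0.01, 0.0005, population=100_000))
```

Under these assumptions the sketch gives 385 for the 50% case and reproduces the 99,354 and 60,337 figures for a population of 100,000.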

Large sample sizes are to be expected when you want a precise estimate of a very small defect rate, and numbers of this magnitude are what I was getting when I tried to generate sample sizes based on my understanding of your earlier posts.

I know very little about billing procedures and billing record keeping but I find it hard to believe that someone, somewhere in your organization doesn’t keep track of incorrect billing as well as number of bills processed. If not the exact numbers then possibly other numbers that could stand in as a surrogate for defects and totals and could be used to generate an estimate of proportion defective.

Regardless of what you might be able to use for generating an estimate of percent defective just calculating that estimate would give you a sense of the magnitude of the problem. Depending on what you found I would think the next question one should ask is this: Given our crude estimate of an error rate – what is the cost of that error rate to our business and is there any benefit with respect to dollars to the bottom line to try to further reduce the error rate?

If it turns out your error rate is very low and reducing it further would really be of benefit then you will have to explain the sample size situation to everyone and, in that case, I hope what I have provided will be of some value.