What is the sampling distribution of a statistic: the distribution of all values taken by it in all possible samples, or the average of all the values taken by the statistic?

Chapter 9
    Sec 9.1
Before we begin the topics in Chapter 9 it would be useful to revisit "Why Statistics"  from the very beginning of the year.

The concepts we learn in Chapter 9 are critical to set the stage for studying the final component of statistical analysis - statistical inference - which asks and answers the question "How often would this method give a correct answer if I used it very many times?"  Inference is most secure when we produce data by random sampling or randomized comparative experiments.  The reason is that when we use chance to choose respondents or assign subjects, the laws of probability answer the question stated above.  We will prepare for the study of statistical inference by looking at the probability distributions of some very common statistics:  sample proportions and sample means.

When looking at data we MUST keep straight whether a number describes a sample or a population.
A parameter is a number that describes the population.  A parameter is a fixed number, but we do not know its value because we cannot examine the entire population.  A statistic is a number that describes a sample.  The value of a statistic is known when we have taken a sample, but it can change from sample to sample.  We use a statistic to estimate an unknown parameter.  We use p to represent a population proportion, while we use p hat, the sample proportion, to estimate the parameter.  Each sample will have its own value of the statistic, i.e., sample statistics will vary.  BUT...this is not fatal...what happens if we take MANY samples?

The sampling distribution (histogram) of a statistic is the distribution of values taken by the statistic in ALL possible samples of the same size from the same population.  The interpretation of a sampling distribution is the same, whether we obtain it by simulation or by the mathematics of probability.
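
Obtaining a sampling distribution by simulation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population (10,000 individuals, 60% coded 1), the sample size, and the function name simulate_p_hat are all hypothetical and do not come from the chapter. A simulation uses many samples rather than ALL possible samples, so it only approximates the sampling distribution.

```python
import random
import statistics

def simulate_p_hat(population, n, num_samples, seed=0):
    """Draw many simple random samples of size n; return each sample's proportion of 1s."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        sample = rng.sample(population, n)  # SRS drawn without replacement
        p_hats.append(sum(sample) / n)
    return p_hats

# Hypothetical population: 10,000 individuals, 60% coded 1 ("yes"), so p = 0.6
population = [1] * 6000 + [0] * 4000
p_hats = simulate_p_hat(population, n=100, num_samples=2000)

# Center and spread of the simulated (approximate) sampling distribution
print("center of simulated p-hats:", round(statistics.mean(p_hats), 3))
print("spread of simulated p-hats:", round(statistics.stdev(p_hats), 3))
```

A histogram of p_hats would show the shape; the center should land near the assumed p = 0.6.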

We can use the same tools of data analysis used in beginning chapters to describe any distribution.  Using a histogram of the sampling distribution will provide the overall shape, measure of center and spread, and information about any outliers.  The appearance of the approximate sampling distributions is a consequence of random sampling.  When randomization is used in a design for producing data, statistics computed from the data have a definite pattern of behavior over many repetitions, even though the result of a single repetition is uncertain.

Of course we need to ask how trustworthy a statistic is as an estimate of a parameter.  Sampling distributions allow us to describe bias more precisely by speaking of the bias of a statistic rather than bias in a sampling method.  Bias concerns the center of the sampling distribution.  A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is exactly equal to the true value of the parameter being estimated.  The sample proportion (p hat) from an SRS is an unbiased estimator of the population proportion p.
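
For a very small population, the unbiasedness of p hat can be checked exactly by listing every possible sample. The sketch below assumes a hypothetical population of 10 individuals, 4 of them coded 1, and samples of size 3; the function name is made up for illustration. Exact fractions are used so the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations

def p_hat_sampling_mean(population, n):
    """Average p-hat over every possible sample of size n (all equally likely under an SRS)."""
    p_hats = [Fraction(sum(s), n) for s in combinations(population, n)]
    return sum(p_hats) / len(p_hats)

# Hypothetical tiny population: 10 individuals, 4 coded 1, so p = 4/10
population = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p = Fraction(sum(population), len(population))

mean_of_p_hats = p_hat_sampling_mean(population, n=3)
print(mean_of_p_hats == p)  # compares the mean of the sampling distribution with p
```

The comparison holds exactly here, which is what "unbiased" means: individual samples miss p, but the sampling distribution is centered on it.
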
Statistics have variability, but very large samples produce less variability than small samples.  An IMPORTANT fact is that the spread of the sampling distribution does NOT depend very much on the size of the population.

The variability of a statistic is described by the spread of its sampling distribution.  This spread is determined by the sampling design and the size of the sample.  Large samples give smaller spread.  As long as the population is much larger than the sample (at least 10 times larger) the spread of the sampling distribution is approximately the same for any population size.
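
One way to see why population size matters so little is the standard formula for the standard deviation of p hat under an SRS drawn without replacement, which includes a finite population correction. That formula is not stated in these notes; it is brought in here as an assumption, and the values p = 0.5 and n = 100 are hypothetical.

```python
import math

def sd_p_hat(p, n, population_size):
    """SD of p-hat for an SRS of size n drawn without replacement from a finite population."""
    # Finite population correction: close to 1 when the population is much larger than n
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return math.sqrt(p * (1 - p) / n) * fpc

p, n = 0.5, 100  # hypothetical values for illustration
for N in (1_000, 10_000, 1_000_000):
    print(N, round(sd_p_hat(p, n, N), 4))

# Changing the sample size has a much larger effect than changing the population size
print("n = 400:", round(sd_p_hat(p, 400, 1_000_000), 4))
```

Once the population is at least 10 times the sample, the correction factor stays close to 1, so the spread is driven almost entirely by n.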

Imagining the true value of the population parameter as the bull's eye on a target and the sample statistic as an arrow fired at the target, we can explain bias and variability pictorially.  Both describe what happens when we take many shots at the target.  Bias means that the aim is off and we consistently MISS the bull's eye in the same direction.  The sample values do NOT center on the population value.  High variability means that repeated shots are widely scattered on the target.  Repeated samples do NOT give very similar results.  Properly chosen statistics computed from random samples of sufficient size will have low bias and low variability.

What Is a Sampling Distribution?

A sampling distribution is a probability distribution of a statistic obtained from a large number of samples drawn from a specific population. It is the distribution of frequencies of the range of different outcomes that the statistic could take across samples from that population.

In statistics, a population is the entire pool from which a statistical sample is drawn. A population may refer to an entire group of people, objects, events, hospital visits, or measurements. A population can thus be said to be an aggregate observation of subjects grouped together by a common feature.

  • A sampling distribution is a probability distribution of a statistic that is obtained through repeated sampling of a specific population.
  • It describes a range of possible outcomes for a statistic, such as the mean or mode of some variable, of a population.
  • The majority of data analyzed by researchers are actually samples, not populations.

Understanding Sampling Distribution

A lot of data drawn and used by academicians, statisticians, researchers, marketers, analysts, etc. are actually samples, not populations. A sample is a subset of a population. For example, a medical researcher who wants to compare the average weight of all babies born in North America from 1995 to 2005 to those born in South America within the same time period cannot draw the data for the entire population of over a million childbirths that occurred over the ten-year time frame within a reasonable amount of time. They will instead use the weights of, say, 100 babies on each continent to draw a conclusion. The weights of the 100 babies used are the sample, and the average weight calculated is the sample mean.

Now suppose that instead of taking just one sample of 100 newborn weights from each continent, the medical researcher takes repeated random samples from the general population and computes the sample mean for each sample group. So, for North America, they pull up newborn weights recorded in the U.S., Canada, and Mexico as follows: four samples of 100 records from select hospitals in the U.S., five samples of 70 records from Canada, and three samples of 150 records from Mexico, for a total of 1,200 weights of newborn babies grouped in 12 sets. They also collect a sample of 100 birth weights from each of the 12 countries in South America.

Each sample has its own sample mean, and the distribution of the sample means is known as the sampling distribution.

The distribution of the average weights computed for the sample sets is the sampling distribution of the mean. The mean is not the only statistic that can be calculated from a sample. Other statistics, such as the standard deviation, variance, proportion, and range, can be calculated from sample data. The standard deviation and variance measure the variability of the sampling distribution.

The number of observations in a population, the number of observations in a sample, and the procedure used to draw the sample sets determine the variability of a sampling distribution. The standard deviation of a sampling distribution is called the standard error. While the mean of a sampling distribution is equal to the mean of the population, the standard error depends on the standard deviation of the population, the size of the population, and the size of the sample.

Knowing how far the means of the sample sets are spread from each other and from the population mean gives an indication of how close a sample mean is likely to be to the population mean. The standard error of the sampling distribution decreases as the sample size increases.
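
The shrinking standard error can be illustrated with a small simulation. The sketch below makes several assumptions that are not in the text: birth weights are modeled as draws from a normal distribution with a hypothetical mean of 7 pounds and standard deviation of 1.2 pounds, the population is treated as effectively infinite, and the simulated result is compared with the textbook value sigma divided by the square root of n.

```python
import math
import random
import statistics

def simulated_standard_error(mu, sigma, n, num_samples=2000, seed=1):
    """SD of the sample means across many simulated samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)

mu, sigma = 7.0, 1.2  # hypothetical mean and SD of birth weights, in pounds
for n in (25, 100, 400):
    simulated = simulated_standard_error(mu, sigma, n)
    theoretical = sigma / math.sqrt(n)
    print(n, round(simulated, 3), round(theoretical, 3))
```

Under these assumptions, quadrupling the sample size roughly halves the standard error.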

Special Considerations

A population, or a single sample drawn from it, may follow a normal distribution, but it does not have to. Likewise, because a sampling distribution is built from multiple sets of observations, it will not necessarily have a bell-curved shape.

Following our example, the weights of babies in North America and in South America are roughly normally distributed, because some babies will be underweight (below the mean) or overweight (above the mean), with most babies falling in between (around the mean). If the average weight of newborns in North America is seven pounds, the sample mean weight in each of the 12 sets of sample observations recorded for North America will be close to seven pounds as well.

However, if you graph the averages calculated for each of the 12 sample groups, the resulting shape may look like a uniform distribution, but it is difficult to predict with certainty what the actual shape will turn out to be. The more samples the researcher draws from the population of over a million weight figures, the more the graph of the sample means will tend toward a normal distribution.

Is the sampling distribution of a statistic the distribution of all values taken by it in all possible samples?

The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population. Students often find this a hard concept. The idea that we might have to list and study "all possible samples" is mind-boggling.
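
A short sketch can make "all possible samples" concrete. It counts how many samples exist for two population sizes and then lists the full sampling distribution of the sample mean for a tiny population. The five measurements below are hypothetical, chosen only so that the list stays small.

```python
import math
from collections import Counter
from itertools import combinations

# Number of distinct samples of size n from a population of size N is "N choose n"
print(math.comb(5, 2))
print(len(str(math.comb(1000, 100))), "digits in the count for N = 1000, n = 100")

# Hypothetical tiny population of five measurements
population = [2, 4, 6, 8, 10]
sample_means = [sum(s) / 2 for s in combinations(population, 2)]

# The exact sampling distribution: each possible sample mean and how often it occurs
sampling_distribution = Counter(sample_means)
for value in sorted(sampling_distribution):
    print(value, sampling_distribution[value])
```

With only five values, every sample can be listed by hand. With a population of 1,000 the count is already far too large to enumerate, which is why simulation or the mathematics of probability is used instead.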

What is a probability distribution for all possible values of a sample statistic?

Sampling distribution: the sampling distribution of a statistic is a probability distribution for all possible values of the statistic computed from a sample of size n drawn from a population with a given mean and standard deviation.