Most research projects can be classified into one of two basic categories: observational studies or designed experiments. In an experiment, researchers control (to some extent) the conditions under which measurements are made. In an observational study, researchers simply observe what happens, without controlling the conditions under which measurements are made. Both types of study follow the five steps of the Statistical Process.

Designed Experiments

In a designed experiment, researchers manipulate the conditions that the participants experience. They often do this by randomly assigning subjects to one of two groups, a “treatment” group and a “control” group. The experiment is conducted by applying some kind of treatment to the subjects in the treatment group and observing the effect of the treatment. Those in the control group do not receive the treatment and are also observed. In this way researchers can determine the effects of the treatment. The following example illustrates the use of these two groups.

Jonas Salk’s First Polio Vaccine Trial

Beginning around 1916 and through the 1950s, a mysterious plague attacked infants and children. Symptoms included excruciating muscle pain and a stiff neck. This illness, which became known as poliomyelitis or simply “polio,” left children disfigured, paralyzed, and sometimes even dead. While working as a researcher at the University of Pittsburgh School of Medicine, Dr. Jonas E. Salk developed a vaccine that might help prevent the spread of this disease. He conducted what has become one of the most famous designed experiments in history.

The short video accompanying this section provides a compelling summary of the famous Jonas Salk vaccine experiment. As you watch, look for each of the five steps of a statistical study.

As explained in the video, in the first Salk trial almost 1.1 million children participated in the study. Even though the sample size was large, flaws in the study design rendered the results useless.
Undaunted, Dr. Salk fixed the problems with the design and enrolled hundreds of thousands of additional children for the second phase of his study. In all, over 1.8 million infants and children participated in this experiment, making it the largest drug trial to date.

Step 1: Design the study. The participants in a study are commonly called subjects. Sometimes subjects are called experimental units or simply units. In the Salk trials, the children who participated were the subjects.

Subjects (the children) were randomly assigned to one of two groups. The first group was given the experimental vaccine, the treatment. The treatment is the new or experimental condition that is imposed on the subjects. The subjects who receive the treatment make up the treatment group.

The second group was given a control or placebo. In this study, the control was an injection that looked just like the vaccine, but contained a harmless saline solution. The control group or placebo group is made up of the subjects assigned to receive the control.

This study was double blind. Neither the children’s parents nor their doctors knew whether a particular child received the treatment or the control. Both parties were blinded to this information.

Because the children were assigned to the groups randomly, the two groups should be similar. If the vaccine is not effective, the number of future cases of polio should be about the same in each group. However, if Salk’s vaccine helped to prevent the spread of polio, then fewer cases should occur in the vaccinated group.
Step 2: Collect data. The researchers followed up with each child to determine if they contracted polio. They recorded the number of children in each group that developed polio during the study period. Not all of Salk’s experiments were double-blind. Here is a summary of the results from the regions where a double-blind study was conducted (Francis et al., 1955; Brownlee, 1955):

Children Who Developed Polio

                        Yes         No        Total
Treatment Group          57      200,688     200,745
Placebo Group           142      201,087     201,229

Step 3: Describe the data. One way to summarize the data is to compute the proportion of children in each group that developed polio. The proportion of children in the treatment group that developed polio during the study period is: \[ \frac{57}{200\,745} \approx 0.000\,283\,9 \] The corresponding proportion in the placebo group is: \[ \frac{142}{201\,229} \approx 0.000\,705\,7 \]
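The arithmetic behind Steps 3 and 4 can be sketched in a few lines of Python. The counts are the ones reported above for the double-blind regions. The text does not say which test the original analysts used, so the pooled two-proportion z-test below is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math

# Counts from the double-blind regions reported above.
treatment_cases, treatment_total = 57, 200_745
placebo_cases, placebo_total = 142, 201_229

# Step 3: proportion of children in each group who developed polio.
p_treatment = treatment_cases / treatment_total
p_placebo = placebo_cases / placebo_total


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic (an assumed choice of test)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


def lower_tail_probability(z):
    """P(Z <= z) for a standard normal Z, computed with math.erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Step 4: how surprising is the observed difference if the vaccine had no effect?
z = two_proportion_z(treatment_cases, treatment_total,
                     placebo_cases, placebo_total)
p_value = lower_tail_probability(z)

print(f"treatment proportion: {p_treatment:.7f}")
print(f"placebo proportion:   {p_placebo:.7f}")
print(f"z statistic:          {z:.2f}")
print(f"one-sided p-value:    {p_value:.2e}")
```

Under these assumptions the z statistic is close to -6 and the one-sided probability is on the order of one in a billion, which is the same order of magnitude as the probability quoted in Step 4.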
Step 4: Make inferences. Careful statistical analysis of the records suggested that this difference was so great that it was attributable to the vaccine and not to chance. Assuming that the vaccine had no effect, the probability that the difference in the proportions between the two groups would be at least as extreme as the difference Dr. Salk observed was very low: 0.00000000093. Because this probability is so small, it is highly unlikely that these results are due to chance.

Step 5: Take action. Once it was clear that the vaccine was effective, children who were unvaccinated or had received the placebo were given Salk’s vaccine. Since 1954, there has been a marked decrease in the number of polio cases worldwide (Offit, 2005). Public health researchers are striving to eradicate this disease entirely.

Observational Studies

In an observational study researchers observe the responses of the individuals, without controlling the conditions experienced by the individuals. Therefore, they do not assign the participants to treatment or control groups.

Observational studies commonly occur in business settings. One example is a financial audit. The purpose of a financial audit is to assess the accuracy of a company’s financial business practices. ImmunAvance Ltd., a non-government health care organization, hired the Accounting Office at Global Optimization Unlimited to perform an independent audit of their financial practices. ImmunAvance provides inoculation and other preventative health care services in rural African communities.

Step 1: Design the study. The volume of financial transactions conducted by ImmunAvance makes it impossible to conduct a census or an examination of the entire collection of ImmunAvance’s financial documents. Instead, you will collect a manageable group of items (called the sample) from the entire collection of financial documents (called the population). A sample is a subset or a portion of a population.
The information gained from the sample is used to make an inference (or generalization) about the population. Auditors typically cannot consider every item in a population, because there are too many. When it is not possible to conduct a census, auditors face sampling risk. Sampling risk is the risk associated with not auditing every item in the population. It is the risk that the sample may not adequately reflect the population. The only way to eliminate sampling risk is to conduct a census, which is usually not practical. Auditors can reduce sampling risk by obtaining a sample randomly. This is called random selection. Another way to reduce sampling risk is to increase the sample size, the number of items sampled.

Sampling Methods

Step 2: Collect data. There are several procedures that can be used to select a sample from a population, including: simple random sampling (SRS), stratified sampling, systematic sampling, cluster sampling, and convenience sampling (or haphazard sampling). These are examples of sampling methods.

Random Sampling Methods

A simple random sample (SRS) is the best method for obtaining a sample from a population. This method allows each possible sample of a certain size an equal chance at being selected as the chosen sample. A difficulty of this method is that a list of all of the items in the population must be accessible before the sample is taken. Often, we obtain an SRS by allowing a computer to randomly select a certain number of items from the full list of the population. It is akin to the idea of putting all of the names into a hat, shaking them up, and randomly drawing out a few. For example, suppose there are 18,000 students in the population of a certain university. School officials can use a computer to randomly choose values between 1 and 18,000 to identify which students are to be selected to complete a survey. In Excel, the command to obtain a random number between 1 and 18,000 is =RANDBETWEEN(1,18000).
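As a rough illustration of the same idea outside Excel, the sketch below draws a simple random sample of student numbers with Python's standard `random` module. The population size of 18,000 comes from the example above; the sample size of 200, the seed, and the function name are assumptions made only for this sketch.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct ID numbers from 1..population_size.

    random.Random.sample selects without replacement, so every possible
    set of sample_size IDs is equally likely to be chosen.
    """
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)


# 18,000 students as in the example; 200 is a hypothetical sample size.
selected_students = simple_random_sample(18_000, 200, seed=1)
print(sorted(selected_students)[:10])
```

One design difference is worth noting: repeated calls to a random-number function such as RANDBETWEEN can return the same value more than once, whereas `sample` never selects the same student twice.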
A simple random sample can be obtained any time there is a complete list of the items to be sampled and they are all accessible. All the statistical procedures in this course assume that simple random sampling has been used. But in practice, the SRS is often difficult (or impossible) to implement.

A stratified sample is obtained by organizing the items to be sampled into groups of homogeneous (similar) items called strata, and then drawing a simple random sample from each of these strata. Stratified sampling works well when the items are similar within each stratum and tend to differ from one stratum to another. We often use stratified sampling in order to obtain a sample in such a way that we can make comparisons between each of the groups (or strata). For example, in obtaining a sample of students from a university, school officials could define the strata as: (1) freshmen, (2) sophomores, (3) juniors, and (4) seniors. A simple random sample could then be obtained from each of these strata. This would ensure that each class rank of students was represented in the sample. It would also allow the school officials to see how freshman, sophomore, junior, and senior level students compared in their answers to a survey.

A systematic sample is one in which every \(k^{\text{th}}\) item in the population is selected to be part of the sample, beginning at a random starting point. Systematic sampling works well when the items are in a random, but sequential ordering. If the items are not arranged randomly, a systematic sample can miss important parts of the population. For example, consider a fast food company where every 10th customer is given the opportunity to complete a satisfaction survey in exchange for a small discount coupon towards their next purchase. An airport security line also often implements a procedure where every 100th (or so) person is selected for a more “in depth” security examination.
Similarly, factories that use assembly lines will pull, say, every 500th item from the assembly line to perform a quality control check on the item.

A cluster sample (sometimes called a block sample) consists of taking all items in one or more randomly selected clusters, or blocks. When the variation from one block to another is relatively low, compared to the variation within the block, cluster sampling is a reasonable way to get a sample. For example, ecologists could draw grids on a map of a forest to create small sampling regions, or sampling clusters. Then, by randomly selecting one or two of these clusters from the map, the ecologists could go to the areas marked on the map and document information on the health of every tree they find in those clusters. This is a practical way to get a sample in this case because the ecologists only have to go to a few areas of the forest, but are still able to obtain a random sample of all of the trees in the forest. It is also worth noting that the ecologists would not be interested in comparing the health of the trees from the selected clusters to each other like they would in a stratified sample. Instead, they are just looking for a feasible way to obtain a single random sample of all of the trees in the forest, but want to keep their traveling time to a minimum while collecting their sample.

In contrast, to obtain a simple random sample of trees from the same forest, the ecologists would first have to go out and number every tree in the entire forest. Then they would need to use a computer to randomly pick which trees to collect data on. Finally, they would then have to go back to the forest and collect data on the selected trees from across the entire forest. Such an approach just isn’t feasible in practice, so we are willing to settle instead for the cluster sample.

A convenience sample involves selecting items that are relatively easy to obtain and does not use random selection to choose the sample.
This method of sampling can be assumed to always bring bias into the sample. As an example of a convenience sample, an auditor could haphazardly select items from a filing cabinet. This is frequently done when a quick and simple sample is needed, but may not yield a sample that represents the population well. When possible, convenience samples should be avoided.

Types of Variables

Whenever we collect data, we record information about the things we are studying. There are two basic types of data that can be recorded: quantitative measurements and categorical labels. We will call these types of data simply “quantitative” or “categorical” variables. We use the word “variable” to denote the idea that the quantitative measurements or categorical labels can vary from person to person, or item to item, in our study.

Quantitative variables provide measurement information on each individual (or item) in our study. They represent things that are numeric in nature; things that are measured. They often include units of measurement along with the quantitative value of the measurement. Examples include the heights of children measured in inches (or centimeters) and their weights measured in pounds (or kilograms). For a quantitative variable, it makes sense to apply arithmetic operations to the data (such as adding values together, computing the average of the values, or comparing two values). If one child weighs 30 pounds (13.61 kg) and a second child weighs 60 pounds (27.22 kg), then the second child is twice as heavy as the first.

Categorical variables allow us to place each individual (or item) into a specific category. Categorical variables are labels, and it does not make sense to do arithmetic with them. For example, the gender of a newborn child, the ethnicity of an individual, a person’s job title, the brand of phone they own, and the area code of a telephone number are all categorical variables.
Notice that although a telephone number consists of numbers, it is not a quantitative measurement. It does not make sense to double someone’s phone number, to average phone numbers together, or to say one phone number is half the size of another. But the area code of the phone number gives information about the region where the phone number was first initiated, which is categorical information. In Unit 3 of this course we will learn more about categorical variables and proportions. Units 1 and 2 of this course focus on studying quantitative variables. Returning to the sample accounts receivable record, we find this data to have information on both types of variables.
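The distinction between the two variable types can be illustrated with a small Python sketch. The weights are the 30 and 60 pound values from the example above; the area codes are hypothetical values made up for illustration.

```python
from collections import Counter
from statistics import mean

# Quantitative variable: weights in pounds, taken from the example above.
weights_lb = [30, 60]

# Categorical variable: area codes stored as text labels (hypothetical values).
area_codes = ["208", "801", "208"]

# Arithmetic is meaningful for quantitative data.
average_weight = mean(weights_lb)            # average of the two weights
weight_ratio = weights_lb[1] / weights_lb[0]  # second child relative to first

# For categorical data, the sensible summary is a count per category.
area_code_counts = Counter(area_codes)

print(average_weight, weight_ratio)
print(area_code_counts)
```

Storing the area codes as strings rather than integers is a deliberate choice in this sketch: it prevents them from being averaged or doubled by accident, which mirrors the point that they are labels and not measurements.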
Step 3: Describe the data. After auditors collect a sample and compile the data, they review the evidence. Auditors may use graphs or compute numbers (such as the average) to summarize the evidence they found.
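Looking back at Step 2, the stratified, systematic, and cluster sampling methods described there can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the student IDs, the customer numbers, and the forest grid cells are all hypothetical, and a real study would need a design suited to its population.

```python
import random


def stratified_sample(strata, per_stratum, rng):
    """Draw a simple random sample of per_stratum items from every stratum."""
    return {name: rng.sample(items, per_stratum)
            for name, items in strata.items()}


def systematic_sample(items, k, rng):
    """Take every k-th item, beginning at a random starting point."""
    start = rng.randrange(k)
    return items[start::k]


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick n_clusters clusters and keep every item inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]


rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Strata: class ranks, each holding hypothetical student IDs.
class_ranks = ["freshman", "sophomore", "junior", "senior"]
strata = {rank: [f"{rank}-{i}" for i in range(1, 51)] for rank in class_ranks}
by_rank = stratified_sample(strata, per_stratum=5, rng=rng)

# Systematic: every 10th customer out of 100 hypothetical customers.
customers = list(range(1, 101))
surveyed = systematic_sample(customers, k=10, rng=rng)

# Clusters: hypothetical map grid cells, each containing five trees.
grid = {f"cell-{c}": [f"tree-{c}-{t}" for t in range(1, 6)]
        for c in range(1, 9)}
trees = cluster_sample(grid, n_clusters=2, rng=rng)

print(by_rank["freshman"])
print(surveyed)
print(trees)
```

The stratified sample keeps the groups separate so they can be compared, while the cluster sample pools everything from the chosen cells into a single sample, matching the contrast drawn in the text.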