What is the process of gathering data or information to solve a specific problem in a scientific manner?

What is Research?

Data collection is the process of gathering data for use in business decision-making, strategic planning, research and other purposes. It's a crucial part of data analytics applications and research projects: Effective data collection provides the information that's needed to answer questions, analyze business performance or other outcomes, and predict future trends, actions and scenarios.

In businesses, data collection happens on multiple levels. IT systems regularly collect data on customers, employees, sales and other aspects of business operations when transactions are processed and data is entered. Companies also conduct surveys and track social media to get feedback from customers. Data scientists, other analysts and business users then collect relevant data to analyze from internal systems, plus external data sources if needed. The latter task is the first step in data preparation, which involves gathering data and preparing it for use in business intelligence (BI) and analytics applications.

For research in science, medicine, higher education and other fields, data collection is often a more specialized process, in which researchers create and implement measures to collect specific sets of data. In both the business and research contexts, though, the collected data must be accurate to ensure that analytics findings and research results are valid.

Organizations collect data from a variety of systems and other data sources.

What are different methods of data collection?

Data can be collected from one or more sources as needed to provide the information that's being sought. For example, to analyze sales and the effectiveness of its marketing campaigns, a retailer might collect customer data from transaction records, website visits, mobile applications, its loyalty program and an online survey. 

The methods used to collect data vary based on the type of application. Some involve the use of technology, while others are manual procedures. The following are some common data collection methods:

  • automated data collection functions built into business applications, websites and mobile apps;
  • sensors that collect operational data from industrial equipment, vehicles and other machinery;
  • collection of data from information services providers and other external data sources;
  • tracking social media, discussion forums, reviews sites, blogs and other online channels;
  • surveys, questionnaires and forms, done online, in person or by phone, email or regular mail;
  • focus groups and one-on-one interviews; and
  • direct observation of participants in a research study.

These are some of the methods that organizations use to collect customer data.

What are common challenges in data collection?

Some of the challenges often faced when collecting data include the following:

  • Data quality issues. Raw data typically includes errors, inconsistencies and other issues. Ideally, data collection measures are designed to avoid or minimize such problems. That isn't foolproof in most cases, though. As a result, collected data usually needs to be put through data profiling to identify issues and data cleansing to fix them.
  • Finding relevant data. With a wide range of systems to navigate, gathering data to analyze can be a complicated task for data scientists and other users in an organization. The use of data curation techniques helps make it easier to find and access data. For example, that might include creating a data catalog and searchable indexes.
  • Deciding what data to collect. This is a fundamental issue both for upfront collection of raw data and when users gather data for analytics applications. Collecting data that isn't needed adds time, cost and complexity to the process. But leaving out useful data can limit a data set's business value and affect analytics results.
  • Dealing with big data. Big data environments typically include a combination of structured, unstructured and semistructured data, in large volumes. That makes the initial data collection and processing stages more complex. In addition, data scientists often need to filter sets of raw data stored in a data lake for specific analytics applications.
  • Low response and other research issues. In research studies, a lack of responses or willing participants raises questions about the validity of the data that's collected. Other research challenges include training people to collect the data and creating sufficient quality assurance procedures to ensure that the data is accurate.
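
The data quality point above can be illustrated with a small sketch. It assumes a handful of made-up customer records (the field names customer_id, age and country are hypothetical) and shows one possible way to profile the data for issues and then cleanse it. Real profiling and cleansing tools do far more than this.

```python
from collections import Counter

# Hypothetical raw records; field names and values are made up for illustration.
raw_records = [
    {"customer_id": "C001", "age": "34", "country": "US"},
    {"customer_id": "C002", "age": "", "country": "us"},
    {"customer_id": "C003", "age": "abc", "country": "DE"},
    {"customer_id": "C001", "age": "34", "country": "US"},  # repeated ID
]


def profile(records):
    """Count common quality problems without changing the data."""
    issues = Counter()
    seen_ids = set()
    for rec in records:
        if rec["customer_id"] in seen_ids:
            issues["duplicate_id"] += 1
        seen_ids.add(rec["customer_id"])
        age_text = rec["age"].strip()
        if not age_text:
            issues["missing_age"] += 1
        elif not age_text.isdigit():
            issues["invalid_age"] += 1
        if rec["country"] != rec["country"].upper():
            issues["inconsistent_country_case"] += 1
    return dict(issues)


def cleanse(records):
    """Drop repeated IDs, normalize country codes, set unusable ages to None."""
    cleaned = []
    seen_ids = set()
    for rec in records:
        if rec["customer_id"] in seen_ids:
            continue
        seen_ids.add(rec["customer_id"])
        age_text = rec["age"].strip()
        cleaned.append({
            "customer_id": rec["customer_id"],
            "age": int(age_text) if age_text.isdigit() else None,
            "country": rec["country"].upper(),
        })
    return cleaned


report = profile(raw_records)
clean_records = cleanse(raw_records)
print(report)
print(clean_records)
```

Profiling runs first and only reports problems, so the decision about how to fix each one (drop, correct or flag) stays explicit in the cleansing step.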

What are the key steps in the data collection process?

Well-designed data collection processes include the following steps:

  1. Identify a business or research issue that needs to be addressed and set goals for the project.
  2. Gather data requirements to answer the business question or deliver the research information.
  3. Identify the data sets that can provide the desired information.
  4. Set a plan for collecting the data, including the collection methods that will be used.
  5. Collect the available data and begin working to prepare it for analysis.

Data collection considerations and best practices

There are two primary types of data that can be collected: quantitative data and qualitative data. The former is numerical -- for example, prices, amounts, statistics and percentages. Qualitative data is descriptive in nature -- e.g., color, smell, appearance and opinion.
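
As a rough illustration of the difference, the sketch below summarizes a quantitative field with numeric statistics and a qualitative field with category counts. The responses and the field names price_paid and opinion are hypothetical.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical survey responses; all values are illustrative.
responses = [
    {"price_paid": 12.50, "opinion": "satisfied"},
    {"price_paid": 8.00, "opinion": "neutral"},
    {"price_paid": 15.50, "opinion": "satisfied"},
    {"price_paid": 10.00, "opinion": "dissatisfied"},
]

# Quantitative data supports arithmetic, so it can be averaged.
prices = [r["price_paid"] for r in responses]
quantitative_summary = {"mean": mean(prices), "median": median(prices)}

# Qualitative data is descriptive, so it is counted by category instead.
opinion_counts = Counter(r["opinion"] for r in responses)

print(quantitative_summary)
print(opinion_counts.most_common())
```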

Organizations also make use of secondary data from external sources to help drive business decisions. For example, manufacturers and retailers might use U.S. census data to aid in planning their marketing strategies and campaigns. Companies might also use government health statistics and outside healthcare studies to analyze and optimize their medical insurance plans.

The European Union's General Data Protection Regulation (GDPR) and other privacy laws enacted in recent years make data privacy and security bigger considerations when collecting data, particularly if it contains personal information about customers. An organization's data governance program should include policies to ensure that data collection practices comply with laws such as GDPR.

Other data collection best practices include the following:

  • Make sure you collect the right data to meet business or research needs.
  • Ensure that the data is accurate, either as it's collected or as part of the data preparation process.
  • Don't waste time and resources collecting irrelevant data.

The scientific method is the process of objectively establishing facts through testing and experimentation. The basic process involves making an observation, forming a hypothesis, making a prediction, conducting an experiment and finally analyzing the results. The principles of the scientific method can be applied in many areas, including scientific research, business and technology.

Steps of the scientific method

The scientific method uses a series of steps to establish facts or create knowledge. The overall process is well established, but the specifics of each step may change depending on what is being examined and who is performing it. The scientific method can only answer questions that can be proven or disproven through testing.

Make an observation or ask a question. The first step is to observe something that you would like to learn about or ask a question that you would like answered. These can be specific or general. Some examples would be "I observe that our total available network bandwidth drops at noon every weekday" or "How can we increase our website registration numbers?" Taking the time to establish a well-defined question will help you in later steps.

Gather background information. This involves doing research into what is already known about the topic. This can also involve finding if anyone has already asked the same question.

Create a hypothesis. A hypothesis is an explanation for the observation or question. If proven later, it can become a fact. Some examples would be "Our employees watching online videos during lunch is using our internet bandwidth" or "Our website visitors don't see our registration form."

Create a prediction and perform a test. Create a testable prediction based on the hypothesis. The test should establish a noticeable change that can be measured or observed using empirical analysis. It is also important to control for other variables during the test. Some examples would be "If we block video-sharing sites, our available bandwidth will not go down significantly during lunch" or "If we make our registration box bigger, a greater percentage of visitors will register for our website than before the change."

Analyze the results and draw a conclusion. Use the metrics established before the test to see if the results match the prediction. For example, "After blocking video-sharing sites, our bandwidth utilization only went down by 10% from before; this is not enough of a change to be the primary cause of the network congestion" or "After increasing the size of the registration box, the percentage of sign-ups went from 2% of total page views to 5%, showing that making the box larger results in more registrations."
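
One way to check whether a change like the registration example is larger than chance alone would explain is a two-proportion z-test. The sketch below is only an illustration: the page-view and sign-up counts are hypothetical, chosen to match the 2% and 5% rates, and the test assumes the two samples are independent and reasonably large.

```python
import math


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Compare two conversion rates using a pooled two-proportion z-test."""
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    # Pooled rate under the assumption that both groups share one true rate.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical counts: 20 sign-ups in 1,000 page views before the change,
# 50 sign-ups in 1,000 page views after it.
rate_before, rate_after, z, p_value = two_proportion_z_test(20, 1000, 50, 1000)
print(f"before={rate_before:.1%} after={rate_after:.1%} z={z:.2f} p={p_value:.5f}")
```

A small p-value suggests the difference is unlikely to be random noise, but it does not rule out other influences, such as a sale running during the test period.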

Share the conclusion or decide what question to ask next. Document the results of your experiment. By sharing the results with others, you also increase the total body of knowledge available. Your experiment may have also led to other questions, or, if your hypothesis is disproven, you may need to create a new one and test that. For example, "Because user activity is not the cause of excessive bandwidth use, we now suspect that an automated process is running at noon every day."

Diagram illustrating using the scientific method to confirm a hypothesis

Using the scientific method in technology and computers

The scientific method is incredibly valuable in technology and related fields. It is obviously used in research and development, but it is also useful in day-to-day operations. Because almost everything can be quantified, testing hypotheses can be easy.

Most modern computer systems are complicated and difficult to troubleshoot. Using the scientific method of hypothesis and testing can greatly simplify the process of tracking down errors, and it can help find areas for improvement. It can also help when you evaluate new technologies before implementation.

Using the scientific method in business

Many business processes benefit from the scientific method. Shifting business landscapes and complex business relationships can make behaviors hard to predict or cause them to run counter to past patterns. Instead of relying on gut feelings or previous experience, a scientific approach can help businesses grow. Big data initiatives can make business information more available and easier to test against.

The scientific method can be applied in many areas. Customer satisfaction and retention numbers can be analyzed and used to test hypotheses. Profitability and finance numbers can be analyzed to form new conclusions. Making predictions about changes to business practices and checking the results will help to identify and measure the success or failure of the initiatives.

Common pitfalls in using the scientific method

The scientific method is a powerful tool. Like any tool, though, if it is misused it can cause more damage than good.

The scientific method can only be used for testable phenomena. This requirement is known as falsifiability. While much in nature can be tested and measured, some areas of human experience are beyond objective observation.

Both proving and disproving the hypothesis are equally valid outcomes of testing. It is possible to ignore the outcome or inject bias to skew the results of a test in a way that will fit the hypothesis. Data in opposition to the hypothesis should not be discounted.

It is important to control for other variables and influences during testing so that they do not skew the results. This can be difficult, but failing to account for them could produce invalid data. For example, testing bandwidth during a holiday or measuring registrations during a sale event may introduce other factors that influence the outcome.

Another common pitfall is confusing correlation with causation. While two data points may seem to be connected, it is not necessarily true that one is directly influenced by the other. For example, an ice cream stand in town sees drops in business on the hottest days. While the data may look like the hotter the weather, the less people want ice cream, the reality is that more people are going to the beach on those days and fewer are in town.
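
The ice cream example can be sketched with a few made-up numbers. In the hypothetical data below, total sales fall as the temperature rises, but sales per visitor rise, because the number of people in town is the hidden factor. All figures are invented for illustration, and pearson is a small helper written for this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily observations for the ice cream stand in town.
temperature_c = [20, 24, 28, 32, 36]
visitors_in_town = [500, 450, 350, 200, 100]  # people leave for the beach
ice_creams_sold = [50, 54, 49, 32, 18]

# Naive view: hotter days correlate with fewer total sales.
r_total = pearson(temperature_c, ice_creams_sold)

# Accounting for foot traffic: sales per visitor go up with temperature.
sales_per_visitor = [s / v for s, v in zip(ice_creams_sold, visitors_in_town)]
r_per_visitor = pearson(temperature_c, sales_per_visitor)

print(f"temperature vs total sales:       r = {r_total:.2f}")
print(f"temperature vs sales per visitor: r = {r_per_visitor:.2f}")
```

The two correlations point in opposite directions, which is why a correlation on its own should not be read as a cause.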

History of the scientific method

The discovery of the scientific method is not credited to any single person, but there are a few notable figures who contributed to its development.

The Greek philosopher Aristotle is considered to be one of the earliest proponents of logic and cycles of observation and deduction in recorded history. Ibn al-Haytham, a mathematician, established stringent testing methodologies in pursuit of facts and truth, and he recorded his findings.

During the Renaissance, many thinkers and scientists continued developing rational methods of establishing facts. Sir Francis Bacon emphasized the importance of inductive reasoning. Sir Isaac Newton relied on both inductive and deductive reasoning to explain the results of his experiments, and Galileo Galilei emphasized the idea that results should be repeatable.

Other well-known contributors to the scientific method include Karl Popper, who introduced the concept of falsifiability, and Charles Darwin, who is known for using multiple communication channels to share his conclusions.

See also: falsifiability, pseudoscience, empirical analysis, validated learning, OODA loop, black swan event, deep learning.