Fill in the blank: in sql databases, the _____ function can be used to convert data from one datatype to another.

Coursera Google Data Analytics Professional Certificate Course 4 – Process Data from Dirty to Clean quiz answers to all weekly questions (weeks 1 – 6):

  • Week 1: The importance of integrity
  • Week 2: Sparkling-clean data
  • Week 3: Cleaning data with SQL
  • Week 4: Verify and report on your cleaning results
  • Week 5: Optional: Adding data to your resume
  • Week 6: Course challenge


Week 1: The importance of integrity

Before you clean, check for integrity

As you start thinking about how to prepare your data for exploration, this part of the course will highlight why data integrity is so essential to successful decision-making. You’ll learn about how data is generated and the techniques analysts use to decide what data to collect for analysis. And you’ll discover structured and unstructured data, data types, and data formats.

Learning Objectives

  • Describe statistical measures associated with data integrity including statistical power, hypothesis testing, and margin of error
  • Describe strategies that can be used to address insufficient data
  • Discuss the importance of sample size with reference to sample bias and random samples
  • Describe the relationship between data and related business objectives
  • Define data integrity with reference to types and risks
  • Discuss the importance of pre-cleaning activities

Answers to week 1 quiz questions

L2 Maintaining data integrity

Question 1

Which process do data analysts use to make data more organized and easier to read?

  • Data transfer
  • Data manipulation
  • Data uniformity
  • Data replication

To make data more organized and easier to read, data analysts use data manipulation.

Question 2

Fill in the blank: The degree to which data conforms to certain business rules or constraints determines the data’s _.

  • structure
  • validity
  • completeness
  • range

The degree to which data conforms to certain business rules or constraints determines the data’s validity.

Question 3

Which of the following is an example of invalid data?

  • A mandatory value that has been left blank
  • Values for two customers with the same first initial but different last names
  • A string data type containing more than one word
  • A value that equals the last number in a data range

A mandatory value left blank is invalid because mandatory values must be filled in.

L3 Connect data to objectives

Question 1

Fill in the blank: Data being used for analysis should align with _ and help answer stakeholder questions.

  • project limitations
  • current trends
  • obsolete projects
  • business objectives

Data being used for analysis should align with business objectives and help answer stakeholder questions.

Question 2

Before analysis, a company collects data from countries that use different date formats. Which of the following updates would improve the data integrity?

  • Remove data in an unfamiliar date format
  • Change all of the dates to the same format
  • Leave the dates in their current formats
  • Organize the data by country

Changing all of the dates to the same format would improve the data integrity.

Question 3

When should data analysts think about modifying a business objective? Select all that apply.

  • When the data doesn’t align with the original objective
  • When they find a row of duplicate data
  • When the analysis is taking longer than expected
  • When there is not enough data to meet the objective

Data analysts should think about modifying a business objective when the data doesn’t align with the original objective and when there is not enough data to meet the objective.

L4 When to stop collecting data

Question 1

What should an analyst do if they do not have the data needed to meet a business objective? Select all that apply.

  • Continue with the analysis using data from less reliable sources.
  • Perform the analysis by finding and using proxy data from other datasets.
  • Create and use hypothetical data that aligns with analysis predictions.
  • Gather related data on a small scale and request additional time to find more complete data.

If an analyst does not have the data needed to meet a business objective, they can gather related data on a small scale and request additional time to find more complete data. They can also perform the analysis by finding and using proxy data from other datasets.

Question 2

Which of the following are limitations that might lead to insufficient data? Select all that apply.

  • Duplicate data
  • Data from a single source
  • Outdated data
  • Data that updates continually

Limitations that might lead to insufficient data include data that updates continually, outdated data, and data from a single source.

Question 3

How can a data analyst eliminate sampling bias of a population for a study about the most popular ice cream flavors?

  • Random sampling
  • Job-based sampling
  • Geographical sampling
  • Gender sampling

To eliminate sampling bias of a population for this study, a data analyst can use random sampling. Sampling on the basis of geographical location can still lead to sampling bias.

Question 4

A data analyst wants to find out how many people in Utah have swimming pools. It’s unlikely that they can survey every Utah resident. Instead, they survey enough people to be representative of the population. This describes what data analytics concept?

  • Margin of error
  • Statistical significance
  • Sample
  • Confidence level

This describes a sample, which is a part of a population that is representative of the whole.

L5 Testing your data

Question 1

A research team runs an experiment to determine if a new security system is more effective than the previous version. What type of results are required for the experiment to be statistically significant?

  • Results that are real and not caused by random chance
  • Results that are hypothetical and in need of more testing
  • Results that are inaccurate and should be ignored
  • Results that are unlikely to occur again

In order for an experiment to be statistically significant, the results should be real and not caused by random chance.

Question 2

In order to have a high confidence level in a customer survey, what should the sample size accurately reflect?

  • The predictions of stakeholders
  • The most valuable members of the population
  • The trends from other customer surveys
  • The entire population

In order to have a high confidence level in a customer survey, the sample size should accurately reflect the entire population.

Question 3

A data analyst determines an appropriate sample size for a survey. They can check their work by making sure the confidence level percentage plus the margin of error percentage add up to 100%.

  • True
  • False

False. The confidence level percentage and margin of error percentage do not have to add up to 100%. They are independent of each other.

L6 Consider the margin of error

Question 1

Fill in the blank: Margin of error is the _ amount that the sample results are expected to differ from those of the actual population.

  • median
  • minimum
  • average
  • maximum

Margin of error is the maximum amount that the sample results are expected to differ from those of the actual population.

Question 2

In a survey about a new cleaning product, 75% of respondents report they would buy the product again. The margin of error for the survey is 5%. Based on the margin of error, what percentage range reflects the population’s true response?

  • Between 70% and 80%
  • Between 75% and 80%
  • Between 73% and 78%
  • Between 70% and 75%

Based on the margin of error, between 70% and 80% accurately reflects the population’s true response.
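The arithmetic behind this answer can be sketched in a few lines of Python. The function name `margin_of_error_range` is a hypothetical helper made up for illustration, not something from the course.

```python
def margin_of_error_range(sample_pct, margin_pct):
    """Return the (low, high) range implied by a sample result and its margin of error."""
    return (sample_pct - margin_pct, sample_pct + margin_pct)


# 75% of respondents would buy again, with a 5% margin of error.
low, high = margin_of_error_range(75, 5)
print(f"Between {low}% and {high}%")
```

The sample result sits in the middle of the range, so the margin is subtracted for the lower bound and added for the upper bound.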

Weekly challenge 1

Question 1

Which of the following conditions are necessary to ensure data integrity? Select all that apply.

  • Statistical power
  • Completeness
  • Accuracy
  • Privacy

Accuracy and completeness are necessary to ensure data integrity.

Question 2

What is one potential problem associated with data manipulation that analysts must be aware of?

  • Data manipulation can help organize a dataset.
  • Data manipulation can separate a dataset among different locations.
  • Data manipulation can make a dataset easier to read.
  • Data manipulation can introduce errors.

Data manipulation is the process of changing data to make it more organized and easier to read. However, it can sometimes introduce errors.

Question 3

A data analyst is given a dataset for analysis. It includes data about the total population of every country in the previous 20 years. Based on the available data, an analyst will be able to determine which country was the most populous from 2016 to 2017.

  • True
  • False

True. Based on the available data, an analyst will be able to determine which country was the most populous from 2016 to 2017.

Question 4

A data analyst is given a dataset for analysis.

June 2014 Invoices – Sheet1.csv

Which of the following has duplicate data?

  • Data for Valando on 2/18/2014
  • Data for Valando on 1/1/2014
  • Data for Symteco on 5/20/2014
  • Data for Symteco on 2/21/2014

Valando on 2/18/2014 contains duplicate data because the spreadsheet contains the same data in two different rows.

Question 5

A data analyst is working on a project about the global supply chain. They have a dataset with lots of relevant data from Europe and Asia. However, they decide to generate new data that represents all continents. What type of insufficient data does this scenario describe?

  • Data that keeps updating
  • Data that’s outdated
  • Data that’s geographically limited
  • Data from only one source

This example describes data that is insufficient because it’s geographically limited. If the analytics project has a global focus, the dataset should also be global.

Question 6

A car manufacturer wants to learn more about the brand preferences of electric car owners. There are millions of electric car owners in the world. Who should the company survey?

  • A sample of car owners who most recently bought an electric car
  • A sample of all electric car owners
  • A sample of car owners who have owned more than one electric car
  • The entire population of electric car owners

The company should survey a sample of all electric car owners.

Question 7

Fill in the blank: Sampling bias in data collection happens when a sample isn’t representative of _.

  • a dataset about the population
  • the population most affected by the data
  • a subset of the population
  • the population as a whole

Sampling bias in data collection happens when a sample isn’t representative of the population as a whole.

Question 8

Which of the following processes helps ensure a close alignment of data and business objectives?

  • Completing data replication
  • Transferring data multiple times
  • Having data update automatically during analysis
  • Maintaining data integrity

Maintaining data integrity helps ensure a close alignment of data and business objectives because the data is likely to be accurate, complete, consistent, and trustworthy.

Week 2: Sparkling-clean data

All about clean data

Every data analyst wants clean data to work with when performing an analysis. In this part of the course, you’ll learn the difference between clean and dirty data. You’ll also explore data cleaning techniques using spreadsheets and other tools.

Learning Objectives

  • Differentiate between clean and dirty data
  • Explain the characteristics of dirty data
  • Describe data cleaning techniques with reference to identifying errors, redundancy, compatibility and continuous monitoring
  • Identify common pitfalls when cleaning data
  • Demonstrate an understanding of the use of spreadsheets to clean data

Answers to week 2 quiz questions

L2 Recognize clean vs. dirty data

Question 1

Describe the difference between a null and a zero in a dataset.

  • A null signifies invalid data. A zero is missing data.
  • A null indicates that a value does not exist. A zero is a numerical response.
  • A null represents a value of zero. A zero represents an empty cell.
  • A null represents a number with no significance. A zero represents the number zero.

A null indicates that a value does not exist. A zero is a numerical response.

Question 2

What are the most common processes and procedures handled by data engineers? Select all that apply.

  • Giving data a reliable infrastructure
  • Developing, maintaining, and testing systems
  • Verifying results of data analysis
  • Transforming data into a useful format for analysis

Data engineers transform data into a useful format for analysis; give it a reliable infrastructure; and develop, maintain, and test systems.

Question 3

What are the most common processes and procedures handled by data warehousing specialists? Select all that apply.

  • Ensuring data is properly cleaned
  • Ensuring data is available
  • Ensuring data is backed up to prevent loss
  • Ensuring data is secure

Data warehousing specialists are responsible for ensuring data is available, secure, and backed up to prevent loss.

Question 4

A data analyst is cleaning a dataset. They want to confirm that exactly three characters are present in each cell of a certain spreadsheet column. Which tool can they use?

  • Character count
  • Field length
  • Range
  • Max

Field length enables an analyst to determine how many characters can be typed into a spreadsheet field. An analyst can use field length as part of the data-validation process.

L3 Data cleaning techniques

L4 Cleaning data in spreadsheets

Weekly challenge 2

Question 1

Which of the following terms describe dirty data? Select all that apply.

  • Irrelevant
  • Incomplete
  • Infallible
  • Incorrect

Dirty data is incomplete, incorrect, and irrelevant to the problem being solved.

Question 2

Field length is a spreadsheet tool for determining if a field has been duplicated.

  • True
  • False

False. Field length determines the number of characters that may be typed into a field.

Question 3

A data analyst notices that the customer in row 2 shares the same Customer ID as the customer in row 6. What does this scenario describe?

      A           B            C                D
 1    Last name   First name   Middle initial   Customer ID
 2    Smith       Leonardo     R.               64078
 3    Lee         Natasha      E.               92862
 4    Wallace     Luciana      M.               55107
 5    Xiao        Hua          A.               88492
 6    Smith       Leo          R.               64078
 7    Chaudhuri   Toby         T.               34694
 8    Lee         Tasha        P.               18295
 9    Walton      Mason        Q.               58239
10    Richards    Felix        S.               12765
11    Guillermo   Beth         I.               27593
12    Walton      Nadine       J.               67292

  • Duplicate data
  • Mislabeled data
  • Inconsistent data
  • Obsolete data

This is duplicate data because the customer data in row 2 is a duplicate of the customer data in row 6.
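As a rough illustration of how a repeated Customer ID could be spotted outside a spreadsheet, here is a minimal Python sketch. It uses only the first few rows of the table above, and the variable names are assumptions made for this example.

```python
from collections import Counter

# (spreadsheet row, last name, first name, customer ID), rows 2-7 of the table
rows = [
    (2, "Smith", "Leonardo", "64078"),
    (3, "Lee", "Natasha", "92862"),
    (4, "Wallace", "Luciana", "55107"),
    (5, "Xiao", "Hua", "88492"),
    (6, "Smith", "Leo", "64078"),
    (7, "Chaudhuri", "Toby", "34694"),
]

# Count how often each customer ID appears, then keep rows whose ID repeats.
id_counts = Counter(customer_id for _, _, _, customer_id in rows)
duplicate_rows = [row[0] for row in rows if id_counts[row[3]] > 1]
print(duplicate_rows)
```

Rows 2 and 6 are flagged because they share the ID 64078, even though the first name is written differently in each row.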

Question 4

Fill in the blank: Conditional formatting is a spreadsheet tool that changes how _ appear when values meet a specific condition.

  • filters
  • cells
  • queries
  • charts

Conditional formatting is a spreadsheet tool that changes how cells appear when values meet a specific condition.

Question 5

A data analyst uses the SPLIT function to divide a text string around a specified character and put each fragment into a new, separate cell. What is the specified character separating each item called?

  • Delimiter
  • Unit
  • Partition
  • Substring

When using the SPLIT function, the specified character separating each item is called a delimiter.

Question 6

For a function to work properly, data analysts must follow each function’s predetermined structure. What is this structure called?

  • Syntax
  • Validation
  • Summary
  • Algorithm

This structure is called syntax. Syntax is a predetermined structure that includes all required information and its proper placement.

Question 7

You are working with the following selection of a spreadsheet:

     A                B
1    Customer         Address
2    Sally Stewart    9912 School St. North Wales, PA 19454
3    Lorenzo Price    8621 Glendale Dr. Burlington, MA 01803
4    Stella Moss      372 W. Addison Street Brandon, FL 33510
5    Paul Casey       9069 E. Brickyard Road Chattanooga, TN 37421

In order to extract the five-digit postal code from Burlington, MA, what is the correct function?

  • =LEFT(5,B3)
  • =RIGHT(B3,5)
  • =RIGHT(5,B3)
  • =LEFT(B3,5)

The correct syntax is =RIGHT(B3,5). The RIGHT function returns a set number of characters from the right side of a text string. B3 is the specified cell. And 5 is the number of characters to return.
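For readers who think in code, the same extraction can be sketched in Python. The `right` helper below is a hypothetical stand-in for the spreadsheet function, not an actual library call.

```python
def right(text, num_chars):
    """Rough Python counterpart of the spreadsheet RIGHT(text, num_chars) function."""
    return text[-num_chars:] if num_chars > 0 else ""


# Contents of cell B3 in the question above.
address_b3 = "8621 Glendale Dr. Burlington, MA 01803"
postal_code = right(address_b3, 5)
print(postal_code)
```

The argument order mirrors the spreadsheet syntax: the text comes first, then the number of characters to return.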

Question 8

A data analyst in a human resources department is working with the following selection of a spreadsheet:

     A             B                 C             D
1    Year Hired    Last 4 of SS#     Department    Employee ID
2    2019          1192              Marketing
3    2014          2683              Operations
4    2020          1939              Strategy
5    2009          3208              Graphics

They want to create employee identification numbers (IDs) in column D. The IDs should include the year hired plus the last four digits of the employee’s Social Security Number (SS#). What function will create the ID 20093208 for the employee in row 5?

  • =CONCATENATE(A5,B5)
  • =CONCATENATE(A5+B5)
  • =CONCATENATE(A5:B5)
  • =CONCATENATE(A5*B5)

To create the ID 20093208 for the employee in row 5, the function is =CONCATENATE(A5,B5). CONCATENATE joins together two or more text strings. (A5,B5) are the locations of the strings to be joined.
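The difference between joining and adding the two values can be sketched in Python. The variable names are made up for this example.

```python
year_hired = 2009      # cell A5
last4_of_ss = 3208     # cell B5

# Joining the values as text, as CONCATENATE(A5,B5) does.
employee_id = f"{year_hired}{last4_of_ss}"

# Adding the values as numbers, as A5+B5 would, gives something else entirely.
summed = year_hired + last4_of_ss

print(employee_id, summed)
```

Joining treats both values as text and keeps every digit, which is why the comma-separated form is the one that produces 20093208.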

Question 9

An analyst is cleaning a new dataset containing 500 rows. They want to make sure the data contained from cell B2 through cell B300 does not contain a number greater than 50. Which of the following COUNTIF function syntaxes could be used to answer this question? Select all that apply.

  • =COUNTIF(B2:B300,>50)
  • =COUNTIF(B2:B300,"<=50")
  • =COUNTIF(B2:B300,<=50)
  • =COUNTIF(B2:B300,">50")

One possible syntax is =COUNTIF(B2:B300,">50"). This returns the number of cells that are greater than 50. Another option is =COUNTIF(B2:B300,"<=50"). This returns the number of cells that are less than or equal to 50. Either one can confirm that the data does not contain a number greater than 50. In both cases the condition must be enclosed in quotation marks.
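The two counting approaches can be sketched in Python as follows. The `countif` helper and the sample values in `column_b` are hypothetical and only illustrate the idea.

```python
def countif(values, condition):
    """Count the values that satisfy the condition, in the spirit of COUNTIF."""
    return sum(1 for value in values if condition(value))


# Hypothetical stand-in for the contents of B2:B300.
column_b = [12, 50, 7, 51, 33]

over_50 = countif(column_b, lambda v: v > 50)
at_most_50 = countif(column_b, lambda v: v <= 50)
print(over_50, at_most_50)
```

If the first count is zero, or the second count equals the number of filled cells, the range contains no number greater than 50.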

Question 10

The V in VLOOKUP stands for what?

  • Virtual
  • Vertical
  • Visual
  • Variable

The V in VLOOKUP stands for vertical. VLOOKUP is a spreadsheet function that vertically searches for a certain value in a column to return a corresponding piece of information.

Question 11

Fill in the blank: Data mapping is the process of _ fields from one data source to another.

  • matching
  • linking
  • merging
  • extracting

Data mapping is the process of matching fields from one data source to another.

Question 12

Describe the relationship between a primary key and a foreign key.

  • A primary key references a row in which each value is unique. A foreign key is a column within a table that is a primary key in another table.
  • A primary key is a field within a table that is a foreign key in another table. A foreign key references a column in which each value is unique
  • A primary key references a column in a table in which each value is unique. A foreign key is a field within a table that is a primary key in another table.
  • A primary key references a field within a table that is a foreign key in another table. A foreign key references a row in which each value is unique.

A primary key references a column in a table in which each value is unique. A foreign key is a field within a table that is a primary key in another table.
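A minimal sketch of this relationship, using Python's built-in sqlite3 module with made-up `customers` and `orders` tables, might look like this. Note that SQLite only enforces foreign keys when the `foreign_keys` pragma is switched on, and other databases behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# customer_id is the primary key here: each value must be unique.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")

# In orders, customer_id is a foreign key pointing back at customers.
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id)
    )
    """
)

conn.execute("INSERT INTO customers VALUES (1, 'Smith')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # valid: customer 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (101, 999)")  # no customer 999
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```

The second insert into `orders` is rejected because its foreign key value does not match any primary key in `customers`.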

Week 3: Cleaning data with SQL

Cleaning data in SQL

Knowing a variety of ways to clean data can make an analyst’s job much easier. In this part of the course, you’ll check out how to clean your data using SQL. You’ll explore queries and functions that you can use in SQL to clean and transform your data to get it ready for analysis.

Learning Objectives

  • Describe how SQL can be used to clean large datasets
  • Compare spreadsheet data-cleaning functions to those associated with SQL in databases
  • Develop basic SQL queries for use with databases
  • Apply basic SQL functions for use in cleaning string variables in a database
  • Apply basic SQL functions for transforming data variables

Answers to week 3 quiz questions

L2 More about SQL

Question 1

Which of the following are benefits of using SQL? Select all that apply.

  • SQL can also be used to create web apps.
  • SQL offers powerful tools for cleaning data.
  • SQL can be adapted and used for multiple database programs.
  • SQL can handle huge amounts of data.

SQL can handle huge amounts of data, can be adapted and used for multiple database programs, and offers powerful tools for cleaning data.

Question 2

Which of the following tasks can data analysts do using both spreadsheets and SQL? Select all that apply.

  • Perform arithmetic
  • Process huge amounts of data efficiently
  • Use formulas
  • Join data

Analysts can use SQL and spreadsheets to perform arithmetic, use formulas, and join data.

Question 3

SQL is a language used to communicate with databases. Like most languages, SQL has dialects. How should data analysts approach SQL dialects? Select all that apply.

  • SQL dialects don’t change often, so data analysts should pick one and master it.
  • SQL dialects apply to different database programs, so data analysts should first master Standard SQL.
  • SQL dialects vary company by company, so data analysts should learn the dialect their company uses.
  • SQL has different dialects, and data analysts must learn all of them.

SQL dialects apply to different database programs, so data analysts should first master Standard SQL. In addition, they should learn the dialect their company uses.

L3 Learn basic SQL queries

Question 1

Which of the following SQL functions can data analysts use to clean string variables? Select all that apply.

  • SUBSTR
  • COUNTIF
  • LENGTH
  • TRIM

Data analysts can use the SUBSTR and TRIM functions to clean string variables.

Question 2

You are working with a database of information about middle school students. The student_data table contains the name and eight-digit identification (ID) number for each student. The first four digits of each ID number correspond to the student’s graduation year. For example, 20267482 indicates the student will graduate in 2026.

The identification number is stored as a string in the id_number column. How do you complete this query to return the name of all students who will graduate in 2026?

SELECT name FROM student_data WHERE

SUBSTR(id_number, 1, 4) = '2026'

This function instructs the database to return four characters of each student ID, starting with the first character. It will only retrieve data about students who will graduate in 2026. Because id_number is stored as a string, the value it is compared with is written as a string too.
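To see the completed query run, here is a small sketch using Python's built-in sqlite3 module. The student names and the second and third ID numbers are invented sample data, and SQLite is only one of several dialects that support SUBSTR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_data (name TEXT, id_number TEXT)")
conn.executemany(
    "INSERT INTO student_data VALUES (?, ?)",
    [("Ana", "20267482"), ("Ben", "20251234"), ("Chi", "20260001")],  # sample rows
)

# Keep only students whose ID starts with the four characters '2026'.
query = """
    SELECT name
    FROM student_data
    WHERE SUBSTR(id_number, 1, 4) = '2026'
    ORDER BY name
"""
graduating_2026 = [row[0] for row in conn.execute(query)]
print(graduating_2026)
```

SUBSTR takes the column, the starting position (counted from 1), and the number of characters to return.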

Question 3

A data analyst wants to confirm that all of the text strings in a table are the correct length. How would they complete the following query to return any routes greater than 10 characters long?
SELECT route FROM US_roads_data WHERE

  • LENGTH = (route) < 10
  • LENGTH(route) > 10
  • LENGTH = (route) > 10
  • LENGTH(route) < 10

The correct clause is LENGTH(route) > 10. It instructs the database to return any routes that are greater than 10 characters long.
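A quick way to try this clause is with Python's built-in sqlite3 module. The route names below are invented sample data. One of them is exactly 10 characters long to show that the comparison is strict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE US_roads_data (route TEXT)")
conn.executemany(
    "INSERT INTO US_roads_data VALUES (?)",
    [("I-95",), ("Route 66",), ("US-101 Alt",), ("Pacific Coast Highway",)],  # sample rows
)

# Return only routes longer than 10 characters.
long_routes = [
    row[0]
    for row in conn.execute("SELECT route FROM US_roads_data WHERE LENGTH(route) > 10")
]
print(long_routes)
```

The 10-character route is not returned, because the clause asks for lengths greater than 10 rather than greater than or equal to 10.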

Weekly challenge 3

Question 1

Data analysts choose SQL for which of the following reasons? Select all that apply.

  • SQL is a programming language that can also create web apps
  • SQL is a powerful software program
  • SQL is a well-known standard in the professional community
  • SQL can handle huge amounts of data

Data analysts choose SQL because it can handle huge amounts of data. SQL is also a well-known standard in the professional community.

Question 2

In which of the following situations would a data analyst use spreadsheets instead of SQL? Select all that apply.

  • When visually inspecting data
  • When working with a dataset with more than 1,000,000 rows
  • When working with a small dataset
  • When using a language to interact with multiple database programs

An analyst would choose to use spreadsheets instead of SQL when visually inspecting data or working with a small dataset.

Question 3

A data analyst creates many new tables in their company’s database. When the project is complete, the analyst wants to remove the tables so they don’t clutter the database. What SQL commands can they use to delete the tables?

  • INSERT INTO
  • CREATE TABLE IF NOT EXISTS
  • UPDATE
  • DROP TABLE IF EXISTS

The analyst can use the DROP TABLE IF EXISTS query to delete the tables so they don’t clutter the database.
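The effect of the IF EXISTS part can be sketched with Python's built-in sqlite3 module. The table name `temp_results` is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_results (id INTEGER)")

# Remove the table once the project is complete.
conn.execute("DROP TABLE IF EXISTS temp_results")

# Running it again is harmless: IF EXISTS skips the drop when the table is gone.
conn.execute("DROP TABLE IF EXISTS temp_results")

remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining_tables)
```

Without IF EXISTS, the second DROP TABLE would raise an error because the table no longer exists.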

Question 4

A data analyst is cleaning customer data for an online retail company. They are working with the following section of a database:

[Table not reproduced: a section of the customer database that includes a state column.]

The analyst wants to find out if the state data is consistent and if any text strings contain more than two characters. What is the correct SQL clause to use to find any text strings containing more than two characters?

The correct clause is LENGTH(state) > 2.

Question 5

Fill in the blank: The _ function counts the number of characters a string contains.

  • SUBSTR
  • CAST
  • LENGTH
  • TRIM

The LENGTH function counts the number of characters the string contains.

Question 6

In SQL databases, what data type refers to a number that contains a decimal?

  • Integer
  • String
  • Boolean
  • Float

In SQL databases, the float data type refers to a number that contains a decimal.

Question 7

Fill in the blank: In SQL databases, the _ function can be used to convert data from one datatype to another.

  • TRIM
  • LENGTH
  • SUBSTR
  • CAST

The CAST function can be used to convert data from one datatype to another.
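One common reason to convert a datatype is sorting: numbers stored as text sort alphabetically rather than numerically. The sketch below uses Python's built-in sqlite3 module with a made-up `purchases` table. The type name REAL is SQLite's, and other dialects use names such as FLOAT64 or DECIMAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (price TEXT)")  # prices stored as strings
conn.executemany("INSERT INTO purchases VALUES (?)", [("19.5",), ("5",), ("100.25",)])

# Sorting the raw strings compares them character by character.
text_order = [r[0] for r in conn.execute("SELECT price FROM purchases ORDER BY price")]

# Casting to a numeric type first gives the expected numeric order.
numeric_order = [
    r[0]
    for r in conn.execute(
        "SELECT CAST(price AS REAL) FROM purchases ORDER BY CAST(price AS REAL)"
    )
]
print(text_order)
print(numeric_order)
```

CAST takes the form CAST(expression AS type) and leaves the stored data unchanged. Only the query result is converted.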

Question 8

Fill in the blank: The _ function can be used to return non-null values in a list.

  • CONCAT
  • TRIM
  • COALESCE
  • CAST

The COALESCE function can be used to return non-null values in a list.
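More precisely, COALESCE returns the first non-null value among its arguments. A minimal sketch with Python's built-in sqlite3 module and a made-up `products` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product TEXT, product_code TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Widget", "W-1"), (None, "G-7")],  # the second row has no product name
)

# Use the product name when present, otherwise fall back to the product code.
labels = [
    row[0]
    for row in conn.execute(
        "SELECT COALESCE(product, product_code) FROM products ORDER BY rowid"
    )
]
print(labels)
```

This is handy during cleaning when a preferred column has gaps that a backup column can fill.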

Week 4: Verify and report on your cleaning results

Verify and report your cleaning results

Cleaning your data is an essential step in the data analysis process. Verifying and reporting your cleaning is a way to show that your data is ready for the next step. In this part of the course, you’ll find out the processes involved with verifying and reporting data cleaning as well as their benefits.

Learning Objectives

  • Describe the process involved in verifying the results of cleaning data
  • Describe what is involved in manually cleaning data
  • Discuss the elements and importance of data-cleaning reports
  • Describe the benefits of documenting data cleaning process

Answers to week 4 quiz questions

L2 Manually cleaning data

Question 1

Making sure data is properly verified is an important part of the data-cleaning process. Which of the following tasks are involved in this verification? Select all that apply.

  • Rechecking the data-cleaning effort
  • Providing a list of updates to stakeholders
  • Manually fixing any errors data analysts find
  • Comparing the original purpose of the project to the findings

The verification process confirms that data cleaning was well executed and the resulting data is accurate and reliable. To verify data, analysts recheck the data-cleaning effort, compare the original purpose to the findings, and manually fix errors.

Question 2

An analyst has just finished cleaning a dataset. Before analysis, why might the analyst want to revisit the business problem? Select all that apply.

  • To confirm that the data is capable of meeting project objectives
  • To consider whether the data can help solve the business problem
  • To select which data points to include in analysis
  • To schedule a meeting with stakeholders

The analyst might want to revisit the business problem in order to confirm that the data is capable of meeting the project objectives and solving the business problem.

Question 3

A data analyst is cleaning a dataset with inconsistent formats and repeated cases. They use the TRIM function to remove extra spaces from string variables. What other tools can they use for data cleaning? Select all that apply.

  • Import data
  • Remove duplicates
  • Pivot table
  • Protect sheet

The analyst can use TRIM, remove duplicates, and pivot tables for data cleaning.

L3 Documenting cleaning results

Question 1

Fill in the blank: While cleaning data, documentation is used to track _. Select all that apply.

  • deletions
  • changes
  • errors
  • bias

While cleaning data, documentation is used to track changes, deletions, and errors.

Question 2

Why is it important for a data analyst to document the evolution of a dataset? Select all that apply.

  • To inform other users of changes
  • To determine the quality of the data
  • To identify best practices in the collection of data
  • To recover data-cleaning errors

It is important to document the evolution of a dataset in order to recover data-cleaning errors, inform other users of changes, and determine the quality of the data.

L4 Documenting the cleaning process

Question 1

Which of the following data errors can be eliminated by documenting the data-cleaning process? Select all that apply.

  • Human error in data entry
  • System issues
  • Flawed processes
  • Premature feedback

Human error in data entry, flawed processes, and system issues can be eliminated by documenting the data-cleaning process.

Question 2

Documenting data-cleaning makes it possible to achieve what goals? Select all that apply.

  • Demonstrate to project stakeholders that you are accountable
  • Be transparent about your process
  • Visualize the results of your data analysis
  • Keep team members on the same page

Documenting data-cleaning makes it possible to be transparent about your process, keep team members on the same page, and demonstrate to project stakeholders that you are accountable.

Weekly challenge 4

Question 1

The data collected for an analysis project has just been cleaned. What are the next steps for a data analyst? Select all that apply.

  • Verification
  • Reporting
  • Certification
  • Validation

Verification and reporting are the next steps for a data analyst after the data is cleaned.

Question 2

A data analyst is in the verification step. They consider the business problem, the goal, and the data involved in their analytics project. What scenario does this describe?

  • Reporting on the data
  • Seeing the big picture
  • Considering the stakeholders
  • Visualizing the data

To see the big picture when verifying data cleaning, consider the business problem, the goal, and the data.

Question 3

Which function removes leading, trailing, and repeated spaces in data?

  • CUT
  • CROP
  • TRIM
  • TIDY

TRIM is a function that removes leading, trailing, and repeated spaces in data.
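As a rough illustration of that behavior outside a spreadsheet, the sketch below mimics TRIM in Python. The helper name `spreadsheet_trim` and the sample text are made up for this example; it only handles ordinary space characters.

```python
def spreadsheet_trim(text: str) -> str:
    """Mimic spreadsheet TRIM: drop leading/trailing spaces and collapse repeated spaces."""
    # Splitting on a single space yields empty pieces wherever spaces repeat;
    # discarding those pieces and re-joining leaves one space between words.
    return " ".join(piece for piece in text.split(" ") if piece)


cleaned = spreadsheet_trim("  Meer-Kitty   Interior  Design ")
print(cleaned)  # Meer-Kitty Interior Design
```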

Question 4

A data analyst uses the COUNTA function to count which of the following?

  • The total number of headers in a specific range.
  • The total number of values within a specified range.
  • The total number of entries in a changelog.
  • The specific numbers in a dataset.

A data analyst uses the COUNTA function to count the total number of values within a specified range.
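To make the idea concrete, here is a minimal Python sketch of what COUNTA does. It assumes blank cells are represented by `None` or an empty string, which is a simplification of how a real spreadsheet stores cells; the `counta` helper and the sample column are hypothetical.

```python
def counta(cells):
    """Count cells that hold any value; None and "" stand in for blank cells."""
    return sum(1 for cell in cells if cell is not None and cell != "")


# Text and numbers both count as values; only the two blanks are skipped.
column = ["Yes", "", "No", None, 0, "Yes"]
total = counta(column)
print(total)  # 4
```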

Question 5

A WHEN statement considers one or more conditions and returns a value as soon as that condition is met.

  • True
  • False

A CASE statement considers one or more conditions and returns a value as soon as that condition is met.
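A small sketch of CASE in action, run through Python's built-in sqlite3 module. The `survey` table, its columns, and the rating bands are invented for illustration. Note that a rating of 5 satisfies both conditions, but CASE returns the value for the first one that matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (respondent INTEGER, rating INTEGER)")
conn.executemany("INSERT INTO survey VALUES (?, ?)", [(1, 5), (2, 3), (3, 1)])

# CASE checks each WHEN in order and stops at the first condition that is met.
rows = conn.execute(
    """
    SELECT respondent,
           CASE
               WHEN rating >= 4 THEN 'high'
               WHEN rating >= 2 THEN 'medium'
               ELSE 'low'
           END AS rating_band
    FROM survey
    ORDER BY respondent
    """
).fetchall()
conn.close()

print(rows)  # [(1, 'high'), (2, 'medium'), (3, 'low')]
```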

Question 6

What is the process of tracking changes, additions, deletions, and errors during data cleaning?

  • Recording
  • Documentation
  • Observation
  • Cataloging

Documentation is the process of tracking changes, additions, deletions, and errors during data cleaning.

Question 7

Fill in the blank: A changelog contains a _____ list of modifications made to a project.

  • approximate
  • random
  • chronological
  • synchronized

A data analyst uses a changelog to access the information needed. A changelog is a file that contains a chronological list of modifications made to a project.

Question 8

Reviewing version history is an effective way to view a changelog in SQL.

  • True
  • False

Reviewing version history is an effective way to view a changelog in spreadsheets.

Week 5: Optional: Adding data to your resume

Optional: Adding data to your resume

Creating an effective resume will help you on your data analytics career path. In this part of the course, you’ll learn all about the job application process with a focus on crafting a resume that highlights your strengths and applicable experience. Even if you aren’t applying to jobs yet, it’s still a good time to improve your resume. It’s like spring training for a first season in a major league–you don’t want to miss it!

Learning Objectives

  • Identify key elements of a data analyst resume
  • Demonstrate an understanding of how previous experience may be added to a resume
  • Discuss how a data analyst job description may be aligned to a particular area of interest

Career-building expertise on YouTube

How to build a compelling data science portfolio and resume: A hiring manager from Quora reviews actual resumes from data science candidates and gives candid feedback on areas of improvement. Learn what to include and omit from your resume and portfolio as well as formatting tips. This offers a great firsthand look into what hiring managers are seeking when reviewing your resume and portfolio.

Portfolio and resume analysis with data science hiring managers: We put together a panel of hiring managers to discuss what they are seeking in candidates and how they examine different resumes submitted by job seekers like you. Learn from the mistakes of others and get ahead of the curve by adapting your resume/portfolio to avoid the noted mistakes and capitalize on what others have done well in their resumes.

Overview of the Data Science Interview Process: Hiring managers at Google discuss typical data science interviews, including the soft and hard skills you will want to prioritize. You will get a better sense of the interview process from both sides, and better prepare yourself for what to expect when interviewing for a data science role.

Live Breakdown of Common Data Science Interview Questions: Watch a mock interview to see how a Kaggle data scientist answers questions during a data science interview. The video also includes live coding! This video is great preparation for some of the most commonly asked data science interview questions.

Am I a Good Fit? Identifying Your Best Data Science Job Opportunities: Ever wonder where you will fit in for your future career? This chat with Jessica Kirkpatrick, an intelligence manager, gives you a great breakdown of the different types of categories within the data science job market, the different types of job opportunities you may notice, and how you can frame previous work and skills from another career to fit into the data science job market.

Real Stories from a Panel of Successful Career Switchers: Are you switching careers? Awesome! Learn from people who were in the same position as you and successfully switched their careers into data science. This panel discusses the different experiences in their careers and life that shifted them into the data science field.

Week 6: Course challenge

Prepare for the course challenge by reviewing terms and definitions in the glossary. Then, demonstrate your knowledge of the importance of sample size, data integrity, and the connection of data to business objectives during the quiz. You will also have an opportunity to apply your skill with data cleaning techniques in both spreadsheets and SQL. Finally, document, report on, and verify your data-cleaning process and results.

Learning Objectives

  • Describe statistical measures associated with data integrity including statistical power, hypothesis testing, and margin of error
  • Describe strategies that can be used to address insufficient data
  • Discuss the importance of sample size with reference to sample bias and random samples
  • Describe the relationship between data and related business objectives
  • Define data integrity with reference to types and risks
  • Describe data cleaning techniques with reference to identifying errors, redundancy, compatibility and continuous monitoring
  • Demonstrate an understanding of the use of spreadsheets to clean data
  • Describe how SQL can be used to clean large datasets
  • Describe the benefits of documenting data cleaning process
  • Discuss the elements and importance of data-cleaning reports

Course challenge

Scenario 1, questions 1-5

Question 1

You are a data analyst at a small analytics company. Your company is hosting a project kick-off meeting with a new client, Meer-Kitty Interior Design. The agenda includes reviewing their goals for the year, answering any questions, and discussing their available data.

Meer-Kitty Interior Design About Us Page.pdf

Meer-Kitty Interior Design Business Plan.pdf

Meer-Kitty Interior Design has two goals. They want to expand their online audience, which means getting their company and brand known by as many people as possible. They also want to launch a line of high-quality indoor paint to be sold in-store and online. You decide to consider the data about indoor paint first.

Kitty Survey Feedback – Meer-Kitty survey feedback.csv

You are pleased to find that the available data is aligned to the business objective. However, you do some research about confidence level for this type of survey and learn that you need at least 120 unique responses for the survey results to be useful. Therefore, the dataset has two limitations: First, there are only 40 responses; second, a Meer-Kitty superfan, User 588, completed the survey 11 times.

As the survey has too few responses and numerous duplicates that are skewing results, what are your options? Select all that apply.

  • Repeat the survey in order to create a new, improved dataset.
  • Locate another dataset about indoor paint.
  • Remove the duplicates from the data and proceed with analysis.
  • Talk with stakeholders and ask for more time.

With numerous duplicates, the best option is to talk with stakeholders and ask for more time. Then, you can repeat the survey in order to create a new, improved dataset.

Question 2

During the meeting, you also learn that Meer-Kitty videos are hosted on their website. For each product offered, there is an accompanying video for customers to learn more. So, more views for a video suggests greater consumer interest.

Your goal is to identify which videos are most popular, so Meer-Kitty knows what topics to explore in the future. Unfortunately, Meer-Kitty has just three months of data available because they only recently launched the videos on their site.

Without enough data to identify long-term trends about the video subjects that people prefer, what should you do?

  • Find an alternate data source that will still enable you to meet your objective.
  • Watch the videos and use your gut instinct to identify which are most successful.
  • Tell the client you’re sorry, but there is no way to meet their objective.
  • Move ahead with the data you have to determine the top video subjects.

Without enough data to identify long-term trends, one option is to find an alternate data source that will still enable you to meet your objective. In this case, you could find data from a similar company and learn about its consumer interest and trends.

Question 3

Now that you’ve identified some limitations with Meer-Kitty’s data, you want to communicate your concerns to stakeholders. In addition to insufficient video trend data, your main concern with the indoor paint survey is that the data isn’t representative of the population as a whole.

Clearly, one particular respondent, the superfan, is overrepresented. This means the data doesn’t represent the population as a whole.

When surveying people for Meer-Kitty in the future, what are some best practices you can use to address some of the issues associated with sampling bias? Select all that apply.

  • Increase sample size
  • Use data that keeps updating
  • Use data from only one source
  • Use random sampling

To address some of the issues associated with sampling bias, random sampling helps select a sample from a population so that every possible type of the sample has an equal chance of being chosen. In addition, by increasing sample size, you’re more likely to survey part of a population that is representative of the whole.

Question 4

The stakeholders understand your concerns and agree to repeat the indoor paint survey. In a few weeks, you have a much better dataset with more than 150 responses and no duplicates.

Kitty Survey Feedback – New Meer-Kitty survey feedback.csv

You notice that questions 4 and 5 are dependent on the respondent’s answer to question 3. So, you need to determine how many people answered Yes to question 3, then compare that to responses to questions 4 and 5. That way, you will know if questions 4 and 5 have any nulls.

You decide to use a spreadsheet tool that changes how cells appear when they contain the word Yes. Which tool do you use?

  • Data validation
  • Conditional formatting
  • Filtering
  • CONCATENATE

To change how cells appear when they meet a certain value, use conditional formatting.

Question 5

You continue cleaning the data. You use tools such as remove duplicates and COUNTIF to ensure the dataset is complete, correct, and relevant to the problem you’re trying to solve. Then, you complete the verification and reporting processes to share the details of your data-cleaning effort with your team.

While reviewing, your team notes one aspect of data cleaning that would improve the dataset even more. They point out that the new survey also has a new question in Column G: “What are your favorite indoor paint colors?” This was a free-response question, so respondents typed in their answers. Some people included multiple different colors of paint. In order to determine which colors are most popular, it will be necessary to put each color in its own cell.

What spreadsheet function enables you to put each of the colors in Column G into a new, separate cell?

  • Delimit
  • MID
  • Divide
  • SPLIT

To put each of the colors in Column G into a new, separate cell, use SPLIT. SPLIT is a spreadsheet function that divides text around a specified character and puts each fragment into a new, separate cell.
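The same idea can be sketched in Python. This assumes respondents separated their colors with commas, which the survey file may or may not follow; the `split_cell` helper and the sample answer are hypothetical.

```python
def split_cell(text, delimiter=","):
    """Divide text around a delimiter, giving one cleaned fragment per would-be cell."""
    # Strip stray spaces around each fragment and skip fragments that end up empty.
    return [part.strip() for part in text.split(delimiter) if part.strip()]


colors = split_cell("blue, sage green,white")
print(colors)  # ['blue', 'sage green', 'white']
```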

Scenario 2, questions 6-10

Question 6

You’ve completed this program and are interviewing for a junior data scientist position. The job is at B.Spoke Market Research, a company that analyzes market conditions using customer surveys and other research methods. The detailed job description can be found below:

C4 B.Spoke Market Research Job Description.pdf

So far, you’ve had a phone interview with a recruiter and you’ve secured a second interview with the B.Spoke team. The recruiter’s email can be found below:

C4 S2 Email from Recruiter.pdf

You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Jodie Choi, the data science lead. After welcoming you, the behavioral interview begins.

For your first question, your interviewer wants to learn about your experience with spreadsheets. She says: Sometimes the team needs data that is stored in different spreadsheets. So, we use a spreadsheet function to find the information we need.

There is a spreadsheet function that searches for a value in the first column of a given range and returns the value of a specified cell in the row in which it is found. It is called SEARCH.

  • True
  • False

The VLOOKUP function searches for a certain value in a column to return a corresponding piece of information.
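For readers who think in code, the lookup can be approximated in a few lines of Python. This sketch covers only exact matching; the `vlookup` helper, the customer IDs, and the table contents are all invented for illustration.

```python
def vlookup(search_key, table, index):
    """Find search_key in the first column and return the value in column `index` (1-based)."""
    for row in table:
        if row[0] == search_key:
            return row[index - 1]
    return None  # a spreadsheet would show #N/A when nothing matches


customers = [
    ["C-001", "Avery", "Toronto"],
    ["C-002", "Blake", "Lisbon"],
]
city = vlookup("C-002", customers, 3)
print(city)  # Lisbon
```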

Question 7

Next, your interviewer wants to know more about your understanding of tools that work in both spreadsheets and SQL. She explains that the data her team receives from customer surveys sometimes has many duplicate entries.

She says: Spreadsheets have a great tool for that called remove duplicates. In SQL, you can include DISTINCT to do the same thing. In which part of the SQL statement do you include DISTINCT?

  • The FROM statement
  • The WHERE statement
  • The UPDATE statement
  • The SELECT statement

To remove duplicates in SQL, include DISTINCT in your SELECT statement.
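A minimal sketch of DISTINCT, run through Python's built-in sqlite3 module. The `responses` table and its rows are made up, loosely echoing the repeat-respondent problem from Scenario 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (user_id INTEGER, answer TEXT)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [(588, "Yes"), (588, "Yes"), (101, "No"), (588, "Yes")],
)

# DISTINCT goes right after SELECT and collapses rows that are identical
# across every selected column.
unique_rows = conn.execute(
    "SELECT DISTINCT user_id, answer FROM responses ORDER BY user_id"
).fetchall()
conn.close()

print(unique_rows)  # [(101, 'No'), (588, 'Yes')]
```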

Question 8

Now, your interviewer explains that the data team usually works with very large amounts of customer survey data. After receiving the data, they import it into a SQL table. But sometimes, the new dataset imports incorrectly and they need to change the format.

She asks: What function would you use to convert data in a SQL table from one datatype to another?

  • CONVERT
  • CHANGE
  • CAST
  • COALESCE

The CAST function is used to convert data in a SQL table from one datatype to another.
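The sketch below shows why the conversion matters, using Python's built-in sqlite3 module. The `purchases` table and its values are hypothetical, and other SQL dialects use different type names. Prices stored as text sort alphabetically, so '19.99' lands before '5' until the column is cast to a number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (price TEXT)")  # imported with the wrong type
conn.executemany("INSERT INTO purchases VALUES (?)", [("19.99",), ("5",)])

# Sorting the raw text compares character by character.
as_text = conn.execute("SELECT price FROM purchases ORDER BY price").fetchall()

# CAST converts each value to a floating-point number before sorting.
as_number = conn.execute(
    "SELECT CAST(price AS REAL) FROM purchases ORDER BY CAST(price AS REAL)"
).fetchall()
conn.close()

print(as_text)    # [('19.99',), ('5',)]
print(as_number)  # [(5.0,), (19.99,)]
```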

Question 9

Next, your interviewer explains that one of their clients is an online retailer that needs to create product numbers for a vast inventory. Her team does this by combining the text strings for product number, manufacturing date, and color.

She asks: Which SQL function would you use to add strings together to create new text strings?

  • COMBINE
  • CREATE
  • COALESCE
  • CONCAT

To add strings together to create new text strings, use the CONCAT function.
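Here is a rough sketch of building such a product identifier, run through Python's built-in sqlite3 module. Because older SQLite versions have no CONCAT function, the example uses SQLite's `||` concatenation operator as a stand-in; in dialects that provide it, `CONCAT(product_number, made_on, color)` would play the same role. The `inventory` table and its values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (product_number TEXT, made_on TEXT, color TEXT)")
conn.execute("INSERT INTO inventory VALUES ('P100', '20240115', 'blue')")

# The || operator joins the three text columns into one new string.
(product_id,) = conn.execute(
    "SELECT product_number || made_on || color FROM inventory"
).fetchone()
conn.close()

print(product_id)  # P10020240115blue
```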

Question 10

For your final question, your interviewer explains that her team often comes across data with extra spaces.

She asks: Which function would enable you to eliminate those extra spaces? You respond: To eliminate extra spaces for consistency, use the TRIM function.

  • True
  • False

To eliminate extra spaces for consistency, use the TRIM function.
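As a small sketch of the SQL side, the example below runs TRIM through Python's built-in sqlite3 module. The `answers` table and its values are hypothetical. In SQLite, TRIM strips leading and trailing spaces only, so the three differently padded entries become one consistent value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (color TEXT)")
conn.executemany("INSERT INTO answers VALUES (?)", [(" blue",), ("blue ",), ("blue",)])

# Without TRIM these would count as three distinct values.
untrimmed = conn.execute("SELECT COUNT(DISTINCT color) FROM answers").fetchone()[0]
trimmed = conn.execute("SELECT DISTINCT TRIM(color) FROM answers").fetchall()
conn.close()

print(untrimmed)  # 3
print(trimmed)    # [('blue',)]
```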

What function can be used to convert data from one datatype to another?

In SQL, the CAST function converts data from one data type to another. Additionally, there are a number of OLAP DML functions that you can use to convert values from one data type to another.

What function would you use to convert data in a SQL table?

The CONVERT() function in SQL Server is used to convert a value of one type to another type.

What are the SQL data types?

Data types in SQL Server are organized into the following categories:

  • Exact numerics
  • Approximate numerics
  • Date and time
  • Character strings
  • Unicode character strings
  • Binary strings
  • Other data types

The exact numerics include bigint, bit, decimal, int, numeric, smallint, smallmoney, and tinyint.

What is SQL double?

In MySQL, DOUBLE(size, d) is a normal-size floating point number. The total number of digits is specified in size, and the number of digits after the decimal point is specified in the d parameter.