
Homework 11 - Data Visualization

Due: 30 November by 11:00 pm

Weight: This assignment is worth 4% of your final grade.

Purpose, Skills, & Knowledge: The purposes of this assignment are:

  • To practice exploring data frames in R using the dplyr library
  • To practice generating plots using the ggplot2 library

Assessment: Each question indicates the % of the assignment grade, summing to 100%. The credit for each question will be assigned as follows:

  • 0% for not attempting a response.
  • 50% for attempting the question but with major errors.
  • 75% for attempting the question but with minor errors.
  • 100% for correctly answering the question.

Rules: This entire assignment is SOLO. You may not work with other classmates, though you may consult instructors for help.

1) Staying organized [5%]

Download and use this template for your assignment. Inside the “hw11” folder, open and edit the R script called “hw11.R” and fill out your name, GW Net ID, and the names of anyone you worked with on this assignment.

2) Choose and load some data [5%]

For this assignment, you will need to find a dataset of your choosing and create three summary visualizations. To keep things manageable, choose one of the following datasets from the following libraries. Note that to load any of these data frames, all you need to do is install and load the library.

3) Inspect your data [10%]

Once you’ve chosen a data set, open your hw11.R file and begin exploring the data (be sure to load the library that contains the dataset at the top of your file). Write some code in code chunks to preview and summarize the data frame using some of the methods we’ve used in class. You should be able to quickly get an understanding of what variables are included and their nature. Consider the following questions in your exploration (you don’t have to write out answers to these questions - just write code to help you answer them by previewing the data in different ways):

  • What is the total size of the data frame?
  • What type of data is each variable (numeric, character, logical, date)?
  • Do any variables have missing values? Why might that be?
  • For numeric variables, what are the min and max values?
  • For character variables, what are the unique values in the variable?
  • For date variables, what time period do the observations in these data frames span?

Do not brush this step off: the more thoroughly you inspect your dataset, the easier (and better) your data exploration will be. This is absolutely critical for making your plots. Take the time to develop an understanding of the variables in your dataset, as it is nearly impossible to imagine which plots might be worth creating otherwise.

4) Make plots [50%]

Now that you have a basic understanding of the dataset, make some plots to explore the variables in the data and their potential relationships. You may use base R plotting functions or the ggplot2 library to make your figures, but you must make at least two different types of figures, including:

  • A scatterplot involving at least two variables.
  • A bar chart involving at least one variable.

You can choose to plot whichever variables you wish, but you must be able to interpret the results of your plot.

5) Interpret your plots [15%]

Below the plot code for each of your plots, write a description and interpretation of your plot in a comment. Make sure you address at least the following questions:

  • Describe what variables you are plotting and why.
  • Describe the primary relationship / trend / information you hope the reader will gain from your visualization.

6) Save your plots [10%]

At the bottom of your hw11.R file, write code to save each of your three plots in the plots folder. Save them as .png files.

7) Submit your files on Blackboard [5%]

Create a zip file of all files in your R project folder for this assignment and submit the zip file on Blackboard by the deadline.

6.894 : Interactive Data Visualization

Assignment 1: visualization design.

In this assignment, you will design a visualization for a small data set and provide a rigorous rationale for your design choices. You should in theory be ready to explain the contribution of every pixel in the display. You are free to use any graphics or charting tool you please – including drafting it by hand. However, you may find it most instructive to create the chart from scratch using a graphics API of your choice.

(See Resources for a list of visualization tools.)

Data Set: U.S. Population, 1900 vs. 2000

Every 10 years, the census bureau documents the demographic make-up of the United States, influencing everything from congressional districting to social services. This dataset contains a high-level summary of census data for two years a century apart: 1900 and 2000. The data is a CSV (comma-separated values) file that describes the U.S. population in terms of year, reported sex (1: male, 2: female), age group (binned into 5 year segments from 0-4 years old up to 90+ years old), and the total count of people per group. There are 38 data points per year, for a total of 76 data points.

Dataset: CSV

Source: U.S. Census Bureau via IPUMS

  • Start by choosing a question you'd like a visualization to answer.
  • Design a static visualization (i.e., a single image) that you believe effectively answers that question, and use the question as the title of your graphic.
  • Provide a short write-up (no more than 4 paragraphs) describing your design.

While you must use the data set given, you are free to transform the data as you see fit. Such transforms may include (but are not limited to) log transformation, computing percentages or averages, grouping elements into new categories, or removing unnecessary variables or records. You are also free to incorporate external data as you see fit. Your chart image should be interpretable without recourse to your short write-up. Do not forget to include title, axis labels or legends as needed!
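As one illustration of the kind of transform described above, the sketch below converts raw counts into each group's percentage share of its year's total, using pandas. The column names (`year`, `group`, `people`) and all numbers are made up for the example and are not taken from the census file.

```python
import pandas as pd

# Made-up counts with hypothetical column names; not the real census data.
df = pd.DataFrame({
    'year': [1900, 1900, 2000, 2000],
    'group': ['x', 'y', 'x', 'y'],
    'people': [30, 70, 150, 50],
})

# Total count per year, repeated on every row belonging to that year.
year_total = df.groupby('year')['people'].transform('sum')

# Percentage of each year's total that falls in each group.
df['share'] = df['people'] * 100 / year_total

print(df)
```

Shares make two years with very different total populations directly comparable, at the cost of hiding the absolute growth in counts.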

As different visualizations can emphasize different aspects of a data set, you should document what aspects of the data you are attempting to most effectively communicate. In short, what story are you trying to tell? Just as important, also note which aspects of the data might be obscured or down-played due to your visualization design.

In your write-up, you should provide a rigorous rationale for your design decisions. Document the visual encodings you used and why they are appropriate for the data and your specific question. These decisions include the choice of visualization type, size, color, scale, and other visual elements, as well as the use of sorting or other data transformations. How do these decisions facilitate effective communication?

The assignment score is out of a maximum of 10 points. Historically, the median score on this assignment has been 8.5. We will determine scores by judging both the soundness of your design and the quality of the write-up. We will also look for consideration of audience, message and intended task.

We will use the following rubric to grade your assignment. Note, rubric cells may not map exactly to specific point scores.

We will reward entries that go above and beyond the assignment requirements to produce effective graphics. Examples may include outstanding visual design, meaningful incorporation of external data to reveal important trends, demonstrating exceptional creativity, or effective annotations or other narrative devices.

Submission Details

This is an individual assignment. You may not work in groups. Your completed assignment is due on Wednesday 2/12, by noon .

Submit your assignment using this form . The form expects your visualization to be an image (either a .png or .jpg). Please make sure your image is sized for a reasonable viewing experience -- readers should not have to zoom or scroll in order to effectively view your submission!

Resubmissions. Resubmissions will be regraded by teaching staff, and you may earn back up to 50% of the points lost in the original submission. To resubmit this assignment, please use this form and follow the same submission process described above. Include a short 1 paragraph description summarizing the changes from the initial submission. Resubmissions without this summary will not be regraded. Resubmissions will be due by 11:59pm on Saturday, 2/29. Slack days may not be applied to extend the resubmission deadline. The teaching staff will only begin to regrade assignments once the Final Project phase begins, so please be patient.


IS445 - Data Viz - ACG/ACU

This is the course website for Data Visualization, instructed by Jill Naiman.

Below, you will find the materials for each week, as well as the syllabus that includes contact information and a course outline.

Lectures and Materials

Martin Luther King Day holiday, no classes, enjoy!

Introduction

Lecture 1 - Class Introduction & Why we Visualize

The syllabus for the course, along with discussions about "what" visualizations are, and how to orient yourself in the course. What are some of the basics of how we interpret visualizations? How can we describe the process of making choices, understanding our audience, and so forth?

Example HW 1 Submission

Installation Instructions

Prep Notebook, Week 2

Import notebook for HW 1

In class week 2 notebook

GDP dataset

Stitch Image

Data Storage and Operations


Lecture 2 - Data storage & Operations, Image data

When we draw something on a screen, how do we represent that internally, and how is that translated into pixels? How are values transformed from 0's and 1's into values we can manipulate and understand?

Assignment help - TurnItIn

How to submit homework with the TurnItIn framework

Extras, Lecture 2

More examples of drawing images in 2D; binary representations

Prep Notebook, Week 3

In class, Week 3

Buildings dataset

Corgi in Hat

Types of Viz and Choosing Colors

Lecture 3 - Colors and Color maps, Types of viz

How do colors work? What are the different ways we can map colors to values? What should we keep in mind when doing this?

Prep Notebook, Week 4

In class notebook, Week 4

Palette Colors (palette_colors.py)

Michigan Depth Map (86Mb)

Brain Scan (72Mb)

Beginning interactivity

Lecture 4 - Widgets & Traitlets for Interactivity

We talk about the basics of using Traitlets and data binding in visualization.

In Class notebook

Prep Notebook, Week 5

Extra prep notebook

The UFO Sightings Dataset (13Mb)

Continuing interactivity with bqplot

Lecture 5 - Grammar of Graphics & bqplot

We introduce the basics of bqplot & how it relates to grammar of graphics

In class Notebook, Week 6

Prep Notebook, Week 6

Wealth of Nations Library (wealth_of_nations.py)

Wealth of Nations Data - nations.json

More with dashboards & Map Viz

Lecture 6 - Dashboards & Maps

Linking data, and a bit about maps (if we have time)

In Class Notebook, Week 7

Prep Notebook, Week 7, Part 1

Prep Notebook, Week 7, Part 2

State export utilities (states_utils.py)

US State abbreviations (us_state_abbrev.py)

Surgery Charges Dataset (37Mb)

Earthquake sensor data (59Mb)

Earthquake locations data (12Kb)

Maps, maps and more maps

Lecture 7 - Maps!

More about maps and their projections

In class notebook

Prep Notebook, Week 8, Part 1

Prep Notebook, Week 8, Part 2

State export data (8Kb)

Spring Break, no classes, enjoy!

Starboard and intro to JavaScript

Lecture 8 - Choosing viz & Online viz platforms

Choosing what viz type to use, and an introduction to Vega/Vega-lite and Starboard

In Class Notebook, Week 10

Prep Starboard Notebook, Week 10

The GDP dataset (online)

The Mobility dataset (online)

Viz Audience; More Starboard, Javascript & Vega-lite (and maybe Idyll)

Lecture 10 - Viz audience, Final Project Info

Considerations of audience, review about final projects, more with vega lite & Starboard, Idyll

Idyll Installation Instructions

In Class Notebook, Week 11

Prep Starboard Notebook, Week 11

Finish up with Starboard, Intro to Idyll

Lecture 11 - Starboard, Publishing & more in Idyll

Publishing your viz, vega-lite in Idyll and a bit of d3.js

Lecture 11 extras - more with Vega-lite

More with vega-lite

In Class Notebook, Week 12

Prep Starboard Notebook, Week 12

Prep index.idyll file, Week 12

In class Idyll resources, Week 12

Corgis per country over time

Subset of full Corgi database

More with Idyll, Publishing Viz

Lecture 12 - Publishing & Validation

More about publishing and validation, and more of d3.js in Idyll

In class Idyll materials, Week 13

Prep Idyll materials, Week 13

Starting d3.js histogram example

More vega-lite idyll examples

A few more Idyll+d3 things, Starting SciViz

Lecture 13 - 3D graphics, Intro to SciViz

How your computer and the internet process 3D graphics. What is scientific visualization?

In class Idyll resources, Week 14

Prep resources, Week 14

In class jupyter notebook

Prep Notebook, Week 14, Part 1

Prep Notebook, Week 14, Part 2

Solver library (solverlibs.py)

Galaxy Particle Simulation files (77Mb)

Isolated Galaxy dataset (292 Mb)

Jekyll Intro Slides

Scientific Viz & Guest lecture from the Advanced Visualization Lab

Lecture 14 - Scientific Visualization, notes on final project

More about scientific visualization

Prep Notebook, Week 15

Network Visualization & Word clouds

Lecture 15 - WordClouds, networks, and final project

How to analyze text data, viz of networks, and where to go from here.

Prep Notebook, Week 16, Part 1

Prep Notebook, Week 16, Part 2

Text corpus from Othello

Broad facebook data

Major node data

CSE 163, Summer 2020: Homework 3: Data Analysis

In this assignment, you will apply what you've learned so far in a more extensive "real-world" dataset using more powerful features of the Pandas library. As in HW2, this dataset is provided in CSV format. We have cleaned up the data some, but you will need to handle more edge cases common to real-world datasets, including null cells to represent unknown information.

Note that there is no graded testing portion of this assignment. We still recommend writing tests to verify the correctness of the methods that you write in Part 0, but it will be difficult to write tests for Part 1 and 2. We've provided tips in those sections to help you gain confidence about the correctness of your solutions without writing formal test functions!

This assignment is supposed to introduce you to various parts of the data science process involving being able to answer questions about your data, how to visualize your data, and how to use your data to make predictions for new data. To help prepare for your final project, this assignment has been designed to be wide in scope so you can get practice with many different aspects of data analysis. While this assignment might look large because there are many parts, each individual part is relatively small.

Learning Objectives

After this homework, students will be able to:

  • Work with basic Python data structures.
  • Handle edge cases appropriately, including addressing missing values/data.
  • Practice user-friendly error-handling.
  • Read plotting library documentation and use example plotting code to figure out how to create more complex Seaborn plots.
  • Train a machine learning model and use it to make a prediction about the future using the scikit-learn library.

Expectations

Here are some baseline expectations we expect you to meet:

Follow the course collaboration policies

If you are developing on Ed, all the files are there. The files included are:

  • hw3-nces-ed-attainment.csv : A CSV file that contains data from the National Center for Education Statistics. This is described in more detail below.
  • hw3.py : The file for you to put solutions to Part 0, Part 1, and Part 2. You are required to add a main method that parses the provided dataset and calls all of the functions you are to write for this homework.
  • hw3-written.txt : The file for you to put your answers to the questions in Part 3.
  • cse163_utils.py : Provides utility functions for this assignment. You probably don't need to use anything inside this file except importing it if you have a Mac (see comment in hw3.py )

If you are developing locally, you should navigate to Ed and in the assignment view open the file explorer (on the left). Once there, you can right-click to select the option to "Download All" to download a zip and open it as the project in Visual Studio Code.

The dataset you will be processing comes from the National Center for Education Statistics. You can find the original dataset here . We have cleaned it a bit to make it easier to process in the context of this assignment. You must use our provided CSV file in this assignment.

The original dataset is titled: Percentage of persons 25 to 29 years old with selected levels of educational attainment, by race/ethnicity and sex: Selected years, 1920 through 2018 . The cleaned version you will be working with has columns for Year, Sex, Educational Attainment, and race/ethnicity categories considered in the dataset. Note that not all columns will have data starting at 1920.

Our provided hw3-nces-ed-attainment.csv looks like: (⋮ represents omitted rows):

Column Descriptions

  • Year: The year this row represents. Note there may be more than one row for the same year to show the percent breakdowns by sex.
  • Sex: The sex of the students this row pertains to, one of "F" for female, "M" for male, or "A" for all students.
  • Min degree: The degree this row pertains to. One of "high school", "associate's", "bachelor's", or "master's".
  • Total: The total percent of students of the specified gender to reach at least the minimum level of educational attainment in this year.
  • White / Black / Hispanic / Asian / Pacific Islander / American Indian or Alaska Native / Two or more races: The percent of students of this race and the specified gender to reach at least the minimum level of educational attainment in this year.

Interactive Development

When using data science libraries like pandas, seaborn, or scikit-learn, it's extremely helpful to interact with the tools you're using so you can get a better idea of the shape of your data. The preferred practice in industry is to use a Jupyter Notebook, as we have in lecture, to play around with the dataset and figure out how to answer the questions you want to answer. This is incredibly helpful when you're first learning a tool, because you can experiment and get real-time feedback on whether the code you wrote does what you want.

We recommend that you try figuring out how to solve these problems in a Jupyter Notebook so you can actually interact with the data. We have made a Playground Jupyter Notebook for you that has the data uploaded. At the top-right of this page in Ed is a "Fork" button (looks like a fork in the road). This will make your own copy of this Notebook so you can run the code and experiment with anything there! When you open the Workspace, you should see a list of notebooks and CSV files. You can always access this launch page by clicking the Jupyter logo.

Part 0: Statistical Functions with Pandas

In this part of the homework, you will write code to perform various analytical operations on data parsed from a file.

Part 0 Expectations

  • All functions for this part of the assignment should be written in hw3.py .
  • For this part of the assignment, you may import and use the math and pandas modules, but you may not use any other imports to solve these problems.
  • For all of the problems below, you should not use ANY loops or list/dictionary comprehensions. The goal of this part of the assignment is to use pandas as a tool to help answer questions about your dataset.

Problem 0: Parse data

In your main method, parse the data from the CSV file using pandas. Note that the file uses '---' as the entry to represent missing data. You do NOT need to do anything fancy like set a datetime index.

The function to read a CSV file in pandas takes a parameter called na_values that takes a str to specify which values are NaN values in the file. It will replace all occurrences of those characters with NaN. You should specify this parameter to make sure the data parses correctly.
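A minimal sketch of this parameter in action is shown below. It reads a small in-memory CSV rather than the assignment file, and the rows are made-up values written only for illustration.

```python
import io

import pandas as pd

# Made-up rows in the same spirit as the assignment's CSV; '---' marks a
# missing entry. These are illustrative values, not the real dataset.
csv_text = """Year,Sex,Min degree,Total
1980,A,bachelor's,22.5
1980,F,bachelor's,---
1980,M,bachelor's,24.0
"""

# na_values tells pandas which strings should be read as missing (NaN).
data = pd.read_csv(io.StringIO(csv_text), na_values='---')

print(data['Total'].isna().sum())  # how many entries were read as missing
print(data['Total'].dtype)         # numeric dtype instead of strings
```

Without `na_values`, the `'---'` entry would force the whole column to be read as strings, and numeric operations on it would fail or misbehave.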

Problem 1: compare_bachelors_1980

What were the percentages for women vs. men having earned a Bachelor's Degree in 1980? Call this method compare_bachelors_1980 and return the result as a DataFrame with a row for men and a row for women with the columns "Sex" and "Total".

The index of the DataFrame is shown as the left-most column above.

Problem 2: top_2_2000s

What were the two most commonly awarded levels of educational attainment between 2000 and 2010 (inclusive)? Use the mean percent over the years to compare the education levels in order to find the two largest. For this computation, you should use the rows for the 'A' sex. Call this method top_2_2000s and return a Series with the top two values (the index should be the degree names and the values should be the percent).

For example, assuming we have parsed hw3-nces-ed-attainment.csv and stored it in a variable called data , then top_2_2000s(data) will return the following Series (shows the index on the left, then the value on the right)

Hint: The Series class also has a method nlargest that behaves similarly to the one for the DataFrame , but does not take a column parameter (as Series objects don't have columns).
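To illustrate the hint, here is a small sketch of `nlargest` on a Series. The labels and numbers are hypothetical and unrelated to the homework's expected output.

```python
import pandas as pd

# Hypothetical values indexed by a label (not real results).
means = pd.Series({'a': 10.0, 'b': 42.5, 'c': 7.25, 'd': 30.0})

# Series.nlargest only needs n; there is no column argument because a
# Series holds a single sequence of values.
top_two = means.nlargest(2)

print(top_two)
```

The result is itself a Series, sorted from largest to smallest, with the original index labels preserved.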

Our assert_equals only checks that floating point numbers are within 0.001 of each other, so your floats do not have to match exactly.

Optional: Why 0.001?

Whenever you work with floating point numbers, it is very likely you will run into the imprecision of floating point arithmetic. You have probably run into this with your everyday calculator! If you take 1, divide by 3, and then multiply by 3 again, you could get something like 0.99999999 instead of the 1 you would expect.

This is due to the fact that there is only a finite number of bits to represent floats so we will at some point lose some precision. Below, we show some example Python expressions that give imprecise results.

Because of this, you can never safely check if one float is == to another. Instead, we only check that the numbers match within some small delta that is permissible by the application. We kind of arbitrarily chose 0.001, and if you need really high accuracy you would want to only allow for smaller deviations, but equality is never guaranteed.
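The following sketch shows one imprecise expression and a tolerance-based comparison. The helper `close_enough` is a hypothetical name written for this example; it is not the course's `assert_equals`, only an illustration of the same idea.

```python
import math

# 0.1 and 0.2 have no exact binary representation, so their sum is
# slightly off from 0.3.
total = 0.1 + 0.2
print(total)         # 0.30000000000000004
print(total == 0.3)  # False


def close_enough(a, b, delta=0.001):
    """Return True if a and b differ by at most delta (hypothetical helper)."""
    return abs(a - b) <= delta


print(close_enough(total, 0.3))                 # True
print(math.isclose(total, 0.3, abs_tol=0.001))  # True
```

The standard library's `math.isclose` performs the same kind of check and also supports a relative tolerance through its `rel_tol` parameter.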

Problem 3: percent_change_bachelors_2000s

What is the difference between the total percent of bachelor's degrees received in 2000 and in 2010? Take a sex parameter so the client can specify 'M', 'F', or 'A' for evaluating. If a call does not specify the sex to evaluate, you should evaluate the percent change for all students (sex = 'A'). Call this method percent_change_bachelors_2000s and return the difference (the percent in 2010 minus the percent in 2000) as a float.

For example, assuming we have parsed hw3-nces-ed-attainment.csv and stored it in a variable called data , then the call percent_change_bachelors_2000s(data) will return 2.599999999999998 . Our assert_equals only checks that floating point numbers are within 0.001 of each other, so your floats do not have to match exactly.

Hint: For this problem you will need to use the squeeze() function on a Series to get a single value from a Series of length 1.
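As a sketch of what the hint describes, the example below filters a toy DataFrame down to one row and uses `squeeze()` to turn the resulting length-1 Series into a single value. The data is made up for illustration.

```python
import pandas as pd

# Made-up values; not the assignment data.
df = pd.DataFrame({'Year': [2000, 2010], 'Total': [1.5, 4.0]})

# Filtering leaves a Series that still has an index and a length of 1.
one_row = df[df['Year'] == 2010]['Total']

# squeeze() collapses a length-1 Series into the single value it holds.
value = one_row.squeeze()

print(type(one_row).__name__, type(value).__name__)
```

Note that `squeeze()` only returns a scalar when the Series has exactly one element; on a longer Series it returns the Series unchanged.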

Part 1: Plotting with Seaborn

Next, you will write functions to generate data visualizations using the Seaborn library. For each of the functions save the generated graph with the specified name. These methods should only take the pandas DataFrame as a parameter. For each problem, only drop rows that have missing data in the columns that are necessary for plotting that problem ( do not drop any additional rows ).
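One way to drop rows based only on the columns a plot needs is the `subset` parameter of `dropna`, sketched below on a made-up table. The column name `Other` is hypothetical and stands in for any column the plot does not use.

```python
import pandas as pd

# Made-up table: 'Other' is a hypothetical column the plot does not use.
df = pd.DataFrame({
    'Year': [1990, 2000, 2010],
    'Total': [20.0, None, 30.0],
    'Other': [None, 5.0, 6.0],
})

# Only rows missing 'Year' or 'Total' are dropped; a missing 'Other' is ignored.
plot_data = df.dropna(subset=['Year', 'Total'])

# For comparison, a bare dropna() also drops rows missing the unused column.
too_strict = df.dropna()

print(len(plot_data), len(too_strict))
```

Calling `dropna()` with no arguments would silently discard rows that are perfectly usable for the plot, which changes the picture.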

Part 1 Expectations

  • When submitting on Ed, you DO NOT need to specify the absolute path (e.g. /home/FILE_NAME ) for the output file name. If you specify absolute paths for this assignment your code will not pass the tests!
  • You will want to pass the parameter value bbox_inches='tight' to the call to savefig to make sure edges of the image look correct!
  • For this part of the assignment, you may import the math , pandas , seaborn , and matplotlib modules, but you may not use any other imports to solve these problems.
  • For all of the problems below, you should not use ANY loops or list/dictionary comprehensions.
  • Do not use any of the other seaborn plotting functions for this assignment besides the ones we showed in the reference box below. For example, even though the documentation for relplot links to another method called scatterplot , you should not call scatterplot . Instead use relplot(..., kind='scatter') like we showed in class. This is not an issue of stylistic preference, but these functions behave slightly differently. If you use these other functions, your output might look different than the expected picture. You don't yet have the tools necessary to use scatterplot correctly! We will see these extra tools later in the quarter.
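The sketch below puts the points above together on a made-up dataset: a `relplot` call with `kind='scatter'`, labels added through matplotlib, and a relative output file name passed to `savefig` with `bbox_inches='tight'`. The data, labels, and file name are placeholders, and the `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

sns.set()  # apply seaborn's default styling

# Made-up data with placeholder column names.
df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [2.0, 3.5, 3.0, 5.0]})

# relplot with kind='scatter' rather than calling scatterplot directly.
sns.relplot(x='x', y='y', data=df, kind='scatter')
plt.xlabel('X value')
plt.ylabel('Y value')
plt.title('Example Scatter Plot')

# A relative file name (no absolute path); bbox_inches='tight' keeps the
# labels and title from being clipped at the image edges.
plt.savefig('example_scatter.png', bbox_inches='tight')
```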

Part 1 Development Strategy

  • Print your filtered DataFrame before creating the graph to ensure you’re selecting the correct data.
  • Call the DataFrame describe() method to see some statistical information about the data you've selected. This can sometimes help you determine what to expect in your generated graph.
  • Re-read the problem statement to make sure your generated graph is answering the correct question.
  • Compare the data on your graph to the values in hw3-nces-ed-attainment.csv. For example, for problem 0 you could check that the generated line goes through the point (2005, 28.8) because of this row in the dataset: 2005,A,bachelor's,28.8,34.5,17.6,11.2,62.1,17.0,16.4,28.0

Seaborn Reference

Of all the libraries we will learn this quarter, Seaborn is by far the best documented. We want to give you experience reading real world documentation to learn how to use a library so we will not be providing a specialized cheat-sheet for this assignment. What we will do to make sure you don't have to look through pages and pages of documentation is link you to some key pages you might find helpful for this assignment; you do not have to use every page we link, so part of the challenge here is figuring out which of these pages you need. As a data scientist, a huge part of solving a problem is learning how to skim lots of documentation for a tool that you might be able to leverage to solve your problem.

We recommend reading the documentation in the following order:

  • Start by skimming the examples to see the possible things the function can do. Don't spend too much time trying to figure out what the code is doing yet, but you can quickly look at it to see how much work is involved.
  • Then read the top paragraph(s) that give a general overview of what the function does.
  • Now that you have a better idea of what the function is doing, go look back at the examples and look at the code much more carefully. When you see an example like the one you want to generate, look carefully at the parameters it passes and go check the parameter list near the top for documentation on those parameters.
  • It sometimes (but not always) helps to skim the other parameters in the list, just so you have an idea of what this function is capable of doing.

As a reminder, you will want to refer to the lecture/section material to see the additional matplotlib calls you might need in order to display/save the plots. You'll also need to call the set function on seaborn to get everything set up initially.

Here are the seaborn functions you might need for this assignment:

  • Bar/Violin Plot ( catplot )
  • Plot a Discrete Distribution ( distplot ) or Continuous Distribution ( kdeplot )
  • Scatter/Line Plot ( relplot )
  • Linear Regression Plot ( regplot )
  • Compare Two Variables ( jointplot )
  • Heatmap ( heatmap )
Make sure you read the bullet point at the top of the page warning you to only use these functions!

Problem 0: Line Chart

Plot the total percentage of all people with a bachelor's degree as their minimum level of completion, as a line chart over the years. To select all people, you should filter to rows where sex is 'A'. Label the x-axis "Year", the y-axis "Percentage", and title the plot "Percentage Earning Bachelor's over Time". Name your method line_plot_bachelors and save your generated graph as line_plot_bachelors.png .

result of line_plot_bachelors

Problem 1: Bar Chart

Plot the total percentages of women, men, and total people with a minimum education of high school degrees in the year 2009. Label the x-axis "Sex", the y-axis "Percentage", and title the plot "Percentage Completed High School by Sex". Name your method bar_chart_high_school and save your generated graph as bar_chart_high_school.png .

Do you think this bar chart is an effective data visualization? Include your reasoning in hw3-written.txt as described in Part 3.

result of bar_chart_high_school

Problem 2: Custom Plot

Plot the results of how the percent of Hispanic individuals with degrees has changed between 1990 and 2010 (inclusive) for high school and bachelor's degrees with a chart of your choice. Make sure you label your axes with descriptive names and give a title to the graph. Name your method plot_hispanic_min_degree and save your visualization as plot_hispanic_min_degree.png .

Include a justification of your choice of data visualization in hw3-written.txt , as described in Part 3.

Part 2: Machine Learning using scikit-learn

Now you will be making a simple machine learning model for the provided education data using scikit-learn . Complete this in a function called fit_and_predict_degrees that takes the data as a parameter and returns the test mean squared error as a float. This may sound like a lot, so we've broken it down into steps for you:

  • Filter the DataFrame to only include the columns for year, degree type, sex, and total.
  • Do the following pre-processing: Drop rows that have missing data for just the columns we are using; do not drop any additional rows . Convert string values to their one-hot encoding. Split the columns as needed into input features and labels.
  • Randomly split the dataset into 80% for training and 20% for testing.
  • Train a decision tree regressor model to take in year, degree type, and sex to predict the percent of individuals of the specified sex to achieve that degree type in the specified year.
  • Use your model to predict on the test set. Calculate the accuracy of your predictions using the mean squared error of the test dataset.

You do not need to do anything fancy like find the optimal settings for parameters to maximize performance. We just want you to start simple and train a model from scratch! The reference below has all the methods you will need for this section!
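The general shape of these steps can be sketched on a toy table as follows. This is an illustration under stated assumptions rather than a solution: the column names (`year`, `group`, `value`, `unused`) and all values are made up, and the calls shown are standard pandas and scikit-learn functions.

```python
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Made-up toy table with hypothetical column names.
df = pd.DataFrame({
    'year': [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009],
    'group': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'a', None],
    'value': [1.0, 2.0, 1.5, 2.5, 2.0, 3.0, 2.5, 3.5, 3.0, 4.0],
    'unused': [None] * 10,
})

# Keep only the needed columns first, so that missing values in other
# columns do not cause extra rows to be dropped.
subset = df[['year', 'group', 'value']].dropna()

# One-hot encode the string column; numeric columns pass through unchanged.
features = pd.get_dummies(subset[['year', 'group']])
labels = subset['value']

# Random 80/20 split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2)

model = DecisionTreeRegressor()
model.fit(X_train, y_train)

train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))
print(train_mse, test_mse)
```

Because the split is random, the test error will vary from run to run unless a `random_state` is passed to `train_test_split`.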

scikit-learn Reference

You can find our reference sheet for machine learning with scikit-learn at ScikitLearnReference. This reference sheet has information about general scikit-learn calls that are helpful, as well as how to train the tree models we talked about in class. At the top-right of this page in Ed is a "Fork" button (looks like a fork in the road). This will make your own copy of this Notebook so you can run the code and experiment with anything there! When you open the Workspace, you should see a list of notebooks and CSV files. You can always access this launch page by clicking the Jupyter logo.

Part 2 Development Strategy

Like in Part 1, it can be difficult to write tests for this section. Machine Learning is all about uncertainty, and it's often difficult to write tests to know what is right. This requires diligence and making sure you are very careful with the method calls you make. To help you with this, we've provided some alternative ways to gain confidence in your result:

  • Print your test y values and your predictions to compare them manually. They won't be exactly the same, but you should notice that they have some correlation. For example, I might be concerned if my test y values were [2, 755, …] and my predicted values were [1022, 5...] because they seem to not correlate at all.
  • Calculate your mean squared error on your training data as well as your test data. The error should be lower on your training data than on your testing data.
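
One way to run both checks is sketched below on synthetic data. The data, the seed, and the variable names are assumptions made for illustration; the point is only the pattern of computing the error on both splits and printing a few labels next to their predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: a noisy linear trend, invented for this sketch.
rng = np.random.default_rng(0)
x = np.arange(200, dtype=float).reshape(-1, 1)
y = 0.5 * x.ravel() + rng.normal(scale=5.0, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0)

model = DecisionTreeRegressor(random_state=0)
model.fit(x_train, y_train)

# Error on data the model has seen versus data it has not.
train_mse = mean_squared_error(y_train, model.predict(x_train))
test_mse = mean_squared_error(y_test, model.predict(x_test))
print('train MSE:', train_mse)
print('test MSE:', test_mse)

# Eyeball a few test labels next to their predictions.
for actual, predicted in zip(y_test[:5], model.predict(x_test[:5])):
    print(round(actual, 1), round(predicted, 1))
```

An unrestricted decision tree can memorize its training rows, so a training error near zero alongside a larger test error is expected here rather than a sign of a bug.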

Optional: ML for Time Series

Since this is technically time series data, we should point out that our method for assessing the model's accuracy is slightly wrong (but we will keep it simple for our HW). When working with time series, it is common to use the last rows for your test set rather than random sampling (assuming your data is sorted chronologically). The reason is when working with time series data in machine learning, it's common that our goal is to make a model to help predict the future. By randomly sampling a test set, we are assessing the model on its ability to predict in the past! This is because it might have trained on rows that came after some rows in the test set chronologically. However, this is not a task we particularly care that the model does well at. Instead, by using the last section of the dataset (the most recent in terms of time), we are now assessing its ability to predict into the future from the perspective of its training set.

Even though it's not the best approach to randomly sample here, we ask you to do it anyways. This is because random sampling is the most common method for all other data types.
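
To make the difference concrete, the sketch below contrasts a chronological split with a random one. The tiny table and its column names are invented for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up yearly data, one row per year.
data = pd.DataFrame({
    'year': list(range(2000, 2010)),
    'value': [10, 12, 13, 15, 18, 21, 22, 25, 27, 30],
})

# Chronological split: sort by time, then hold out the last 20% of rows.
data = data.sort_values('year')
cutoff = int(len(data) * 0.8)
train_chrono = data.iloc[:cutoff]
test_chrono = data.iloc[cutoff:]

# Every training year precedes every test year.
print(train_chrono['year'].max(), '<', test_chrono['year'].min())

# Random split (what the assignment asks for): test years can fall
# anywhere, including before some of the training years.
train_random, test_random = train_test_split(data, test_size=0.2)
print(sorted(test_random['year']))
```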

Part 3: Written Responses

Review the source of the dataset here . For the following reflection questions consider the accuracy of data collected, and how it's used as a public dataset (e.g. presentation of data, publishing in media, etc.). All of your answers should be complete sentences and show thoughtful responses. "No" or "I don't know" or any response like that are not valid responses for any questions. There is not one particularly right answer to these questions, instead, we are looking to see you use your critical thinking and justify your answers!

  • Do you think the bar chart from part 1b is an effective data visualization? Explain in 1-2 sentences why or why not.
  • Why did you choose the type of plot that you did in part 1c? Explain in a few sentences why you chose this type of plot.
  • Datasets can be biased. Bias in data means it might be skewed away from or portray a wrong picture of reality. The data might contain inaccuracies or the methods used to collect the data may have been flawed. Describe a possible bias present in this dataset and why it might have occurred. Your answer should be about 2 or 3 sentences long.

Context : Later in the quarter we will talk about ethics and data science. This question is supposed to be a warm-up to get you thinking about our responsibilities having this power to process data. We are not trying to train you to misuse your powers for evil here! Most misuses of data analysis that result in ethical concerns happen unintentionally. As preparation to understand these unintentional consequences, we thought it would be a good exercise to think about a theoretical world where you would willingly try to misuse data.

Congrats! You just got an internship at Evil Corp! Your first task is to come up with an application or analysis that uses this dataset to do something unethical or nefarious. Describe a way that this dataset could be misused in some application or an analysis (potentially using the bias you identified for the last question). Regardless of what nefarious act you choose, evil still has rules: You need to justify why using the data in this way is a misuse and why a regular person who is not evil (like you in the real world outside of this problem) would think using the data in this way would be wrong. There are no right answers here about what defines something as unethical; this is why you need to justify your answer! Your response should be 2 to 4 sentences long.

Turn in your answers to these questions by writing them in hw3-written.txt and submitting them on Ed.

Your submission will be evaluated on the following dimensions:

  • Your solution correctly implements the described behaviors. You will have access to some tests when you turn in your assignment, but we will withhold other tests to test your solution when grading. All behavior we test is completely described by the problem specification or shown in an example.
  • No method should modify its input parameters.
  • Your main method in hw3.py must call every one of the methods you implemented in this assignment. There are no requirements on the format of the output, besides that it should save the files for Part 1 with the proper names specified in Part 1.
  • We can run your hw3.py without it crashing or causing any errors or warnings.
  • When we run your code, it should produce no errors or warnings.
  • All files submitted pass flake8
  • All program files should be written with good programming style. This means your code should satisfy the requirements within the CSE 163 Code Quality Guide .
  • Any expectations on this page or the sub-pages for the assignment are met as well as all requirements for each of the problems are met.
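
As an illustration of the requirement above that no method should modify its input parameters, here is a small sketch using pandas. The function names and the toy DataFrame are hypothetical.

```python
import pandas as pd


def drop_missing_bad(df):
    # Mutates the caller's DataFrame: rows vanish from the original.
    df.dropna(inplace=True)
    return df


def drop_missing_good(df):
    # dropna() without inplace returns a new DataFrame and leaves
    # the caller's data untouched.
    return df.dropna()


original = pd.DataFrame({'Total': [1.0, None, 3.0]})
cleaned = drop_missing_good(original)
print(len(original), len(cleaned))
```

Most pandas methods return a new object by default, so avoiding `inplace=True` and direct column assignment on the parameter is usually enough to satisfy this rule.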

Make sure you carefully read the bullets above as they may or may not change from assignment to assignment!

A note on allowed material

A lot of students have been asking questions like "Can I use this method or can I use this language feature in this class?". The general answer to this question is it depends on what you want to use, what the problem is asking you to do and if there are any restrictions that problem places on your solution.

There is no automatic deduction for using some advanced feature or using material that we have not covered in class yet, but if it violates the restrictions of the assignment, it is possible you will lose points. It's not possible for us to list out every possible thing you can't use on the assignment, but we can say for sure that you are safe to use anything we have covered in class so far as long as it meets what the specification asks and you are appropriately using it as we showed in class.

For example, some things that are probably okay to use even though we didn't cover them:

  • Using the update method on the set class even though I didn't show it in lecture. It was clear we talked about sets and that you are allowed to use them on future assignments and if you found a method on them that does what you need, it's probably fine as long as it isn't violating some explicit restriction on that assignment.
  • Using something like a ternary operator in Python. This doesn't make a problem any easier, it's just syntax.

For example, some things that are probably not okay to use:

  • Importing some random library that can solve the problem we ask you to solve in one line.
  • If the problem says "don't use a loop" to solve it, it would not be appropriate to use some advanced programming concept like recursion to "get around" that restriction.

These are not allowed because they might make the problem trivially easy or violate what the learning objective of the problem is.

You should think about what the spec is asking you to do and as long as you are meeting those requirements, we will award credit. If you are concerned that an advanced feature you want to use falls in that second category above and might cost you points, then you should just not use it! These problems are designed to be solvable with the material we have learned so far so it's entirely not necessary to go look up a bunch of advanced material to solve them.

tl;dr; We will not be answering every question of "Can I use X" or "Will I lose points if I use Y" because the general answer is "You are not forbidden from using anything as long as it meets the spec requirements. If you're unsure if it violates a spec restriction, don't use it and just stick to what we learned before the assignment was released."

This assignment is due by Thursday, July 23 at 23:59 (PDT) .

You should submit your finished hw3.py , and hw3-written.txt on Ed .

You may submit your assignment as many times as you want before the late cutoff (remember submitting after the due date will cost late days). Recall on Ed, you submit by pressing the "Mark" button. You are welcome to develop the assignment on Ed or develop locally and then upload to Ed before marking.


Data Visualization | 1st Edition


DATA VISUALIZATION: Exploring and Explaining with Data is designed to introduce best practices in data visualization to undergraduate and graduate students. This is one of the first books on data visualization designed for college courses. The book contains material on effective design, choice of chart type, effective use of color, how to explore data visually, and how to explain concepts and results visually in a compelling way with data. Indeed, the skills developed in this book will be helpful to all who want to influence with data or be accurately informed by data. The book is designed for a semester-long course at either undergraduate or graduate level. The examples used in this book are drawn from a variety of functional areas in the business world including accounting, finance, operations, and human resources as well as from sports, politics, science, and economics.

What Is Data Visualization: Brief Theory, Useful Tips and Awesome Examples


By Al Boicheva

in Insights , Inspiration

3 years ago


Updated: June 23, 2022

Creating data visualizations to present your data is no longer just a nice-to-have skill. The ability to effectively sort and communicate your data through charts is now a must-have for any business in any field that deals with data. Data visualization helps businesses quickly make sense of complex data and start making decisions based on that data. This is why today we’ll talk about what data visualization is. We’ll discuss how and why it works, what type of chart to choose in which cases, how to create effective charts, and, of course, end with beautiful examples.

So let’s jump right in. As usual, don’t hesitate to fast-travel to a particular section of your interest.

Article overview:

  1. What Does Data Visualization Mean?
  2. How Does it Work?
  3. When to Use it?
  4. Why Use it?
  5. Types of Data Visualization
  6. Data Visualization VS Infographics: 5 Main Differences
  7. How to Create Effective Data Visualization?: 5 Useful Tips
  8. Examples of Data Visualization

1. What is Data Visualization?

Data Visualization is a graphic representation of data that aims to communicate large amounts of dense data in an efficient way that is easier to grasp and understand. In a way, data visualization is the mapping between the original data and graphic elements that determines how the attributes of these elements vary. The visualization is usually made with charts, lines, points, bars, and maps.

  • Data Viz is a branch of Descriptive statistics, but it requires design, computing, and statistical skills.
  • Aesthetics and functionality go hand in hand to communicate complex statistics in an intuitive way.
  • Data Viz tools and technologies are essential for making data-driven decisions.
  • It’s a fine balance between form and functionality.
  • Every STEM field benefits from understanding data.

2. How Does it Work?

If we can see it, our brains can internalize and reflect on it. This is why it’s much easier and more effective to make sense of a chart and see trends than to read a massive document that would take a lot of time and focus to rationalize. We wouldn’t want to repeat the cliche that humans are visual creatures, but it’s a fact that visualization is much more effective and comprehensive.

In a way, we can say that data Viz is a form of storytelling whose purpose is to help us make decisions based on data. Typical uses include:

  • Tracking sales
  • Identifying trends
  • Identifying changes
  • Monitoring goals
  • Monitoring results
  • Combining data

3. When to Use it?

Data visualization is useful for companies that deal with lots of data on a daily basis. It’s essential to have your data and trends instantly visible, which is better than scrolling through colossal spreadsheets. When the trends stand out instantly, this also helps your clients or viewers understand them instead of getting lost in the clutter of numbers.

With that being said, Data Viz is suitable for:

  • Annual reports
  • Presentations
  • Social media micronarratives
  • Informational brochures
  • Trend tracking
  • Candlestick chart for financial analysis
  • Determining routes

Common cases when data visualization sees use are in sales, marketing, healthcare, science, finances, politics, and logistics.

4. Why Use it?

Short answer: decision making. Data Visualization comes with the undeniable benefits of quickly recognizing patterns and interpreting data. More specifically, it is an invaluable tool for the following tasks.

  • Identifying correlations between variables.
  • Getting market insights about audience behavior.
  • Determining value vs risk metrics.
  • Monitoring trends over time.
  • Examining rates and potential through frequency.
  • Ability to react to changes.

5. Types of Data Visualization

As you probably already guessed, Data Viz is much more than simple pie charts and graphs styled in a visually appealing way. The methods that this branch uses to visualize statistics include a series of effective types.

Map visualization is a great method to analyze and display geographically related information and present it accurately via maps. This intuitive way aims to distribute data by region. Since maps can be 2D or 3D, static or dynamic, there are numerous combinations one can use in order to create a Data Viz map.

COVID-19 Spending Data Visualization POGO by George Railean

The most common ones, however, are:

  • Regional Maps: Classic maps that display countries, cities, or districts. They often represent data in different colors for different characteristics in each region.
  • Line Maps: They usually contain space and time and are ideal for routing, especially for driving or taxi routes in the area due to their analysis of specific scenes.
  • Point Maps: These maps distribute data of geographic information. They are ideal for businesses to pinpoint the exact locations of their buildings in a region.
  • Heat Maps: They indicate the weight of a geographical area based on a specific property. For example, a heat map may distribute the saturation of infected people by area.

Charts present data in the form of graphs, diagrams, and tables. They are often confused with graphs, since graphs are indeed a subcategory of charts. However, there is a small difference: graphs show the mathematical relationship between groups of data and are only one of the chart methods used to represent data.

Gluten in America - chart data visualization

Infographic Data Visualization by Madeline VanRemmen

With that out of the way, let’s talk about the most basic types of charts in data visualization.

Finance Statistics - Bar Graph visualization

Bar charts use a series of bars that illustrate data development. They are ideal for lighter data and for following trends across no more than three variables; beyond that, the bars become cluttered and hard to comprehend. They work well for year-on-year comparisons and monthly breakdowns.

Pie chart visualization type

These familiar circular graphs divide data into portions. The bigger the slice, the bigger the portion. They are ideal for depicting sections of a whole and their sum must always be 100%. Avoid pie charts when you need to show data development over time or lack a value for any of the portions. Doughnut charts have the same use as pie charts.

Line graph - common visualization type

Line graphs use one or more lines to show development over time, which allows tracking multiple variables at the same time. A great example is tracking product sales by a brand over the years. Area charts have the same use as line charts.

Scatter Plot

Scatter Plot - data visualization idea

These charts allow you to see patterns through data visualization. They have an x-axis and a y-axis for two different values. For example, if your x-axis contains information about car prices while the y-axis is about salaries, the positive or negative relationship between the two will show what a person’s car suggests about their salary.
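
The direction of the relationship in a scatter plot can also be checked numerically with a correlation coefficient. The sketch below computes Pearson's r by hand; the car price and salary figures are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Invented values: car prices (x-axis) and salaries (y-axis), in thousands.
car_prices = [12, 18, 25, 31, 40, 55]
salaries = [35, 42, 58, 61, 80, 110]

r = pearson_r(car_prices, salaries)
print('positive' if r > 0 else 'negative', 'relationship')
```

A value of r near 1 corresponds to points rising from left to right, a value near -1 to points falling, and a value near 0 to no clear linear pattern.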

Unlike the charts we just discussed, tables show data in almost a raw format. They are ideal when your data is hard to present visually and aim to show specific numerical data that one is supposed to read rather than visualize.

Creative data table visualization

Data Visualisation | To bee or not to bee by Aishwarya Anand Singh

For example, charts are perfect to display data about a particular illness over a time period in a particular area, but a table comes to better use when you also need to understand specifics such as causes, outcomes, relapses, a period of treatment, and so on.

6. Data Visualization VS Infographics

5 main differences.

They are not that different, as both visually represent data. Often you search for infographics and find images titled Data Visualization, and the other way around. In many cases, however, these titles aren’t misleading. Why is that?

  • Data visualization is made of just one element. It could be a map, a chart, or a table. Infographics , on the other hand, often include multiple Data Viz elements.
  • Unlike data visualizations that can be simple or extremely complex and heavy, infographics are simple and target wider audiences. The latter is usually comprehensible even to people outside of the field of research the infographic represents.
  • Interestingly enough, data Viz doesn’t offer narratives and conclusions; it’s a tool and basis for reaching those. Infographics, in most cases, offer a story and a narrative. For example, a data visualization map may have the title “Air pollution saturation by region”, while an infographic with the same data would go “Areas A and B are the most polluted in Country C”.
  • Data visualizations can be made in Excel or use other tools that automatically generate the design unless they are set for presentation or publishing. The aesthetics of infographics , however, are of great importance and the designs must be appealing to wider audiences.
  • In terms of interaction, data visualizations often offer interactive charts, especially in an online form. Infographics, on the other hand, rarely have interaction and are usually static images.

While on topic, you could also be interested to check out these 50 engaging infographic examples that make complex data look great.

7. Tips to Create Effective Data Visualization

The process is naturally similar to creating Infographics and it revolves around understanding your data and audience. To be more precise, these are the main steps and best practices when it comes to preparing an effective visualization of data for your viewers to instantly understand.

1. Do Your Homework

Preparation is half the work already done. Before you even start visualizing data, you have to be sure you understand that data to the last detail.

Knowing your audience is undeniably another important part of the homework, as different audiences process information differently. Who are the people you’re visualizing data for? How do they process visual data? Is it enough to hand them a single pie chart, or will you need a more in-depth visual report?

The third part of preparing is to determine exactly what you want to communicate to the audience. What kind of information are you visualizing, and does it reflect your goal?

And last, think about how much data you’ll be working with and take it into account.

2. Choose the Right Type of Chart

In a previous section, we listed the basic chart types that find use in data visualization. To determine best which one suits your work, there are a few things to consider.

  • How many variables will you have in a chart?
  • How many items will you place for each of your variables?
  • What will be the relation between the values (time period, comparison, distributions, etc.)?

With that being said, a pie chart would be ideal if you need to present what portion of a whole each item takes. For example, you can use it to showcase what percent of the market share a particular product takes. Pie charts, however, are unsuitable for distributions, comparisons, and following trends through time periods. Bar graphs, scatter plots, and line graphs are much more effective in those cases.

Another example is how to use time in your charts. It’s way more accurate to use a horizontal axis because time should run left to right. It’s way more visually intuitive.

3. Sort your Data

Start with removing every piece of data that does not add value and is basically excess for the chart. Sometimes, you have to work with a huge amount of data which will inevitably make your chart pretty complex and hard to read. Don’t hesitate to split your information into two or more charts. If that won’t work for you, you could use highlights or change the entire type of chart with something that would fit better.

Tip: When you use bar charts and columns for comparison, sort the information in an ascending or a descending way by value instead of alphabetical order.
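
As a small illustration of this tip, the sketch below orders categories by value before drawing them as a crude text bar chart. The category names and totals are invented for the example.

```python
# Hypothetical category totals, invented for this example.
sales = {'Denmark': 42, 'Austria': 17, 'Chile': 63, 'Belgium': 29}

# Alphabetical order hides the ranking.
alphabetical = sorted(sales.items())

# Descending by value puts the largest bar first.
by_value = sorted(sales.items(), key=lambda item: item[1], reverse=True)

for label, value in by_value:
    print(f'{label:<8} {"#" * (value // 5)} {value}')
```

The same sorted order can be passed to whichever plotting tool you use, so the bars appear ranked rather than alphabetical.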

4. Use Colors to Your Advantage

In every form of visualization, colors are your best friend and the most powerful tool. They create contrasts, accents, and emphasis and lead the eye intuitively. Even here, color theory is important.

When you design your chart, make sure you don’t use more than 5 or 6 colors. Anything more than that will make your graph overwhelming and hard to read for your viewers. However, color intensity is a different thing that you can use to your advantage. For example, when you compare the same concept in different periods of time, you could sort your data from the lightest shade of your chosen color to its darkest one. This creates a strong visual progression that matches your timeline.
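
A light-to-dark progression like this can be generated rather than picked by hand. The sketch below is one possible approach under simple assumptions: the helper name `shades`, the base blue, and the 0.25 minimum tint are all illustrative choices, and the mixing is a plain linear blend with white.

```python
def shades(base_rgb, steps):
    """Return `steps` hex colors, from a light tint up to `base_rgb`."""
    colors = []
    for i in range(steps):
        # Fraction of the base color to use: low for early periods,
        # 1.0 for the latest one. The 0.25 floor keeps tints visible.
        weight = 0.25 + 0.75 * i / max(steps - 1, 1)
        mixed = [round(255 + (channel - 255) * weight)
                 for channel in base_rgb]
        colors.append('#{:02x}{:02x}{:02x}'.format(*mixed))
    return colors


# A hypothetical blue, shaded across four time periods.
palette = shades((31, 119, 180), 4)
print(palette)
```

Assigning the first color to the earliest period and the last to the most recent keeps the progression aligned with the timeline.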

Things to consider when you choose colors:

  • Different colors for different categories.
  • A consistent color palette for all charts in a series that you will later compare.
  • It’s appropriate to use color blind-friendly palettes.

5. Get Inspired

Always put your inspiration to work when you want to be at the top of your game. Look through examples, infographics, and other people’s work and see what works best for each type of data you need to implement.

This Twitter account Data Visualization Society is a great way to start. In the meantime, we’ll also handpick some amazing examples that will get you in the mood to start creating the visuals for your data.

8. Examples for Data Visualization

As another art form, Data Viz is a fertile ground for some amazing well-designed graphs that prove that data is beautiful. Now let’s check out some.

Dark Souls III Experience Data

We start with Meng Hsiao Wei’s personal project presenting his experience with playing Dark Souls 3. It’s a perfect example that infographics and data visualization are tools for personal designs as well. The research is pretty massive yet very professionally sorted into different types of charts for the different concepts. All data visualizations are made with the same color palette and look great in infographics.

Data of My Dark Souls 3 example

My dark souls 3 playing data by Meng Hsiao Wei

Greatest Movies of all Time

Katie Silver has compiled a list of the 100 greatest movies of all time based on critics and crowd reviews. The visualization shows key data points for every movie such as year of release, Oscar nominations and wins, budget, gross, IMDb score, genre, filming location, setting of the film, and production studio. All movies are ordered by the release date.

Greatest Movies visualization chart

100 Greatest Movies Data Visualization by Katie Silver

The Most Violent Cities

Federica Fragapane shows data for the 50 most violent cities in the world in 2017. The items are arranged on a vertical axis based on population and ordered along the horizontal axis according to the homicide rate.

The Most Violent Cities example

The Most Violent Cities by Federica Fragapane

Family Businesses as Data

These data visualizations and illustrations were made by Valerio Pellegrini for Perspectives Magazine. They show a pie chart with sector breakdown as well as a scatter plot for contribution to employment.

Family Businesses as Data Visual

PERSPECTIVES MAGAZINE – Family Businesses by Valerio Pellegrini

Orbit Map of the Solar System

The map shows data on the orbits of more than 18000 asteroids in the solar system. Each asteroid is shown at its position on New Year’s Eve 1999, colored by type of asteroid.

Orbit Map of the Solar System graphic

An Orbit Map of the Solar System by Eleanor Lutz

The Semantics Of Headlines

Katja Flükiger has a take on how headlines tell the story. The data visualization aims to communicate how much the selling influences the telling. The project was completed at Maryland Institute College of Art to visualize references to immigration, color-coding the value judgments implied by word choice and context.

The Semantics Of Headlines graph

The Semantics of Headlines by Katja Flükiger

Moon and Earthquakes

This data visualization works on answering whether the moon is responsible for earthquakes. The chart features the time and intensity of earthquakes in response to the phase and orbit location of the moon.

Moon and Earthquakes statistics visual

Moon and Earthquakes by Aishwarya Anand Singh

Dawn of the Nanosats

The visualization shows the satellites launched from 2003 to 2015. The graph represents the type of institutions focused on projects as well as the nations that financed them. The left side shows the number of launches per year and satellite applications.

Dawn of the Nanosats visualization

WIRED UK – Dawn of the Nanosats by Valerio Pellegrini

Final Words

Data visualization is not only a form of science but also a form of art. Its purpose is to help businesses in any field quickly make sense of complex data and start making decisions based on that data. To make your graphs efficient and easy to read, it’s all about knowing your data and audience. This way you’ll be able to choose the right type of chart and use visual techniques to your advantage.


Al Boicheva

Al is an illustrator at GraphicMama with out-of-the-box thinking and a passion for anything creative. In her free time, you will see her drooling over tattoo art, Manga, and horror movies.


Digital Writing and Research Lab

Teaching Data Visualization: An Introduction

A word cloud made with the 200 most common words in this post

Team Data Visualization is proud to present a set of lesson plans that are ready to use in your classroom (networked or not). Whether you’ve been thinking about introducing a data visualization lesson of some kind, or have no idea what that would even look like or how it would fit in a writing classroom (or any classroom for that matter), we’ve got you covered.

A mystery scatter plot demonstrating how data doesn't make sense without context; a large number of colored dots arranged in vertical lines, with no axes, key or labels

Teaching in a DWRL classroom? Need a hand or want to schedule a data visualization workshop for your students? Visit our mentoring office in PAR 8B or email a staff member to learn more about our consulting and support services.

Featured image: A word cloud made with the 200 most common words in this post.

  • Database Rhetorics: https://www.dwrl.utexas.edu/2016/10/31/database-rhetorics/
  • Navigating Research: https://www.dwrl.utexas.edu/2016/10/31/lesson-plan-navigating-research-with-mindmup/
  • Visual Literacy: https://www.dwrl.utexas.edu/2016/10/31/infographic-recomposition/

dwrl



HW 1 - Data visualization

This homework is due Thursday, Sep 15 at 11:59pm.

Getting Started

Go to the sta199-f22-2 organization on GitHub. Click on the repo with the prefix hw-01 . It contains the starter documents you need to complete the homework assignment.

Clone the repo and start a new project in RStudio. See the Lab 0 instructions for details on cloning a repo and starting a new R project.

Guidelines + tips

As we’ve discussed in lecture, your plots should include an informative title, axes should be labeled, and careful consideration should be given to aesthetic choices.

Remember that continuing to develop a sound workflow for reproducible data analysis is important as you complete this homework and other assignments in this course. This assignment includes periodic reminders to knit, commit, and push your changes to GitHub. You should have at least 3 commits with meaningful commit messages by the end of the assignment.

Note: Do not let R output answer the question for you unless the question specifically asks for just a plot. For example, if the question asks for the number of columns in the data set, please type out the number of columns. You may lose points if you do not.

Workflow + formatting

Make sure to

  • Update author name on your document.
  • Label all code chunks informatively and concisely.
  • Follow the Tidyverse code style guidelines.
  • Make at least 3 commits.
  • Resize figures where needed; avoid tiny or huge plots.
  • Turn in an organized, well-formatted document.

Data 1: Duke Forest houses

Use this dataset for Exercises 1 and 2.

For the following two exercises you will work with data on houses that were sold in the Duke Forest neighborhood of Durham, NC in November 2020. The duke_forest dataset comes from the openintro package. You can see a list of the variables on the package website or by running ?duke_forest in your console.

Suppose you’re helping some family friends who are looking to buy a house in Duke Forest. As they browse Zillow listings, they realize some houses have garages and others don’t, and they wonder: Does having a garage make a difference?

Luckily, you can help them answer this question with data visualization!

  • In order to do this, you will first need to create a new variable called garage (with levels "Garage" and "No garage" ).
  • Below is the code for creating this new variable. Here, we mutate() the duke_forest data frame to add a new variable called garage which takes the value "Garage" if the text string "Garage" is detected in the parking variable and takes the text string "No garage" if not.
  • Then, facet by garage and use different colors for the two facets.
  • Choose an appropriate binwidth and decide whether a legend is needed, and turn it off if not.
  • Include informative title and axis labels.
  • Finally, include a brief (2-3 sentence) narrative comparing the distributions of prices of Duke Forest houses that do and don’t have garages. Your narrative should touch on whether having a garage “makes a difference” in terms of the price of the house.
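
The assignment itself builds garage with mutate() in R. Purely to illustrate the labeling rule described above, here is a minimal plain-Python sketch. The sample parking strings and the helper name garage_label are hypothetical and not taken from the duke_forest data.

```python
# Hypothetical sample of values from a `parking` column; the real
# duke_forest data may contain different strings.
parking_values = [
    "Garage - Attached, Covered",
    "Carport",
    "0 spaces",
    "Garage - Detached",
]


def garage_label(parking: str) -> str:
    """Return "Garage" if the text "Garage" occurs in `parking`, else "No garage"."""
    # Case-sensitive substring check, mirroring the rule stated in the exercise.
    return "Garage" if "Garage" in parking else "No garage"


garage = [garage_label(value) for value in parking_values]
print(garage)
```

The new variable has exactly two levels, which is what makes it usable for faceting and for a two-color mapping.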

Now is a good time to render, commit, and push. Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.

It’s expected that within any given market larger houses will be priced higher. It’s also expected that the age of the house will have an effect on the price. However, in some markets new houses might be more expensive, while in others new construction might mean “no character” and hence be less expensive. So your family friends ask: “In Duke Forest, do houses that are bigger and more expensive tend to be newer ones than those that are smaller and cheaper?”

Once again, data visualization skills to the rescue!

  • Create a scatter plot to explore the relationship between price and area , conditioning on year_built .
  • Use geom_smooth() with the argument se = FALSE to add a smooth curve fit to the data and color the points by year_built .
  • Include informative title, axis, and legend labels.
  • Claim 1: Larger houses are priced higher.
  • Claim 2: Newer houses are priced higher.
  • Claim 3: Bigger and more expensive houses tend to be newer ones than smaller and cheaper ones.

Now is a good time to render, commit, and push.

Make sure that you commit and push all changed documents and your Git pane is completely empty before proceeding.

Data 2: BRFSS

Use this dataset for Exercises 3 to 5.

The Behavioral Risk Factor Surveillance System (BRFSS) is the nation’s premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world. Source: cdc.gov/brfss

In the following exercises we will work with data from the 2020 BRFSS survey. The data originally come from here , though we will work with a random sample of responses and a small number of variables from the data provided. These have already been sampled for you and the dataset you’ll use can be found in the data folder of your repo. It’s called brfss.csv .

  • How many rows are in the brfss dataset? What does each row represent?
  • How many columns are in the brfss dataset? Indicate the type of each variable.
  • Include the code and resulting output used to support your answer.

Do people who smoke more tend to have worse health conditions?

  • Use a segmented bar chart to visualize the relationship between smoking ( smoke_freq ) and general health ( general_health ). Decide on which variable to represent with bars and which variable to fill the color of the bars by.
  • Below is sample code for releveling general_health . Here we first convert general_health to a factor (how R stores categorical data) and then order the levels from Excellent to Poor.
  • Comment on the motivating question based on evidence from the visualization: Do people who smoke more tend to have worse health conditions?
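
A segmented bar chart shows, for each group, the share of responses at each level, so the level order matters. The exercise does this in R with factors. The plain-Python sketch below only illustrates the idea. The three middle health levels, the smoke_freq values and the sample responses are all assumptions, since the source only states the order "from Excellent to Poor".

```python
from collections import Counter

# Assumed level order; the source only says "from Excellent to Poor".
HEALTH_LEVELS = ["Excellent", "Very good", "Good", "Fair", "Poor"]

# Hypothetical (smoke_freq, general_health) responses, not real BRFSS data.
responses = [
    ("Every day", "Fair"),
    ("Every day", "Good"),
    ("Every day", "Poor"),
    ("Every day", "Fair"),
    ("Not at all", "Excellent"),
    ("Not at all", "Good"),
    ("Not at all", "Very good"),
    ("Not at all", "Good"),
]


def health_shares(rows, group):
    """Share of each health level within one smoking group, in level order."""
    counts = Counter(health for smoke, health in rows if smoke == group)
    total = sum(counts.values())
    return [(level, counts[level] / total) for level in HEALTH_LEVELS]


for group in ("Every day", "Not at all"):
    print(group, health_shares(responses, group))
```

Each list of shares corresponds to one bar, with segments stacked in the fixed level order so that bars can be compared directly.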

How are sleep and general health associated?

  • Create a visualization displaying the relationship between sleep and general_health .
  • Modify your plot to use a different theme than the default.
  • Comment on the motivating question based on evidence from the visualization: How are sleep and general health associated?
  • The gg in the name of the package ggplot2 stands for ___.
  • If you map the same continuous variable to both x and y aesthetics in a scatterplot, you get a straight ___ line. (Choose between “vertical”, “horizontal”, or “diagonal”.)
  • Code style: Fix up the code style by adding spaces and line breaks where needed. Briefly describe your fixes. (Hint: You can refer to the Tidyverse style guide .)
  • Read ?facet_wrap . What does nrow do? What does ncol do? What other options control the layout of the individual panels? Why doesn’t facet_grid() have nrow and ncol arguments?

Render, commit, and push one last time.

  • Go to http://www.gradescope.com and click Log in in the top right corner.
  • Click School Credentials Duke Net ID and log in using your Net ID credentials.
  • Click on your STA 199 course.
  • Click on the assignment, and you’ll be prompted to submit it.
  • Mark all the pages associated with each exercise. All the pages of your homework should be associated with at least one question (i.e., should be “checked”). If you do not do this, you may lose points on the assignment.
  • Select the first page of your PDF submission to be associated with the “Workflow & formatting” question.
  • Exercise 1: 7 points
  • Exercise 2: 9 points
  • Exercise 3: 5 points
  • Exercise 4: 9 points
  • Exercise 5: 7 points
  • Exercise 6: 8 points
  • Workflow + formatting: 5 points
  • Total: 50 points


This homework reinforces the material covered in this section on Tableau Desktop. To complete it, download the country-level COVID-19 dataset for the whole world from the Our World in Data repository: https://covid.ourworldindata.org/data/owid-covid-data.xlsx

Homework's dataset

COVID-19 at the country level of the whole world: https://covid.ourworldindata.org/data/owid-covid-data.xlsx from Our World in Data repository

Load the dataset from the Our World in Data repository into Tableau Desktop. Then follow this procedure and save the results of each section:

  1. Create groups based on the location and continent fields.

  2. Visualize the weekly number of total cases and new cases .

  3. Use multiple measures in a single view: take the view created in section 2 and add total deaths and new deaths to it.

  4. Use calculated fields to define a fatality rate field (fatality rate: number of deaths / number of cases) and visualize it grouped by continent .

  5. Create a simple dashboard containing the visualizations of total cases , new cases , and fatality rate . Make sure the labels of the visualizations are informative, and include a hyperlink to the source of the dataset in the dashboard.
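
The homework itself is done in Tableau. To make the fatality rate calculation concrete, here is a small Python sketch of the same arithmetic, deaths divided by cases and grouped by continent. The rows are made-up numbers, and the sketch assumes one row per country at a single date. The function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (continent, total_cases, total_deaths). Not real figures.
rows = [
    ("Europe", 1000, 20),
    ("Europe", 3000, 20),
    ("Asia", 2000, 10),
    ("Asia", 0, 0),
]


def fatality_rate_by_continent(records):
    """Sum cases and deaths per continent, then divide deaths by cases."""
    cases = defaultdict(int)
    deaths = defaultdict(int)
    for continent, n_cases, n_deaths in records:
        cases[continent] += n_cases
        deaths[continent] += n_deaths
    # Skip continents with zero cases to avoid dividing by zero.
    return {c: deaths[c] / cases[c] for c in cases if cases[c] > 0}


rates = fatality_rate_by_continent(rows)
print(rates)
```

Summing before dividing gives the rate for the continent as a whole. Averaging per-country rates would weight small and large countries equally and generally gives a different number.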


Lecture 1: Data Visualization

Today’s visualization.


Lighthouses

Full animated world map at https://geodienst.github.io/lighthousemap/

  • Color true to real lighthouse
  • Timing of blinks true to real lighthouse
  • Size of dot corresponds to visibility range
  • Data drawn from OpenStreetMap – completion and correction through crowdsourcing

Related, and also excellent: https://twitter.com/i/status/1462095711508516865

Syllabus and Structure of the Course

After this course, you will…

  • Be able to describe the key design guidelines and techniques used for the visual display of information.
  • Understand how to best use the capabilities of visual perception in a graphic display.
  • Understand the principles of interactive visualizations.
  • Understand how Machine Learning techniques can determine data structure and pattern.
  • Explore and critically evaluate a wide range of visualization techniques and applications.

We will primarily work with:

  • Tamara Munzner: Visualization Analysis & Design (ISBN 978-1466-50891-0)
  • Leland Wilkinson: The Grammar of Graphics, 2nd ed (ISBN 978-0387-24544-7). Wilkinson's book is available for free through the library’s ebook access to SpringerLink, which also lets you order a cheap print-on-demand copy.

We meet in-person (or over Zoom if we are mandated back online) Wednesdays 14.00 - 16.00.

In addition, all course information can be found on Blackboard.

I can be reached on [email protected], and will happily schedule meetings if you need them.

Between meetings, you will read assigned chapters from the textbook, and work on smaller assignments and semester projects.

Semester-long assignments

  • Data Visualization Project – you take some dataset and develop a data visualization of that dataset with communicative intent.
  • Blog Post – you take a recent paper from the IEEE Vis conference, and summarize it for a popular audience.

Weekly assignments

  • Read assigned textbook chapters
  • Find a recent published data visualization that embodies some concept in the assigned readings. Prepare to present that graphic to the class, and to demonstrate to the class what it does particularly well, and what could be improved.

Occasional assignments

  • Recreate a published visualization using some data visualization tool.

Semester Schedule (may be changed as we go)

What this course will not do

  • …teach one specific data visualization platform. However, you should take this time to learn at least one platform well. I’m happy to help you in the process, but you pick a platform and work through tutorials to get going yourself.
  • …get you ready to submit a paper to a major data vis conference yourself. However, by the time you finish the course, you should know a direction to go if you have this ambition.
  • …deliver content to a passive student audience. Your participation is essential. Prepare for each lecture, collect questions and thoughts, and discuss eagerly in class.

What is data visualization anyway?!?

Defining data visualization.

The visual representation and presentation of data to facilitate understanding.

Andy Kirk (Data Visualization)

Visual representation of datasets designed to help people carry out tasks more effectively.

Tamara Munzner

  • Task-oriented / understanding
  • Data-oriented
  • Human visual system as co-processor

Why humans?

We don’t need data vis when tasks can be fully automated.

We might not know what questions we have in advance.

Why external representation?

  • Replaces cognition with perception. We don’t need to know how the brain does pattern recognition in order to use it.

Why visual?

  • Vision is a high-bandwidth interface with our brain. Overview is possible due to background processing – we experience seeing everything simultaneously, and the visual system processes in parallel and pre-attentively.
  • Sound: lower bandwidth, different semantics, no overview, subjective experience of sequentiality.
  • Touch / haptics: low bandwidth, low record/replay capacity
  • Taste, smell: no viable record/replay devices

Why all the data?

Summaries inherently lose information

  • 4 datasets, identical statistics

Each value exact up to at least 2 decimal places.

12 datasets, identical statistics
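
The slide does not name the four datasets. Assuming they are Anscombe's quartet, the usual example for this point, the following sketch computes a few summary statistics for each one. The statistics agree to about two decimal places even though the four scatter plots look completely different.

```python
from statistics import mean, variance

# Anscombe's quartet (assumed to be the "4 datasets" on the slide).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


for name, (xs, ys) in QUARTET.items():
    print(name, mean(xs), round(mean(ys), 2), variance(xs), round(pearson(xs, ys), 2))
```

Plotting the four (x, y) sets side by side is the natural follow-up: one is roughly linear, one is curved, and two are dominated by a single outlier.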


Design: balancing constraints

All design work happens in a context, and that context imposes limitations.

  • computation time, system memory
  • information density
  • “ink ratio”
  • human time, memory, attention, capacities of vision, understanding

Analytic Scaffolds

Three different sets of questions and considerations to guide your design work.

3 design questions:

3 phases of understanding:

  • interpreting
  • comprehending

4 stage design process:

  • formulating your brief
  • working with data
  • establishing your editorial thinking
  • developing the design solution

3 design principles:

  • trustworthy

Edward Tufte

  • 6 design principles
  • coined terminology: ink ratios, chart junk

Tufte: Fundamental Principles of Analytical Design

  • Show comparisons, contrasts, differences.
  • Show causality, mechanism, explanation, systematic structure.
  • Show multivariate data; that is, show more than 1 or 2 variables.
  • Completely integrate words, numbers, images, diagrams.
  • Thoroughly describe the evidence. Provide a detailed title, indicate the authors and sponsors, document the data sources, show complete measurement scales, point out relevant issues.
  • Analytical presentations ultimately stand or fall depending on the quality, relevance, and integrity of their content.

Edward Tufte, Beautiful Evidence, pp 122 - 139

Tufte: Data ink and chart junk

Early Tufte design guidance:

Measure and maximize the data-ink ratio . Data-ink is the non-erasable core of a graphic, and the ratio to maximize is (data-ink / total ink)

Eschew and eliminate chart junk – graphical decorations, textures, patterns, all of which just increase total ink without increasing data ink.
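
As a worked illustration with made-up numbers (not taken from Tufte): suppose a chart uses 100 units of ink, of which 30 encode data. Erasing 50 units of decoration leaves the data-ink unchanged and doubles the ratio.

```latex
\[
\text{data-ink ratio} = \frac{\text{data-ink}}{\text{total ink}},
\qquad
\frac{30}{100} = 0.3
\;\longrightarrow\;
\frac{30}{100 - 50} = 0.6
\]
```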

Kirk: Phases of Understanding

Visualizer control

Viewer control

What do I see?

  • What data is shown?
  • How is it represented?
  • What features are observable?

Interpreting

What does it mean, given the subject?

What features are…

  • interesting?
  • unexpected?

Comprehending

What does it mean, to me?

  • What have I learnt?
  • What do I feel?
  • What do I do now?

Kirk: Design Process

Four fundamental steps.

  • Formulating your brief planning, defining and initiating your project
  • Working with data gathering, handling and preparing your data
  • Establishing your editorial thinking defining what you will show your audience
  • Developing your design solution making design choices about how you represent and present what it is you want to show your audience

Kirk: Design principles

Based on Dieter Rams’ 10 principles of good design:

  • Good design is innovative
  • Good design makes a product useful
  • Good design is aesthetic
  • Good design makes a product understandable
  • Good design is unobtrusive
  • Good design is honest
  • Good design is long-lasting
  • Good design is thorough down to the last detail
  • Good design is environmentally friendly
  • Good design is as little design as possible

Principle 1. Good visualization design is trustworthy .

Principle 2. Good visualization design is accessible .

Principle 3. Good visualization design is elegant .

Munzner: Design questions and levels

Each level (domain/abstraction/idiom/algorithm) is contained in the previous one.

  • who are the target users?
  • translate from specifics of domain to vocabulary of visualization
  • don’t just draw what you’re given: transform to new form
  • why is the user looking at it? task abstraction
  • visual encoding idiom : how to draw
  • interaction idiom : how to manipulate
  • efficient computation

Different levels have different failure modes

  • Algorithm Your code is too slow

Solution: use methods from different fields at each level.

  • Algorithm Measure system time/memory. Analyze computational complexity. (computer science)
  • Analyze results qualitatively. Measure human time with lab experiment (lab study). (cognitive psychology)
  • Observe target users after deployment (field study). (anthropology/ethnography)
  • Measure adoption.

Design questions impose a structure on an otherwise vast design space.

Dataset Types

Tables (tidy data)

  • Attributes (columns)
  • Items (rows)
  • Items (nodes)
  • Value cells distributed on a shape

Data cubes / tensors

  • Value cells distributed in a hypercube
  • Relationships

Geometry (spatial)

Attribute Type

  • Categorical
  • Quantitative (interval)

Ordering Direction

Data Availability

  • Target known/unknown
  • Location known/unknown
  • Distribution
  • Correlation

Network Data

Spatial Data

Map from categorical and ordered attributes

  • Size , Angle , Curvature

Superimpose

No One Good Answer

Data visualization is an aesthetic field.

You will see disagreements on what is and is not a good design.

And on what is and is not a good design principle.

Data Visualization is a communication field

Many applications of data visualization communicate a message, either intentionally or unintentionally.

Notice how Kirk emphasizes communication, Munzner acknowledges it, and Tufte all but ignores that aspect.

Is this a good graphic?

Tufte: Look at all that chart junk! So many decorations that do not directly encode data!

Kirk: cites Jen Christiansen, Graphics Editor at Scientific American. “I found that when I developed magazine graphics according to [Tufte’s] philosophy, they were most often met with a yawn. The reality is that Scientific American isn’t required reading. We need to engage readers, as well as inform them.”

Decorations provide context for the information – it is immediately apparent what the data is about (something something razors) without impacting the trustworthiness of the data display itself.


Very popular target as an example of a bad graph. The inverted y-axis is very often invoked as a condemning feature.


Kirk points out that it was designed to emulate another chart published earlier: “Iraq’s bloody toll”.

The red coloring and the inverted y-axis in combination are attempting to evoke a metaphor of blood dribbling down a wall.


Question: Was the intended metaphor successful? In “Gun deaths in Florida”? In “Iraq’s bloody toll”? What could have been done differently to convey the message more efficiently?

Data Visualization Toolkits

Pick a platform - and stick with it.

In this course, you will pick one platform and do all your exercises in this platform. Good options include:

  • matplotlib (and seaborn ) Python, not Grammar of Graphics
  • ggplot2 R, Grammar of Graphics
  • plotnine Python, Grammar of Graphics
  • altair Python, Grammar of Graphics
  • d3.js / ObservableJS JavaScript, not Grammar of Graphics

It’s better to build 80% proficiency in one tool than 20% each in 3 different tools. Your next job may well use something different - and for each tool you learn, the next one is easier to learn.

Out of the box - demo visualizations

We draw on the NYC OpenData portal and collect data on traffic on the NYC Ferry network.

The data we want is available at https://data.cityofnewyork.us/Transportation/NYC-Ferry-Ridership/t5n6-gx8c and we can compose a query (to offload some computation onto the NYC OpenData servers) to extract the daily rider count:

https://data.cityofnewyork.us/resource/t5n6-gx8c.csv?$select=date,route,SUM(boardings)&$group=date,route&$limit=1000000
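
One way to assemble this query in code is sketched below using only the Python standard library. It assumes the query parameters are the `$select`, `$group` and `$limit` parameters, each prefixed with a dollar sign. The helper names build_query_url and fetch_rows are hypothetical, and fetch_rows needs network access, so it is defined here but not called.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://data.cityofnewyork.us/resource/t5n6-gx8c.csv"


def build_query_url(limit=1_000_000):
    """Compose the aggregation query: total boardings per date and route."""
    params = {
        "$select": "date,route,SUM(boardings)",
        "$group": "date,route",
        "$limit": limit,
    }
    # Keep "$", "," and parentheses unescaped so the URL stays readable.
    return BASE_URL + "?" + urlencode(params, safe="$,()")


def fetch_rows(url):
    """Download the CSV and return its rows as dicts (needs network access)."""
    with urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


query_url = build_query_url()
print(query_url)
```

Grouping on the server keeps the download small: the response holds one row per date and route instead of one row per recorded boarding count.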

We want a line graph of the daily ridership, by ferry route.

Python / matplotlib + seaborn


R / ggplot2


Python / plotnine


Python / Altair

JavaScript / ObservableJS + d3.js

Your first aesthetic critique

What differences and similarities do you see between the different “Out of the box” plots here?

What would you like to change?

What would you like to check / verify?

Would you like more (or less) binning and aggregation?

What, if any, interactive features would you like?

What, if any, labels, titles, annotations would you like to use?

What would an interesting use case for this plot be?

If you were to pull properties and features freely from all platforms (or add yourself) – how could you specify the most appropriate plot for this use case?

Homework: Reconstruct this plot in your chosen platform, and improve the things you discussed in your critique.


Data Visualization and Data Technologies

Luke Tierney, Spring 2023

  • Learn how to acquire, prepare, explore, and visualize data
  • in ways that are reproducible, reusable, and shareable
  • R and the RStudio IDE
  • Git and GitLab , using the UI GitLab service
  • RStudio Notebook for the class
  • FastX web client or desktop client for accessing the CLAS Linux computers.

Selected topics

  • Basic visualizations for different data types.
  • Defining visualizations using the Grammar of Graphics and ggplot2
  • Understanding human perception and its impact on visualization.
  • Exploring and cleaning data.
  • Getting data off the web.
  • Generating reports using knitr and R Markdown .
  • Version control with Git .
  • Automating an analytical pipeline, e.g. via Make

The course group on the UIowa GitLab server: https://research-git.uiowa.edu/STAT4580-Spring-2023

Getting Help

  • A Q&A discussion is available through ICON.
  • Your TA and I will try to answer short email questions.
  • We are also available in office hours.

Announcements and Hints

Data Files for your Project

For your project you will probably need to access one or more data files. You can make these available in several different ways. The basic options are:

Include the data file in your Project folder in your repository, commit it, and push to the UI GitLab service .

Have the file available on a web site and have your code download it or read it from a URL.

If you want to use the web option:

You can set up a personal web page, and place the files there. One option is to use the personal web pages provided on the CLAS Linux systems .

You can email your data files to me; I will make them available on the class web site and send you a URL for accessing them.

You can also include code for downloading the data from public web sites.
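
The course works in R, but the "download it from a URL" option follows the same pattern in any language. Below is a minimal Python sketch, with a hypothetical helper name and a hypothetical example URL: it fetches the file once into the project folder and reuses the local copy on later runs.

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch_data_file(url, dest):
    """Download `url` to `dest` unless the file already exists; return the path."""
    dest = Path(dest)
    if not dest.exists():
        # Create the target folder on first use, then download once.
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(dest))
    return dest


# Example call (hypothetical URL, not executed here):
# path = fetch_data_file("https://example.org/mydata.csv", "data/mydata.csv")
```

Caching the file locally keeps repeated renders fast and means the analysis still runs if the web site is briefly unavailable. The trade-off is that a changed remote file is not picked up until the local copy is deleted.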

Resolving Notebook Server Issues

Occasionally you may get an error message when trying to connect to your notebook server. Two things to try to resolve this:

  • Clear your browser cache and cookies.
  • Restart your server.

Utilities Package STAT4580

I made a small utilities package STAT4580 that provides:

A function checkHW for checking that your files are named properly and that your Rmarkdown file renders without error with a fresh checkout.

An Rmarkdown template for homework.

More may be added later. The package is installed on the RStudio notebook server and on FastX . You can install it on your own machine with

You may need to install the remotes package from CRAN first.

Some Course Details

Office hours will be held via Zoom MWF 10:30-11:20 or by appointment.

Grader office hours will be held via Zoom MTWTh 12:30 - 1:30.

Quizzes will be administered through ICON .

Homework/projects will be submitted via GitLab .

Please use the Q&A discussion, available in Discussions on the navigation bar at the left, to ask questions. Public questions are best when possible as then other students can benefit from seeing the answers. The TA and I will monitor the board and try to respond to any question within 24 hours. You can also email the TA or me, but we may redirect you to the Q&A discussion unless confidentiality is needed.

If you want help with a particular coding issue please commit and push what you have to GitLab, open an Issue in GitLab, and assign the issue to me and I will respond.

Please log into https://research-git.uiowa.edu/ with your HawkID so I can set up your class GitLab repository.

Please let me know if you have any issues accessing the course materials.

Homework and the Code of Academic Honesty

Homework submissions are governed by the Code of Academic Honesty .

  • Work you submit must be your own work, not the work of others.
  • If you use ideas or results found elsewhere you must cite them properly.

Acknowledgments

This course website is derived from the site for STAT545 at the University of British Columbia


Test prep and homework help from private online Data Visualization tutors

Our online Data Visualization tutors offer personalized, one-on-one learning to help you improve your grades, build your confidence, and achieve your academic goals.

Top 50 online Data Visualization tutors

Hrach's photo

Armenian , English , Russian , Spanish

Yerevan , Armenia

USD $ 10 /hr

Mount Whitney High School , American University of Armenia

Professional Data Scientist and Statistician | Your go to tutor for Data Science (both with R and Python) | Machine Learning | Time Series Analysis | DBMS with MySQL and everything else

Hi! I am Hrach. I am a professional data scientist, statistician and senior majoring in Computer Science at the American University of Armenia. Throughout my years of study I have specialized in various fields related to Data Science and Statistics and completed different kinds of projects in respective fields. I have been tutoring for around half a year and have come here to expand my audience and share my knowledge and experience with many more people than before and show them what exciting and useful tools both fields offer!

Subjects : Data Analysis, Data Management, Data Science, Data Visualization, Database, Machine Learning, MySQL, Python, R Programming, SAS, SQL, Statistics

Adi's photo

9 years of tutoring

English , Hindi , Gujarati

Surrey , Canada

CAD $ 50 /hr

MICROSOFT EXCEL + OPERATIONS (Logistics & Supply Chain) + ACCOUNTING & FINANCE + STATISTICS TUTOR

They say if you can't explain it to someone in simple words, you don't understand it enough. As a business school graduate with 9+ years of tutoring experience at the university & corporate level, I can help you understand the subtle nuances & finer intricacies of what you are learning. Certified Expert in Microsoft Office programs & well-versed in a variety of quantitative courses. * My biggest strength would be keeping you engaged & really driving home the fundamental concepts & subtle nuances till they become second nature. * Also, I believe in establishing rapport so that you can be comfortable while learning in a pressure-free environment. * Most importantly, I will work at a pace at par with your capabilities so that you are not overwhelmed.

Subjects : Accounting, Business, Business Analysis, Data Analysis, Data Visualization, Finance, Logistics, Management, Marketing, Microsoft Excel, Microsoft Suite, Operations Management, Probability, Statistics, Supply Chain

Jetley's photo

4 years of tutoring

English , Hindi , French

Paris , France

USD $ 30 /hr

University of Oxford

Data Structures & Algos | Statistics | AI | ML | CV | Research Methodology

Doctorate from Oxford University and Post-doctorate from INRIA/University of Paris Saclay - I have 10+ years of research and development experience in areas of data mining, optimisation, machine learning, computer vision, deep learning, and AI. Feel free to reach out to me for: o Data structures and algorithms (<--current focus) o Search methods in AI (<- current focus) o Applied mathematics o Statistics o Optimisation o ML and Computer Vision *Theory and Assignment* related help or help with project design, research methodology, scientific/report writing. I focus on foundational understanding through discussions on mathematical formulation, logic, intuition, use cases, comparative analysis, etc. My pedagogy is to build up from first principles so the understanding is broad and transferable to related topics. Feel free to book in for trial sessions. I would love to get to know you and understand how I can aid your learning!

Subjects : Applied Mathematics, Artificial Intelligence, Computer Vision, Data Science, Data Structures & Algorithms, Data Visualization, IB Mathematics, Linear Algebra, Machine Learning, Optimization / Mathematical Programming, Python, SAT Mathematics, Statistics

Athira's photo

5 years of tutoring

English , Malayalam

Toronto , Canada

CAD $ 45 /hr

Arizona State University , Model engineering college

Data Science professional, Business Analytics Master’s graduate and an enthusiastic tutor to help you achieve your career/academic goals

I am a tech professional with more than 8 years of industry experience in data analytics and data science and more than 5 years of tutoring experience. My area of expertise is Data Analytics, Data Science and Mathematics. I enjoy sharing my knowledge and helping students at various skill levels to up skill and level up in their careers. I engage with students from all academic and professional backgrounds to help them achieve their academic/ career goals. Every student is different and their needs different as well. With the right approach, individually tailored curriculum, practice and dedication to learn, I believe everyone can master analytics. If this is what you are looking for, you can book a risk-free FREE consultation with me to know more.

Subjects : AP Statistics, Applied Mathematics, Data Analysis, Data Science, Data Visualization, Database, Math, MySQL, SQL, SQL Server

Aryan's photo

2 years of tutoring

English , Marathi

Mississauga , Canada

CAD $ 8 /hr

D.G.Ruparel College of Arts , Science and Commerce , Lambton College

An enthusiastic person who is keen on solving difficulties and circulating precise knowledge.

Microsoft-certified analyst in the field of data visualization, and analysis. Being a Business analyst student, a lot of work is done utilizing Microsoft Excel, unravelling statistical difficulties, and composing queries. So I would be keen on circulating my skillset with others by solving their cases and instructing them on the relevant explanations.

Subjects : Algebra, Data Analysis, Data Management, Data Visualization, Database, Discrete Math, Marathi, Math/Science, Maths, Microsoft Excel, Python, Statistics

Personalize your search. Find your perfect tutor today!

How it works

Private online tutoring in 3 easy steps

Find the best online tutor.

Discover a vast selection of online tutors who specialize in your course. Our online tutors cover all subjects and levels, so you can easily find the perfect match for your needs.

Book online sessions at any time

Schedule a session with your online tutor via desktop or mobile. Collaborate with your tutor and learn effectively in real-time.

Join our online classroom

Connect with your online tutor through our interactive online classroom. Share your course syllabus and create a customized plan for success.

Why TutorOcean

Expert help with the best online tutors

Our online tutors offer personalized, one-on-one learning to help you improve your grades, build your confidence, and achieve your academic goals.

Tutoring in an online classroom

Unified platform

Everything you need for successful online learning

Private tutors, interactive online classroom, pay as you go, online tutoring. Explore thousands of online tutors. Start learning now.

Success stories

Revolutionizing education with the power of online tutoring

“Akshay is an exceptional Pre-calculus tutor for university-level students. He has a great way of explaining complex concepts and ensures that his students understand them. He is always ready to provide additional explanations if needed. I highly recommend him and look forward to booking him again.” — Sasha

“Richard is an exceptional tutor who has the ability to explain complex concepts in a simplistic way. His step-by-step instructions help to build confidence and understand the material better. Furthermore, he provides numerous tips and resources to facilitate success.” — Jessica

“I had a session on Linear Algebra, and it was very helpful. Mirjana was excellent in explaining matrices, and I could understand the concepts quite well. I would definitely request her assistance again.” — Lateefah

“Students struggling in math should seek help from Reza. He is patient, composed, and adept at explaining complex concepts in a clear and understandable way. He is also very generous with his time and willing to assist students on short notice.” — Rajasiva

“Sierra provided me with an exceptional tutoring session in chemistry. She was patient and made sure that I fully comprehended every concept. I am grateful for her assistance.” — Erin

“Michael did an excellent job in assisting me to comprehend various types of isomers. His tips and tricks were beneficial in resolving challenging problems.” — Jada

“I have found Anisha to be an exceptionally patient tutor who provides clear explanations that have helped me to comprehend various topics. I would strongly recommend her to anyone who needs assistance.” — Sam

“I received invaluable assistance from Patrick in terms of the direction for my papers. Collaborating with him was a comfortable experience, and it made the writing process much more manageable.” — Stephanie

“Elena's assistance was invaluable to me during my college essay revision session on Greek Mythology for the Humanities subject. She provided positive and helpful feedback and demonstrated expertise in several areas, which she explained very nicely.” — Abigail

Nightingale

The Journal of the Data Visualization Society

Data Visualization for Kids

It all started when my nine-year-old son brought home his grade four math homework. (We live in Canada and he attends a French immersion program where all courses are taught in French, including math). The exercise asked students to visualize the kinds of fruits students preferred in a hypothetical class. 

From the exercise, my son understood the concepts of axis labels and using rectangles to represent numerical data (excuse his bars, he didn’t have a ruler.) Because the data was already provided in a table, the exercise didn’t teach the beginning of a data visualization journey: asking a question and finding/collecting data to answer it. 

Over the last year, I decided to try teaching my son how data visualization can be used to answer practical questions.  

1. Knolling Animals

My first attempt occurred when we were at my in-laws’ cottage. I noticed that my son had found an old bag of plastic animal toys and arranged them into a pattern. I later learned that this kind of arrangement is called “knolling” and some kids do this naturally. A search online defines “knolling” as “a process of arranging related objects in parallel or 90-degree angles as a method of organization.”

While my son’s knolled-animal creation was interesting to look at, I tried inspiring him to find a different arrangement that could also answer some questions about the animal toys like:

  • Which animal group has the most toys? 
  • Which has the least? 

Our first step was to clean our data and remove all single animal toys. After this, he placed the rest into individual groups. He could already tell that some animal groups had more toys than others. The image below shows his knolled animal toys on the left and the fun animal toy frequency bar graph we created on the right. 

From this new arrangement, it was easy to answer questions about our animal dataset. It was also efficient to group various animal categories into bigger sets. For instance, when figuring out which animal group had the most toys, he recognized that there was no clear winner and that he had equal amounts of moose, zebras, hippos, giraffes, elephants, lions, and bears. Oh my! (Sorry, there were only three tigers 🙂 ). These animals all represented the group with the most toys. You get the idea. The same goes for the least toys in a category with camels, gorillas, and tigers. 

This kind of exercise is fairly simple to set up and can be completed with any toys that can be grouped, for example, wooden building blocks or LEGO. It’s an effective way to get your child to consider presenting objects in a way that can answer questions. Of course, this also includes counting and simple arithmetic and what kid couldn’t use more practical math?

2. Smarties Science Project

My next dataviz teaching opportunity arrived when my son was assigned a science project. The idea was to think of a question he could answer by conducting some simple experiments. We decided that working with data that tasted good was the way to go, so we chose Smarties (they are kind of like M&Ms, but with more colours). 

Now that we chose to investigate coloured candy, the first step was to come up with a research question. As with the Knolling Animals project, two obvious, but opposite, questions emerged: what is the least and most common colour of candy? This was a great opportunity to introduce the concept of sample size! I asked my son to consider if one box of candy would be enough to answer our research question and this sparked a lively conversation about what data actually represents and its limitations. Since we didn’t want to answer our question in the context of only one box, which does not really represent reality, we purchased 10 (that gets a little closer to reality, still nowhere near the sample size needed to really make a conclusion, but sufficient to illustrate the point).

Our methods were simple and easy to replicate. Each box was labelled with a number and the contents were weighed (to practice using a scale and comparing the measured weight with the package label). Then the candies were arranged according to colour on a white foam board, counted, and recorded in a table online. This was a real chance for my son to find out what scientific work is really about.

After we collected the data for all 10 boxes, it was time to analyze it. Once the total number of candies for each colour was calculated, it was still difficult to answer our research question and this made the case for creating a chart to visualize the data. This project provided an ideal opportunity to teach data collection and the thought process behind generating a table. Of course we could have made the chart using a software program, but I suggested building a candy bar chart would be fun to create (and eat). So we went for it! Although it was easier to tell which colours had the least and most candy, it wasn’t fully clear and we chatted about the importance of labelling the graph to make it easier. In our case, the bars helped guide viewer attention to the highest and lowest bars, and the labels provided the numerical information.

According to our sample of 437 candies, we were able to answer our research question. Green is the least common and purple is the most common colour. When discussing these results I asked my son to compare these findings with the numbers for each individual box. This provided another opportunity to talk about sample size and to also consider if 437 candies was enough to draw any definitive conclusions about how many candies of each colour the company makes (A.K.A. representative sampling in kid terms). I showed him an old commercial for this brand that used the slogan, “Do you eat the red ones last?” This afforded an excellent way to introduce the concept of probability and which box he thought would provide the greatest chance to eat a red candy last (if the candies were selected at random, of course). Finally, we chatted about some of the possible reasons a company would manufacture more or less of a specific colour. We considered the price of food dyes and equipment in our conversation.

The candy experiment was a valuable exercise for my son who can differentiate between colours. This kind of project can be done with candies that have different shapes to avoid relying on colour (and if you want to eat your data at the end).

To reinforce his data collection/visualization skillset, we are currently working on a beach glass project. Last year, we collected beach glass from a local beach on different days. This data will be used to determine the frequency of various colours at our beach. It’s similar to the candy experiment, but has an interesting historical and chemistry component related to the glass industry. 

Exploring data visualization with my son through these activities is both educational and fun. We work with data that is familiar to him which makes it easy to brainstorm practical questions and common applications for answers. Bonus points if the data is edible.

Hopefully some of these activities will help you get started if you are interested in introducing data visualization to your child. Please share your creations with us on Twitter using the #kidsdataviz tag! 

This article would not have been written without encouragement and help from Mary Aviles!

Julia Krolik

Julia Krolik is an information designer, data scientist, artist, entrepreneur, and public speaker. She founded Art the Science to help foster science-art culture in Canada. She also sits on the Data Visualization Society’s board as the Partnerships Director. Through her creative agency Pixels and Plans, Julia fuses scientific integrity with engaging design to empower government and NGO communication strategies.

CSE6242 Data Visualization Homework

TianxueHu/CSE6242_Data_Viz

  • Jupyter Notebook 34.3%
  • Python 5.8%

Data Visualization Homework 11/07

November 05, 2017.

This is a simple graphical representation of the relationships between variables of the mtcars dataset, prepared as part of the Data Visualization Class Homework. Below is a short summary of the mtcars data set and the types of its variables.

Objective: Create a pie chart showing the proportion of cars from the mtcars data set that have different carb values.

Below is the R code for data cleaning, preparation and visualization.

The pie chart shows the percentage breakdown of cars in the mtcars data set by number of carburetors. The number beside each slice is the carburetor count, and the percentage next to it is the share of all cars in the mtcars data set that have that many carburetors. There is a 0.01% rounding error in the chart because the display is kept to one decimal place.
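The code behind the chart is not reproduced here. As a minimal sketch of one way to build it, assuming base R graphics and the built-in mtcars data set (the variable names `carb_counts`, `carb_pct`, and `slice_labels` are illustrative, not the author's):

```r
# Count the cars for each carburetor value in the built-in mtcars data set
carb_counts <- table(mtcars$carb)

# Convert the counts to percentages of all cars
carb_pct <- 100 * as.numeric(prop.table(carb_counts))

# Label each slice with its carburetor count and percentage (one decimal place)
slice_labels <- sprintf("%s carb: %.1f%%", names(carb_counts), carb_pct)

# Draw the pie chart
pie(as.numeric(carb_counts),
    labels = slice_labels,
    main = "Share of mtcars cars by number of carburetors")
```

Printing `carb_counts` before plotting is a quick way to check the slice sizes against the raw counts.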

Objective: Create a bar graph, that shows the number of each gear type in mtcars.

The above bar chart breaks down the number of gears into its different levels (3, 4 and 5) on the x-axis and shows, on the y-axis, the total number of cars that belong to each level.
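A bar chart of this kind could be sketched in base R as follows. This assumes the built-in mtcars data set and is not necessarily how the original chart was produced; `gear_counts` is an illustrative name.

```r
# Count the cars at each gear level (3, 4 and 5)
gear_counts <- table(mtcars$gear)

# One bar per gear level, bar height = number of cars
barplot(gear_counts,
        xlab = "Number of gears",
        ylab = "Number of cars",
        main = "Cars in mtcars by number of gears")
```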

Objective: Next show a stacked bar graph of the number of each gear type and how they are further divided out by cyl.

Moving a step further, this stacked bar plot extends the bar chart described in the previous part. It displays the number of cars on the y-axis, grouped by number of gears on the x-axis, similar to Part II. However, it adds another layer by splitting the cars in each gear group according to their number of cylinders (4, 6 and 8), with a different color for each cylinder count. For example, for gear level 3 there is 1 car shaded in dark blue, indicating 4 cylinders. This means that out of the 15 cars that have 3 gears, only 1 has 4 cylinders, 2 have 6 cylinders and the remaining 12 have 8 cylinders.
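The same breakdown can be checked with a two-way table. The sketch below assumes base R and the built-in mtcars data set; the name `gear_cyl` and the default colors are illustrative and will not match the colors described above.

```r
# Cross-tabulate cylinders (rows) against gears (columns)
gear_cyl <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# With a matrix input, barplot() draws one stacked bar per column (gear level),
# with one segment per row (cylinder count)
barplot(gear_cyl,
        legend.text = paste(rownames(gear_cyl), "cylinders"),
        xlab = "Number of gears",
        ylab = "Number of cars",
        main = "Cars in mtcars by gears and cylinders")
```

Reading down the column for 3 gears in `gear_cyl` gives the segment heights for the first bar.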

Objective: Draw a scatter plot showing the relationship between wt and mpg.

This scatter plot shows how miles per gallon, on the y-axis, changes with car weight on the x-axis. Scatter plots essentially show the direction of change in the response variable (on the y-axis) as the independent variable (on the x-axis) changes. Here, the graph illustrates that, on average, the mileage (mpg) of a car decreases as its weight increases.
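A minimal base R sketch of such a plot, assuming the built-in mtcars data set, is shown here. The fitted line and the correlation are additions for illustration and are not part of the original write-up; `wt_mpg_cor` is an illustrative name.

```r
# Scatter plot of fuel economy against weight
# (in mtcars, wt is recorded in units of 1000 lbs)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "mpg versus weight in mtcars")

# Overlay a least-squares line to make the direction of the trend explicit
abline(lm(mpg ~ wt, data = mtcars))

# The sign of the correlation summarizes the direction of the relationship
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)
```

A negative value of `wt_mpg_cor` agrees with the downward trend visible in the plot.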

Objective: Design a visualization of your choice using the data and write a brief summary about why you chose that visualization.

I have found that one of the most useful plots for visualization is the box plot, especially when representing factor data types with multiple levels. A box plot shows the median of the response variable for each level of the independent variable on the x-axis. It also shows the interquartile range of the data for each level, which gives a visual indication of the spread of the data points in that level. Additionally, the whiskers show the range of the remaining values for each level, and points drawn beyond them highlight outliers at a glance. We see this for cylinder group 8, where one outlier has a very low mpg value, which is displayed below the box.

Here, I have used a box plot to display the miles per gallon values for cars with different numbers of cylinders. Based on the visualization, we can see that the typical (median) miles per gallon is higher for cars with fewer cylinders, i.e. mpg decreases as the number of cylinders increases. We can also infer that there is higher variability in the mpg values of cars with 4 cylinders than in those of cars with 6 or 8 cylinders. This is shown by the larger interquartile range for cylinder group 4 compared with groups 6 and 8.
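As a rough sketch of how this plot and the figures behind it could be produced, assuming base R and the built-in mtcars data set (the names `mpg_median` and `mpg_iqr` are illustrative):

```r
# Box plot of mpg for each cylinder group
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Number of cylinders",
        ylab = "Miles per gallon",
        main = "mpg by number of cylinders in mtcars")

# Median mpg per cylinder group (the line inside each box)
mpg_median <- tapply(mtcars$mpg, mtcars$cyl, median)

# Interquartile range per cylinder group (a measure of spread)
mpg_iqr <- tapply(mtcars$mpg, mtcars$cyl, IQR)
```

Comparing `mpg_median` and `mpg_iqr` across the three groups gives a numerical check on what the boxes show visually.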
