Chapter 3 - Displaying Categorical Data



Categorical data, also known as qualitative data, records which category or quality each observation falls into rather than a numerical measurement.

Vocabulary

  • frequency table - a table that lists each category together with the number of observations in it
  • proportion - a category's count divided by the total number of observations
  • relative frequency distribution - the proportion (or percent) of observations that fall in each category
  • distribution - how the data is spread out across the categories, usually shown graphically
  • area principle - the area a graph gives to each value should correspond to the size of the value it represents
  • contingency table - a table that displays counts for two categorical variables at once, one across the rows and one across the columns
  • marginal distribution - the distribution of one variable alone, taken from the row or column totals of a contingency table
  • conditional distribution - the distribution of one variable restricted to cases that satisfy a condition (e.g. given the value of the other variable)
  • independence of variables - two variables are independent when the conditional distribution of one is the same for every category of the other
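
The first few vocabulary terms can be illustrated with a minimal Python sketch. The `responses` list is hypothetical data made up purely for illustration, and the function names are not from any particular library.

```python
from collections import Counter

# Hypothetical categorical responses, invented only for this illustration.
responses = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]


def frequency_table(values):
    """Return {category: count} for a list of categorical values."""
    return dict(Counter(values))


def relative_frequencies(values):
    """Return {category: proportion of all observations}."""
    counts = frequency_table(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


counts = frequency_table(responses)
shares = relative_frequencies(responses)
for category in counts:
    # One row per category: name, count, and percent of the total.
    print(f"{category:<6} {counts[category]:>3} {shares[category]:6.1%}")
```

The counts form the frequency table, and dividing each count by the total gives the relative frequency distribution, whose proportions always add up to 1.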

Bar Graphs and Histograms

Bar graphs and histograms require a specific and descriptive title.

A bar graph can be segmented so that each bar is split into categories while its full height still shows the aggregate total. The bars must have space between them to qualify as a bar graph.

A histogram's horizontal axis is divided into quantitative intervals rather than categories. Unlike a bar graph, a histogram cannot have space between its bars.

Simpson's Paradox

Trevor and Matt are competing to be the starting quarterback of a football team. During practice, the coaches totaled up the data and found this:

Player | SP  | SPC | LP | LPC
Trevor | 74  | 51  | 47 | 11
Matt   | 161 | 97  | 20 | 1
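
A short Python sketch can show why these numbers illustrate Simpson's paradox. It assumes the table reads SP = 74, SPC = 51, LP = 47, LPC = 11 for Trevor and SP = 161, SPC = 97, LP = 20, LPC = 1 for Matt, and that each "C" column counts completions out of the attempts in the column before it; both readings are assumptions, as are the helper names.

```python
# (attempted, completed) per pass type, as read from the table.
# Interpreting SPC and LPC as completions of SP and LP is an assumption.
passes = {
    "Trevor": {"SP": (74, 51), "LP": (47, 11)},
    "Matt": {"SP": (161, 97), "LP": (20, 1)},
}


def completion_rate(attempted, completed):
    """Fraction of attempted passes that were completed."""
    return completed / attempted


def overall_rate(groups):
    """Completion rate after pooling every pass type together."""
    attempted = sum(a for a, _ in groups.values())
    completed = sum(c for _, c in groups.values())
    return completed / attempted


for player, groups in passes.items():
    per_type = ", ".join(
        f"{name} {completion_rate(a, c):.1%}" for name, (a, c) in groups.items()
    )
    print(f"{player}: {per_type}, overall {overall_rate(groups):.1%}")
```

Under this reading Trevor has the higher completion rate within each pass type, yet Matt has the higher rate once the two types are pooled, because Matt attempted far more of the passes that are completed more often. That reversal between the grouped and the aggregated comparison is the paradox.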

Example 1:

The CDC lists causes of death in the United States during 1999.

Cause of Death | Percent
Heart Disease  | 30.3
Cancer         | 23.0
Circulatory    | 8.4
Respiratory    | 7.9
Accidents      | 4.1
Complement     | 26.3

  1. Is it reasonable to conclude that heart or respiratory diseases were the cause of approximately 38% of U.S. deaths in 1999?

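Yes. Provided each death is counted under exactly one cause, the categories do not overlap and their percentages can be added: 30.3% + 7.9% = 38.2%.

  2. What percent of deaths were from causes not listed here?

The listed causes total 30.3% + 23.0% + 8.4% + 7.9% + 4.1% = 73.7%. Since all causes of death must add up to 100%, the complement, 100% - 73.7% = 26.3%, is the percentage not listed.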
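
The arithmetic behind both answers can be checked with a small Python sketch. The dictionary name and the rounding to one decimal place are choices made for this illustration, and the addition is valid only under the assumption that the categories do not overlap.

```python
# Percent of U.S. deaths in 1999 by listed cause, from the table in this example.
percent_by_cause = {
    "Heart Disease": 30.3,
    "Cancer": 23.0,
    "Circulatory": 8.4,
    "Respiratory": 7.9,
    "Accidents": 4.1,
}

# Question 1: non-overlapping categories can simply be added.
heart_or_respiratory = percent_by_cause["Heart Disease"] + percent_by_cause["Respiratory"]

# Question 2: whatever is not listed is the complement of the listed total.
listed_total = sum(percent_by_cause.values())
not_listed = 100 - listed_total

# Round to one decimal place to hide floating-point noise.
print(round(heart_or_respiratory, 1))
print(round(listed_total, 1))
print(round(not_listed, 1))
```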

Marginal Distribution

Plan       | White | Minority | Total
Four year  | 198   | 44       | 242
Two year   | 36    | 6        | 42
Military   | 4     | 1        | 5
Employment | 14    | 3        | 17
Other      | 16    | 3        | 19
Total      | 268   | 57       | 325

  1. What percent of the graduates are white? $P(w) = \frac{268}{325} \approx 82.5\%$
  2. What percent of the graduates are planning to attend a 2-yr college? $P(2) = \frac{42}{325} \approx 12.9\%$
  3. What percent of the graduates are white AND planning to attend a 2-yr college? $P(w \cap 2) = \frac{36}{325} \approx 11.1\%$
  4. What percent of the white graduates are planning to attend a 2-yr college? $P(2 | w) = \frac{36}{268} \approx 13.4\%$
  5. What percent of the graduates planning to attend a 2-yr college are white? $P(w | 2) = \frac{36}{42} \approx 85.7\%$
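
The five answers differ only in which cell goes in the numerator and which total goes in the denominator. A minimal Python sketch, using the counts from the table and helper names invented for this illustration, makes that explicit:

```python
# Graduates by post-graduation plan (rows) and group (columns), from the table.
counts = {
    "Four year": {"White": 198, "Minority": 44},
    "Two year": {"White": 36, "Minority": 6},
    "Military": {"White": 4, "Minority": 1},
    "Employment": {"White": 14, "Minority": 3},
    "Other": {"White": 16, "Minority": 3},
}


def row_total(plan):
    """Marginal count for one plan (a row total)."""
    return sum(counts[plan].values())


def column_total(group):
    """Marginal count for one group (a column total)."""
    return sum(row[group] for row in counts.values())


grand_total = sum(row_total(plan) for plan in counts)

# Marginal proportions: a row or column total over the grand total.
p_white = column_total("White") / grand_total
p_two_year = row_total("Two year") / grand_total

# Joint proportion: a single cell over the grand total.
p_white_and_two_year = counts["Two year"]["White"] / grand_total

# Conditional proportions: a single cell over the total of the given group.
p_two_year_given_white = counts["Two year"]["White"] / column_total("White")
p_white_given_two_year = counts["Two year"]["White"] / row_total("Two year")


def conditional_distribution(group):
    """Distribution of plans among graduates in one group."""
    total = column_total(group)
    return {plan: row[group] / total for plan, row in counts.items()}


for name, value in [
    ("P(w)", p_white),
    ("P(2)", p_two_year),
    ("P(w and 2)", p_white_and_two_year),
    ("P(2 | w)", p_two_year_given_white),
    ("P(w | 2)", p_white_given_two_year),
]:
    print(f"{name:<11} {value:.1%}")
```

Comparing `conditional_distribution("White")` with `conditional_distribution("Minority")` is one way to examine independence: if plans were independent of group, the two conditional distributions would match.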