Which of the following are resistant statistical measures? 1. Mean 2. Median 3. Mode 4. Range 5. Standard Deviation
For the variable number of parking tickets in the past year would you expect the distribution to be: 1. Bell-shaped 2. Skewed Left 3. Skewed Right
Example 1. According to a recent report from the US National Center for Health Statistics, females between 25 and 34 years of age have a bell-shaped distribution for height with a mean of 65 inches and standard deviation of 3.5 inches. Describe an interval within which about 95% of the heights fall. Draw a labeled picture of a bell curve that illustrates this. 1. 60 in to 65 in 2. 61.5 in to 68.5 in 3. 58 in to 72 in 4. 65 in to 68.5 in 5. 54.5 in to 75.5 in
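The interval in Example 1 follows from the empirical rule: about 95% of a bell-shaped distribution lies within 2 standard deviations of the mean. A minimal sketch of that arithmetic, where the function name `empirical_interval` is a hypothetical helper and not part of the original slides:

```python
def empirical_interval(mean, sd, k=2):
    """Return (low, high) = mean -/+ k standard deviations.

    With k=2 this is the empirical-rule interval that holds
    about 95% of a bell-shaped distribution.
    """
    return mean - k * sd, mean + k * sd


# Heights from Example 1: mean 65 in, standard deviation 3.5 in.
low, high = empirical_interval(65, 3.5)
print(f"About 95% of heights fall between {low} in and {high} in")
```

This gives 58 in to 72 in, which matches choice 3.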
Measures of Position Median Maximum and Minimum Quartiles Percentiles Z-scores
Percentiles The pth percentile is a value such that p percent of the observations fall below that value. Median = 50th percentile. Quartiles: Q1 = 25th percentile, Med = 50th percentile, Q3 = 75th percentile.
Finding Quartiles Arrange the data in order. Find the median. Draw a line through the median. Find the median of the lower half of the data. This is Q1. Find the median of the upper half of the data. This is Q3. Use 1-Var Stats on the TI-83/84.
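The steps above can be sketched in a few lines of Python. This is an illustration under one assumption: when the number of observations is odd, the median itself is left out of both halves. Other software may use a different convention and give slightly different quartiles. The function name `quartiles` is hypothetical.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    xs = sorted(data)              # arrange the data in order
    n = len(xs)
    half = n // 2
    lower = xs[:half]              # values below the median line
    # Assumption: for odd n, the median is excluded from both halves.
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    return median(lower), median(xs), median(upper)


# Small made-up data set, for illustration only.
print(quartiles([1, 3, 5, 7, 9, 11, 13, 15]))
```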
Example 2 The following table gives the losses in the principal battles of the Civil War. The figures are the totals for killed, wounded, and missing, as given in Phisterer's Official Record:

Battle                       Union losses
Nashville                    2,140
Franklin                     2,326
Bull Run                     2,952
Atlanta                      3,641
Perryville                   4,348
Chattanooga                  5,616
Seven Pines and Fair Oaks    5,739
Second Bull Run              7,800
Murfreesboro                 11,578
Fredericksburg               12,353
Antietam                     12,469
Shiloh                       13,573
Seven Days Battles           15,249
Chickamauga                  15,851
Chancellorsville             16,030
Gettysburg Campaign          23,186
Spotsylvania                 26,461
Wilderness                   37,737
Example 2 (continued) As there are eighteen observations, we average the ninth and tenth observations to find the median: (11,578 + 12,353)/2 = 11,965.5. Q1 is the median of the lower half of the data. As there are nine items, this is the fifth entry, 4,348. Q3 is the median of the upper half of the data. As there are nine items, this is the 14th entry, 15,851.
Five-Number Summary Maximum Q3 Median Q1 Minimum
Measuring the Spread: Interquartile Range The interquartile range (denoted IQR) is the difference between Q3 and Q1. This is more resistant than the range. Describes the range of the middle half of the data. Used to determine outliers.
Example 2 IQR = Q3 - Q1 = 15851 - 4348 = 11503
Detecting Outliers: 1.5 IQR Rule An observation is considered a potential outlier if it falls more than 1.5 × IQR above Q3 or more than 1.5 × IQR below Q1. It needs to be truly separated from the rest of the data to be considered a definite outlier.
Example 2 1.5*IQR = 1.5*11503 = 17254.5
Q1 - 17254.5 = 4348 - 17254.5 = -12906.5, so there are no lower-end outliers.
Q3 + 17254.5 = 15851 + 17254.5 = 33105.5, so the casualties at Wilderness (37,737) are a potential outlier in the data. Given that there is a definite gap between Spotsylvania and Wilderness, it is a definite outlier.
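The 1.5 IQR check for Example 2 can be reproduced with a short script. This is a sketch, not the TI-83/84 procedure from the slides; the variable names are illustrative, and the quartiles are taken as the medians of the lower and upper nine values, as in the example.

```python
from statistics import median

# Union losses from the Example 2 table.
losses = {
    "Nashville": 2140, "Franklin": 2326, "Bull Run": 2952,
    "Atlanta": 3641, "Perryville": 4348, "Chattanooga": 5616,
    "Seven Pines and Fair Oaks": 5739, "Second Bull Run": 7800,
    "Murfreesboro": 11578, "Fredericksburg": 12353, "Antietam": 12469,
    "Shiloh": 13573, "Seven Days Battles": 15249, "Chickamauga": 15851,
    "Chancellorsville": 16030, "Gettysburg Campaign": 23186,
    "Spotsylvania": 26461, "Wilderness": 37737,
}

values = sorted(losses.values())
half = len(values) // 2            # 18 observations, so 9 per half
q1 = median(values[:half])         # fifth entry of the lower half
q3 = median(values[half:])         # fifth entry of the upper half
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Battles whose losses fall outside the fences are potential outliers.
potential_outliers = [
    battle for battle, n in losses.items()
    if n < lower_fence or n > upper_fence
]

print("IQR:", iqr)
print("Fences:", lower_fence, upper_fence)
print("Potential outliers:", potential_outliers)
```

The script only flags potential outliers. Deciding whether Wilderness is a definite outlier still requires looking at the gap between it and the next observation.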
Box Plot (Box and Whiskers): Graphing a Five-Number Summary Construct an appropriate scale for your data. Draw a box extending from Q1 to Q3. Draw a line down the middle of the box at the Median. Draw a whisker on the lower end to the minimum. Draw a whisker on the upper end to the maximum.
Modified Box Plot All outliers are replaced by a symbol of choice and the whiskers are drawn to the next closest observation. Graphing on the TI-83/84.
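To see where the whiskers of a modified box plot end, the rule can be sketched as follows: flag every observation outside the 1.5 IQR fences, then draw each whisker to the most extreme observation that is still inside the fences. The function name `modified_box_stats` and the dictionary keys are hypothetical, and the quartiles use the median-of-halves convention (median excluded from both halves when the count is odd).

```python
from statistics import median


def modified_box_stats(data):
    """Return the values needed to draw a modified box plot."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    q1 = median(xs[:half])
    q3 = median(xs[half:] if n % 2 == 0 else xs[half + 1:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    inside = [x for x in xs if low_fence <= x <= high_fence]
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return {
        "whisker_low": inside[0],     # smallest non-outlier
        "q1": q1,
        "median": median(xs),
        "q3": q3,
        "whisker_high": inside[-1],   # largest non-outlier
        "outliers": outliers,         # plotted as individual symbols
    }


# Union losses from Example 2.
casualties = [2140, 2326, 2952, 3641, 4348, 5616, 5739, 7800, 11578,
              12353, 12469, 13573, 15249, 15851, 16030, 23186, 26461,
              37737]
stats = modified_box_stats(casualties)
print(stats)
```

For the casualties data the upper whisker stops at Spotsylvania (26,461) and Wilderness (37,737) is drawn as a separate symbol.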
Example 2 [Figure: box plot of the casualties data; horizontal axis "Casualties", scaled from 0 to 40000]
Box Plots vs. Histograms Distribution shape is clearer on a histogram. Box Plots are useful for comparisons of quantitative variables from different categories. Box Plots highlight the key information from the Five-Number Summary.
Example 3, revisited Construct side-by-side box plots of the GPA data from class, separated by gender.
GPA for Gender = Female
2 : 0044
2 : 566788
3 : 011234444
3 : 88
4 : 00
GPA for Gender = Male
2 : 0014
2 : 577899
3 : 1224
3 : 68
Z-scores A z-score is a measure of how many standard deviations an observation is away from the mean. z = (x - x̄)/s, where x̄ is the mean and s is the standard deviation. Negative z-scores represent scores below the mean. Positive z-scores represent scores above the mean.
Z-scores By the empirical rule, about 68% of the observations from a bell-shaped distribution have a z-score between -1 and 1. z-scores allow you to compare observations from two different groups.
Example 4 NBA star Kobe Bryant is 79 inches tall and former WNBA basketball player Rebecca Lobo is 76 inches tall. Bryant is obviously taller than Lobo, but who is relatively taller when compared to heights within their own gender? Suppose men have heights with a mean of 69 inches and a standard deviation of 2.8 inches. Suppose women have heights with a mean of 63.6 inches and a standard deviation of 2.5 inches.
Example 4 Men have heights with a mean of 69 inches and a standard deviation of 2.8 inches; women have heights with a mean of 63.6 inches and a standard deviation of 2.5 inches. Kobe: z = (79 - 69)/2.8 = 3.57. Rebecca: z = (76 - 63.6)/2.5 = 4.96. Rebecca is relatively taller.
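The comparison in Example 4 can be checked with a small helper. This is a sketch; the function name `z_score` is hypothetical, and the means and standard deviations are the supposed values given in the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Heights in inches, with the group means and standard deviations
# supposed in Example 4.
kobe = z_score(79, mean=69, sd=2.8)
rebecca = z_score(76, mean=63.6, sd=2.5)

print(f"Kobe:    z = {kobe:.2f}")
print(f"Rebecca: z = {rebecca:.2f}")
print("Relatively taller:", "Rebecca" if rebecca > kobe else "Kobe")
```

Because each height is measured against its own group, the two z-scores are on the same scale even though the raw heights are not directly comparable.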
Detecting Outliers For a bell-shaped distribution, we can look for outliers using the 3-standard deviation rule. Observations which are potential outliers have a z-score above 3 or below -3. You need to observe that they are truly separate from the other observations in order to confirm that they are outliers.
SECTION 2.6 HOW ARE DESCRIPTIVE SUMMARIES MISUSED?
Reporting Numerical Summaries which Misrepresent the Data Using the mean and standard deviation when outliers severely skew the results. Using the median for discrete data sets over a small set of values. Using the mean for multi-modal graphs.
Graphical Techniques that Misrepresent the Data Using poorly drawn graphs. Using poorly scaled graphs. Comparing two incomparable quantities. Using totals where percents would be more appropriate.
Guidelines for Effective Graphs Label both axes and specify the units. Provide a heading to make clear what the graph is intended to portray. To compare relative sizes accurately start the vertical axis at 0. Be wary of using shapes rather than bars to represent data. Be careful when presenting two groups on the same set of axes.
Examples How to Lie and Cheat with Statistics Good Math, Bad Math Misusing Statistics Examples of Bad Graphs Gallery of Data Visualization How to Construct Bad Charts and Graphs