child, nine families have two children, and eleven families have six children. The average number of children per family is three, because ninety (the total number of children) is divided by thirty (the total number of families).
But let’s look at the average number of siblings. The mistake people make is thinking that if the average family has three children, then each child must have two siblings on average. But in the one-child families, each of the six children has zero siblings. In the two-child families, each of the eighteen children has one sibling. In the six-child families, each of the sixty-six children has five siblings. Adding up each child’s sibling count across all ninety children gives 348. So although the average family has three children, there are 348 siblings divided among ninety children, or an average of nearly four siblings per child.
            Families   # Children/Family   Total # Children   Siblings
                 4            0                   0               0
                 6            1                   6               0
                 9            2                  18              18
                11            6                  66             330
Totals          30                               90             348
Average children per family: 3.0
Average siblings per child: 3.9
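If you want to check the arithmetic yourself, here is a minimal sketch in Python. The family counts come straight from the table; the variable names are my own, and the only assumption is that a child in a family of k children has k - 1 siblings.

```python
# (number of families, children per family), taken from the table
families = [(4, 0), (6, 1), (9, 2), (11, 6)]

total_families = sum(n for n, _ in families)
total_children = sum(n * k for n, k in families)

# Each of the n * k children in a group of k-child families has k - 1 siblings.
total_siblings = sum(n * k * (k - 1) for n, k in families)

avg_children_per_family = total_children / total_families
avg_siblings_per_child = total_siblings / total_children

print(total_families, total_children, total_siblings)  # 30 90 348
print(avg_children_per_family)                          # 3.0
print(round(avg_siblings_per_child, 1))                 # 3.9
```

The two averages differ because they use different denominators: the first divides by families, the second by children, and the large families supply most of the children.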
Consider now college size. There are many very large colleges in the United States (such as Ohio State and Arizona State) with student enrollment of more than 50,000. There are also many small colleges, with student enrollment under 3,000 (such as Kenyon College and Williams College). If we count up schools, we might find that the average-sized college has 10,000 students. But if we count up students, we’ll find that the average student goes to a college with more than 30,000 students. This is because, when counting students, we’ll get many more data points from the large schools. Similarly, the average person doesn’t live in the average city, and the average golfer doesn’t shoot the average round (the total strokes over eighteen holes).
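The college example can be sketched the same way. The seven enrollment figures below are hypothetical, invented only so that the per-school average comes out to 10,000; they are not real data for any college.

```python
# Hypothetical enrollments for seven colleges (invented for illustration;
# chosen so the per-school average comes out to 10,000).
enrollments = [1_500, 2_000, 2_500, 3_000, 4_000, 7_000, 50_000]

# Counting schools: each school is one data point.
mean_per_school = sum(enrollments) / len(enrollments)

# Counting students: each student reports the size of his or her own school,
# so a school with n students contributes n data points, each equal to n.
mean_per_student = sum(n * n for n in enrollments) / sum(enrollments)

print(mean_per_school)   # 10000.0
print(mean_per_student)  # 36950.0
```

One very large school is enough to pull the per-student average far above the per-school average, because that school supplies most of the students.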
These examples involve a shift of baseline, or denominator. Consider another involving the kind of skewed distribution we looked at earlier with child mortality: The average investor does not earn the average return. In one study, the average return on a $100 investment held for thirty years was $760, or 7 percent per year. But 9 percent of the investors lost money, and a whopping 69 percent failed to reach the average return. This is because the average was skewed by a few people who made much more than the average. In the figure below, the mean is pulled to the right by those lucky investors who made a fortune.
Payoff outcomes for return on a $100 investment over thirty years. Note that most people make less than the mean return, and a lucky few make more than five times the mean return.
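To see how a few big winners drag the mean away from the typical investor, here is a small sketch. The ten payoffs are hypothetical, made up to echo the shape of the distribution described above (a mean of $760, with most investors below it); they are not the study’s data.

```python
from statistics import mean, median

# Hypothetical final values of ten $100 investments (invented for
# illustration; NOT the study's data).
payoffs = [50, 150, 250, 350, 450, 550, 650, 900, 1300, 2950]

avg = mean(payoffs)    # pulled upward by the one big winner
mid = median(payoffs)  # the middle investor's outcome

# Fraction of investors who ended below the mean, and below their $100 stake.
share_below_avg = sum(p < avg for p in payoffs) / len(payoffs)
share_lost_money = sum(p < 100 for p in payoffs) / len(payoffs)

print(avg)               # 760
print(mid)               # 500.0
print(share_below_avg)   # 0.7
print(share_lost_money)  # 0.1
```

In this made-up example, seven of the ten investors finish below the mean, which is why the median is often a better description of what a typical investor can expect.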
AXIS SHENANIGANS
The human brain did not evolve to process large amounts of numerical data presented as text; instead, our eyes look for patterns in data that are visually displayed. The most accurate but least interpretable form of data presentation is to make a table, showing every single value. But it is difficult or impossible for most people to detect patterns and trends in such data, and so we rely on graphs and charts. Graphs come in two broad types: Either they represent every data point visually (as in a scatter plot) or they implement a form of data reduction in which we summarize the data, looking, for example, only at means or medians.
There are many ways that graphs can be used to manipulate, distort, and misrepresent data. The careful consumer of information will avoid being drawn in by them.
Unlabeled Axes
The most fundamental way to lie with a statistical graph is to not label the axes. If your axes aren’t labeled, you can draw or plot anything you want! Here is an example from a poster presented at a conference by a student researcher, which looked like this (I’ve redrawn it here):
What does all that mean? From the text on the poster itself (though not on this graph), we know that the