My charge is to answer the question: how do you tell a story through the use of data?
Let’s start with Harper’s Magazine, which was founded in 1850 and in 1984 introduced both its new executive editor, Michael Pollan, and the Harper’s Index, a popular one-page list of selected data. Here are selected items from the current issue (November 2015):
“Factor by which the rate of retraction of scientific papers has increased in the past four decades: 10
Portion of retractions that stem from fraud, plagiarism, or duplicate publications: 2/3
Percentage of Americans aged 18 to 34 who identify as millennials: 40
Who identify as baby boomers: 8
As members of the greatest generation : 8
Portion of U.S. college freshmen who rate themselves above average in academic aptitude: 7/10
Percentage by which the number of women graduating from college is expected to exceed the number of men in 2025: 47
Percentage of U.S. mothers who have stopped working or switched to less challenging jobs in order to care for children: 62
Of U.S. fathers: 36
Portion of U.S. children aged six to eight who watch YouTube videos every day: 3/4
Percentage of U.S. adults with disabilities who lived in poverty when the Americans with Disabilities Act became law: 27
Who do today: 32
Percentage of married U.S. women who live in poverty: 7
Of unmarried U.S. women who do: 23
Amount that Carly Fiorina’s tenure as CEO of Hewlett-Packard cost the company’s shareholders: $55,200,000,000”
I will offer six bits of advice:
1. Don't use data as decoration
2. A little bit of data goes a long way
3. Have the story drive the numbers, not the other way around
4. Use numbers to play with the reader’s expectations
5. Surprise not just with numbers, but with the variables you pick
6. Numbers can do a lot, but they can’t do everything
1. Don't use data as decoration
Use numbers as evidence.
Use numbers as illustrations.
Use numbers to get the reader’s attention (the Harper’s Index).
Use numbers to reinforce contrasts and disparities.
Use numbers to give voice and attention to the invisible, the marginalized, the neglected. To be counted is often to matter.
But don’t use numbers simply to decorate your story.
Don’t use them as a crutch because you fear that your narrative alone is not working.
Don’t let them just sit there, inserted awkwardly on the page, passively taking up space and ink, like a shy person at a party who talks to no one.
Only put numbers in your story if they carry weight, if they do some narrative work, if they are an essential part of your story.
2. A little bit of data goes a long way
Avoid the temptation to do the following:
Overwhelm the reader with lots of data just because the data is available and you want to impress — as if data alone give your writing gravitas. Quantity is not always quality.
A little bit of selected data on a city will show that you know the place and you bothered to look at the data. But don’t dump the whole census file into your article.
Don’t feel compelled to turn all data into tables and charts (especially the woefully overused pie chart). Sometimes a few bits of data in the text will do best.
Your task is to convert large, complex data sets into simple conclusions. This is not the same as dumbing down or hiding results; it is focusing on what matters most. (You can always put the details of data and methods in an appendix.) The Harper’s Index is a monthly one-page punch of highly distilled and suggestive data: simple numbers suggesting deeper, more complex truths.
Good data presentations encourage the audience to see patterns, to compare, and to put things in context. So give them enough to do this, but not so much that you drown them in a sea of unrelated numbers.
3. Have the story drive the numbers, not the other way around
Don’t just be a passive tour guide through the forest of your data, saying “Table 1 shows this, Table 2 shows that, etc.” Develop the storyline and use the data to highlight, illustrate, and provide evidence for it. This does not mean writing your conclusion first and then finding the data to support it, though that is unfortunately often done. [I had a grad school friend who worked for an unnamed economics professor who told the student: “Here is my draft article; find the data to support my conclusions.”]
Instead, be open to the data and to surprises. If you are a detective, then the data is the evidence and the clues (be it data you found in the census or collected from surveys, GIS analysis, remote sensors, etc.). But then, like Sherlock Holmes or Lisbeth Salander, you are in the driver’s seat and have to craft a coherent story, considering and rejecting rival explanations and identifying the central characters.
4. Use numbers to play with the reader’s expectations
Take advantage of the audience’s expectations. Use data to confirm what they already know and expect (e.g., overall living costs in San Francisco are much higher than in Detroit), and then to surprise and challenge those expectations (as a percentage of income, San Francisco is still a more affordable place to live than Detroit; HUD’s Location Affordability Index, which measures the percentage of family income spent on housing and transportation, gives San Francisco 43% and Detroit 49%). Good writing often navigates that borderline between affirming what the reader already knows and pushing the reader into new and strange (and often uncomfortable) territory. If your data simply confirms and replicates what we already know, then we ask, “So what?” If the data is wholly strange and without context or connections to the world we know, then we have no entry to it. So both affirm and surprise. Just don’t distort or omit data because it conflicts with your storyline.
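The San Francisco and Detroit comparison comes down to simple arithmetic: a city can cost more in absolute dollars and still take a smaller share of a family’s income. Here is a minimal Python sketch of that calculation. The dollar amounts and incomes are hypothetical, chosen only so that the shares come out to the 43% and 49% quoted above; they are not HUD figures, and the function name cost_share is an illustrative assumption.

```python
def cost_share(annual_cost: float, annual_income: float) -> float:
    """Return annual cost as a percentage of annual income."""
    if annual_income <= 0:
        raise ValueError("annual_income must be positive")
    return 100.0 * annual_cost / annual_income


# HYPOTHETICAL figures for illustration only (not HUD data): combined
# housing and transportation cost, and family income, in dollars per year.
cities = {
    "San Francisco": {"cost": 43_000, "income": 100_000},
    "Detroit": {"cost": 24_500, "income": 50_000},
}

# Share of income spent on housing and transportation, per city.
shares = {
    name: cost_share(figures["cost"], figures["income"])
    for name, figures in cities.items()
}

for name, figures in cities.items():
    print(f"{name}: ${figures['cost']:,} per year, "
          f"{shares[name]:.0f}% of income")
```

With these made-up inputs, the city with the higher dollar cost has the lower cost share. That is the confirm-then-surprise pattern: the first number matches what the reader expects, and the second overturns it.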
5. Surprise not just with numbers, but with the variables you pick
Don't just choose the numbers; choose the variables, too. A creative way to use data is to look not only for interesting or unexpected values of conventional variables (e.g., poverty, population density, housing starts, income) but also for unexpected yet revealing variables, such as:
Western Union money transfers as an indicator of immigrant communities;
Cardboard box production and sales as a measure of retail sales and people moving;
And one I have been pushing to count for years: the number of trick-or-treaters visiting a block as a measure of the walkability of that neighborhood.
Look for variables outside what is usually counted. Again, make the invisible visible. And if you can’t count something directly, count it indirectly, through proxies, be they money transfers, cardboard boxes, or Jolly Ranchers passed out.
6. Finally, numbers can do a lot, but they can’t do everything
Not everything can be measured and converted to data. Some argue that if you can’t measure it, it doesn’t exist. But new phenomena may be hard to count, some things fall between the cracks of different categories, and some phenomena are currently too small to measure (but will one day be important). Sometimes this means that no secondary data is available and you have to collect the numbers on your own. But often quantitative data is just not the right format. Sometimes you need narration, maps, and design to tell a story.