Friday, 13 November 2015

Masterclass 1: Data Journalism, new practical projects



In this masterclass we offer a new range of data journalism projects, with step-by-step instructions for completing them.

The first is designed to replace the project found in module 13B4 of the print and ebook versions of Multimedia Journalism.

A replacement has become necessary because Many Eyes, the platform we used there to turn our data into a range of visualisations, has now been withdrawn by its makers, IBM.

The three steps of data journalism: gather, process, visualise

Here's a reprise of the approach to data journalism we take in Chapter 13 of MMJ.

The easiest way to grasp how to do data journalism is to think of what you do in the more traditional practices of the journalist's craft.

You gather information, sift it, pick out the significant and interesting bits, and present them to the audience in as interesting a way as you can. In short, you:

  • gather raw information, 
  • process or filter it, and 
  • shape or visualise it.

You do exactly the same when data is your source material, rather than a collection of quotes, documents and events.

So in this and all other practical demonstrations of doing data journalism we will follow three steps:

  • Gather or find data
  • Process or filter data
  • Visualise data

Gather or find data
Go to Open Data by Socrata, an open data resource. You'll be prompted to open an account at Socrata at some point if you don't already have one.

As an example, I've taken a data set I found there from the World Health Organization (WHO) on alcohol consumption per country. It gives per capita alcohol consumption among adults over 15 in 193 countries.

You can find that dataset here (you'll need to open an account at Socrata if you haven't already got one):

You'll see this is a very simple data set with only two columns of information, which makes it ideal as a first data journalism project.

Process or filter data

Filtering data involves two tasks:

  • Cleaning it up by removing any information that we do not want in our visualisation
  • Sorting the information in the data, for example by adding subsections of the overall data
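If you ever need to do the cleaning step outside Socrata, a few lines of Python will do it. This is a minimal sketch using only the standard library's csv module; the 'Notes' column, the headings and the figures are hypothetical placeholders, not the WHO data.

```python
import csv
import io

# Hypothetical export: placeholder names and figures, not WHO data.
RAW_CSV = """Location,Litres per capita,Notes
Country A,9.5,
Country B,,no data
Country C,2.1,
"""

KEEP = ["Location", "Litres per capita"]  # the columns we want to visualise


def clean(raw_csv):
    """Return CSV text with unwanted columns dropped and empty rows removed."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    out = io.StringIO()
    # extrasaction="ignore" silently drops any column not listed in KEEP
    writer = csv.DictWriter(out, fieldnames=KEEP, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row["Litres per capita"].strip():  # skip rows with no figure
            writer.writerow(row)
    return out.getvalue()


print(clean(RAW_CSV))
```

You could then save the result as a .csv file and upload that instead of the raw export.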

We aren't going to do either with this piece of data, but if you want an idea of how you might filter data in Socrata, they have a useful video demonstration here:

Visualise data

We are going to use a platform called Silk to visualise this data.

So we need to export it from Socrata and upload it to Silk.

Under the Export options in Socrata, choose 'Export as a CSV for Excel' and, for ease, save it to your desktop.

Open Silk and click on Create a new Silk.

Name it.

You'll be invited to view a three-minute video of how Silk works. It's worth pausing to watch it, as it explains how Silk is organised.

Here's what it says in summary:

When you upload a spreadsheet to Silk, each row of your spreadsheet is a unit of data. In the alcohol consumption example we are working with, each row has the name of the country and the alcohol consumption in litres per individual over the age of 15 in that country.

Silk turns each of those rows into what it calls a data card.
So in this example, when we upload the spreadsheet, a data card will be created for each of the countries covered.
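One way to picture a data card is as a small record whose fields are named by the spreadsheet's heading row. The Python sketch below uses csv.DictReader, which returns one dictionary per row; it illustrates the idea only and says nothing about how Silk stores cards internally. The headings and figures are placeholders.

```python
import csv
import io

# Placeholder rows in the same two-column shape; not real WHO figures.
SAMPLE = """Location,Litres per capita
Country A,9.5
Country B,4.0
"""


def to_cards(csv_text):
    """Return one dict per data row, keyed by the heading row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


for card in to_cards(SAMPLE):
    print(card)  # one record per country
```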

Your spreadsheet needs a row at the top which has the titles or headings that enable the software to organise your data.

This one has just two: location, and alcohol consumption per capita. If that row were missing for any reason, Silk or any other visualisation tool could not make sense of the data, so you'd need to add appropriate headings.

That's something you'd do at the Filtering data stage.
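You can add missing headings in any spreadsheet program, or with a short Python sketch like this one. It assumes the headings should be 'Location' and 'Litres per capita' (hypothetical names; use whatever suits your data) and prepends them only when the first row doesn't already match.

```python
import csv
import io

# Hypothetical headings; replace with the ones your data set needs.
HEADINGS = ["Location", "Litres per capita"]


def ensure_headings(csv_text, headings=HEADINGS):
    """Prepend a heading row unless the first row already matches it."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if rows and rows[0] == headings:
        return csv_text  # headings already present: leave the data untouched
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(headings)
    writer.writerows(rows)
    return out.getvalue()


print(ensure_headings("Country A,9.5\nCountry B,4.0\n"))
```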

I could also have a column that grouped individual countries into the continents they belong to. If I did, Silk would organise the data cards into groups, so the information could be filtered and presented continent by continent.
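Grouping boils down to collecting together the rows that share a value in one column. This Python sketch assumes a hypothetical 'Continent' column, which the WHO data set does not have, and uses placeholder figures.

```python
import csv
import io
from collections import defaultdict

# Hypothetical: the WHO data set has no Continent column; figures are placeholders.
SAMPLE = """Location,Continent,Litres per capita
Country A,Europe,9.5
Country B,Africa,4.0
Country C,Europe,2.1
"""


def group_by(csv_text, column):
    """Map each distinct value in `column` to the locations that share it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[column]].append(row["Location"])
    return dict(groups)


print(group_by(SAMPLE, "Continent"))
# {'Europe': ['Country A', 'Country C'], 'Africa': ['Country B']}
```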

Your data is also converted by Silk into what it calls pages. You can add elements to those pages, each of which is given its own unique URL.

So if you are writing an article and want to embed pages of data - subsets of the overall dataset - at particular points, you can do so.

Let's go ahead and upload our data into Silk.

Click to Proceed and choose the 'Upload spreadsheet' option. Then click and drag the spreadsheet you have saved to your desktop into this area of Silk.

Click 'start import'.

Silk now turns the data in your spreadsheet into data cards. When it has finished, you can click to explore the data cards.

'Explore' is where you create visualisations in Silk.

Try them out. Think, in each case, how easy that particular visualisation makes the data to 'read'.

Ideally, we'd like the type of visualisation we finally choose to enable readers to see some salient facts at a glance. For example, which countries have the highest per capita consumption, and which the lowest.
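Those salient facts come from ranking the rows. Here is a minimal Python sketch that sorts by consumption and returns the top and bottom entries; the country names and figures are placeholders, not the WHO results.

```python
import csv
import io

# Placeholder figures, not the WHO results.
SAMPLE = """Location,Litres per capita
Country A,9.5
Country B,4.0
Country C,2.1
Country D,12.0
"""


def highest_and_lowest(csv_text, n=2):
    """Return the n highest and n lowest (location, litres) pairs."""
    rows = [(r["Location"], float(r["Litres per capita"]))
            for r in csv.DictReader(io.StringIO(csv_text))]
    ranked = sorted(rows, key=lambda pair: pair[1], reverse=True)
    return ranked[:n], ranked[::-1][:n]  # top n, then bottom n (lowest first)


top, bottom = highest_and_lowest(SAMPLE)
print("Highest:", top)
print("Lowest:", bottom)
```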

Some types of visualisation aren't much use. 'List' gives each country in alphabetical order, with its consumption. 'Grid' and 'Mosaic' don't add anything.

'Group' is useful.

If you click on Group and then, under the 'Group by' option that appears, choose 'Litres per capita', you get individual countries grouped under levels of consumption, which means you can quickly see geographic and cultural patterns in the data.
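A similar grouping can be sketched in Python by putting each country into a consumption band. The 5-litre band width is an assumption chosen for illustration, not necessarily how Silk groups values, and the figures are placeholders.

```python
import csv
import io
from collections import defaultdict

# Placeholder figures, not the WHO results.
SAMPLE = """Location,Litres per capita
Country A,9.5
Country B,4.0
Country C,2.1
Country D,12.0
"""


def group_by_level(csv_text, band=5.0):
    """Group locations into consumption bands `band` litres wide."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        litres = float(row["Litres per capita"])
        lower = (litres // band) * band  # start of the band this figure falls in
        groups[f"{lower:g}-{lower + band:g} litres"].append(row["Location"])
    return dict(groups)


print(group_by_level(SAMPLE))
```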

Visualise on a map

'Map' sounds promising, but you'll find you are prompted to add categories via dialogue boxes to organise the data. Silk gives suggestions.

Experiment with them.

You should find a set of markers added to the map, and when you click on them you get information on alcohol consumption, as in this example:

But that asks a lot of the reader, who has to click on a country's pin to find out its alcohol consumption, so it doesn't help them all that much.

Use the 'Colour by' dialogue and you get a more useful picture.

Now a coloured disc appears on each country, with a number on it. The number represents the consumption, rounded to the nearest litre, and the discs vary in size depending on the level of consumption in that country.

So now we are beginning to get a visual representation of the data that helps readers interpret it at a glance, rather than by wading through lists of figures.

Publish your visualisations

At any point I can publish the visualisation Silk has created for me, and share it in various ways, including via a link:

Here's that link:

Click on it and my visualisation will open in a nicely laid-out Google map, with the map element shown effectively in context. This is one of the individual pages Silk has created for me, so I could use it as part of the article I am writing.

From there I can pick up code to enable me to embed it in the article I am writing, if I'd like to:

If you do click to publish you'll then have to click on Explore again to get back into the data.

Run through the rest of the options.

Bars and columns give you an immediate comparison.

With columns you need to scroll across to reveal which country is represented in each stack.
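The same at-a-glance comparison can be roughed out in plain text, which can be handy for checking the order before you build the real chart. This is a hypothetical Python sketch with placeholder figures: one '#' per litre, rounded, longest bar first.

```python
import csv
import io

# Placeholder figures, not the WHO results.
SAMPLE = """Location,Litres per capita
Country A,9.5
Country B,4.0
Country C,2.1
Country D,12.0
"""


def text_bars(csv_text):
    """Return one text bar per location, highest consumption first."""
    rows = [(r["Location"], float(r["Litres per capita"]))
            for r in csv.DictReader(io.StringIO(csv_text))]
    rows.sort(key=lambda pair: pair[1], reverse=True)
    width = max(len(name) for name, _ in rows)  # pad names so bars line up
    # one '#' per litre, rounded to a whole number of characters
    return [f"{name:<{width}} {'#' * round(litres)} {litres:g}"
            for name, litres in rows]


print("\n".join(text_bars(SAMPLE)))
```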

See anything immediately interesting or surprising?

Which country would you have guessed, before looking at this data, had the highest alcohol intake?


In fact, according to this WHO data, it is little Luxembourg.

Second comes Ireland.

In terms of giving an instant indication of what the data shows, this is an effective visualisation.

As you'd expect, Muslim-majority countries show the lowest readings.

Silk limits you to 3,000 rows of data, and will reject any data set that exceeds that total when you try to upload it.
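Before uploading a large data set, you could check it against that limit. This Python sketch counts the rows after the heading row; whether Silk counts the heading row towards its 3,000 is an assumption here, and the limit may have changed since 2015.

```python
import csv
import io

SILK_ROW_LIMIT = 3000  # the limit quoted above; assumed to exclude the heading row


def count_data_rows(csv_text):
    """Count the rows after the heading row."""
    return sum(1 for _ in csv.DictReader(io.StringIO(csv_text)))


def fits_in_silk(csv_text, limit=SILK_ROW_LIMIT):
    """True if the data rows are within the upload limit."""
    return count_data_rows(csv_text) <= limit


# A placeholder file the size of the WHO data set: 193 country rows.
sample = "Location,Litres per capita\n" + "Country X,1.0\n" * 193
print(count_data_rows(sample), fits_in_silk(sample))
```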

Further tuition on Silk and Socrata Open Data

Silk has a guide, How to use Silk for journalism, here:

Silk's YouTube channel is here:

Socrata's guide to its Open Data initiative

