Recently I downloaded all of the IMDb’s ratings with a view to creating a visualisation of Doctor Who ratings across the entire 800 episodes (and fifty years) of the show. Being more of an original series fan I was curious how the older episodes rated against the newer ones. The results are quite interesting: check out the interactive visualisation or explore the data in the far-less-exciting tables below.
While working on a visualisation of Commonwealth war dead during the First World War, one important design decision I made was to restrict the dataset to the day the war ended: Armistice Day on 11th November, 1918.
Canadians celebrate Armistice in Mons, 1918 (National Library of Scotland)
Obviously many more soldiers died after this date from wounds received during the war — not to mention the victims of the flu pandemic, which started before the war even ended and lasted until mid-1919.
Steve Douglas, director of the Maple Leaf Legacy Project, also pointed this out to me a few days after the project went live.
The Commonwealth War Graves Commission (where I obtained the data) has records for 75,501 deaths between 12th November 1918 and 31st August 1921 (31st August being the end of the “designated war years” for the First World War). While this is still a hugely significant number of deaths, it pales in comparison to the number incurred during the active war years.
The main reasons I decided to omit deaths after 11th November 1918:
- 11th November 1918 is a powerful date. Telling the story of the war felt stronger ending on this day, if only because those who died were possibly only hours or even minutes away from surviving the war
- Explaining why the dataset extended to 31st August 1921 complicated the visualisation. Sadly, the more explanation a visualisation requires, the less effective it becomes
- Average deaths per day after Armistice dropped from around 627 to 86. Compared to the dead per day from battles like Loos, the Somme and Passchendaele, visualising this level of data would be difficult given the sheer difference in the scale of the numbers
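As a rough sketch of how these per-day averages fall out of the totals, here’s the post-Armistice calculation in Python, using the CWGC figures quoted in this post (the exact average depends on how inclusively you count the date range and on the dataset snapshot, so it won’t necessarily match other quoted figures to the day):

```python
from datetime import date

# CWGC post-Armistice records: 12 Nov 1918 to 31 Aug 1921 (figures from the text)
deaths = 75_501
start = date(1918, 11, 12)
end = date(1921, 8, 31)

days = (end - start).days + 1  # count both end dates inclusively
rate = deaths / days
print(f"{days} days, about {rate:.0f} deaths per day on average")
```

With this inclusive count the average comes out at roughly 74 deaths per day — the same order of magnitude as the figure quoted, with any difference down to the exact date range used.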
While I felt this data didn’t fit into the main war dead project, it definitely warrants its own visualisation, which I’ve put together below:
What’s most interesting about this visualisation is how it correlates almost perfectly with the UK-wide flu deaths of 1918–1919. During the war any pattern is hard to spot, as the deaths are obviously far higher due to combat-related casualties. But post-Armistice the pattern becomes obvious almost immediately.
Moving beyond the flu pandemic, another issue with the CWGC’s dataset becomes obvious: these aren’t just war deaths, they are all service deaths between 4th August 1914 and 31st August 1921. Incidents such as the loss of HMS Iolaire, the Iraqi revolt against the British, the Waziristan campaign, the loss of HMS K5 and the R38 airship accident demonstrate this with their spikes in deaths after the war.
Even within the wartime data, many service deaths were recorded during the First World War where the serviceman did not die as a result of the war. Probably the best example of this is the oldest death in the war — 85-year-old George William Valentine Clements — who fought for the British Army in the Crimean War but most likely spent the entire First World War retired in his Norfolk home until he passed away in March 1916.
Going back to the original visualisation of the war dead, I was careful to use language that reflected what the CWGC dataset really is: a record of all deaths recorded during the “designated war years” — but with a decision to limit the dataset to the “official” war dates of 4th August 1914 to 11th November 1918, for the reasons I’ve described above.
It’s regrettable (and frustrating) that it’s so difficult to include those who died after Armistice Day in this sort of visualisation. But thinking about the war on a higher level, even the Commonwealth war dead make up only a tiny fraction of the estimated 16 million deaths of the conflict worldwide.
However, I still genuinely hope the visualisation communicates the loss and tragedy in a way that transcends just numbers on paper.
Since the start of this year I’ve spent a lot of time looking at how gamification and game mechanics can create more engaging user experiences.
Many of these gaming concepts also work wonderfully for data visualisation and interactive infographics (or whatever else you want to call them, depending on your specific vernacular).
Consider your users as players
All great games have story lines. The same is true for data visualisations: if you’re not telling a story, then what exactly are you visualising? Just as players explore a game and its narrative, think about the users of your data visualisations in the same context.
The Guardian uses the size of your social network on Facebook to demonstrate how vast the NSA surveillance net is and illustrates these numbers with familiar concepts: the size of a train, capacity of famous landmarks and populations of actual countries.
The BBC’s How Big Really? is another great example of this: simply enter your location anywhere in the world and it will overlay significant geographic data over your local area to give you a great insight into the size of certain events.
Cascading information theory
Games reveal chunks of plot to a player slowly — much like any narrative (books, films etc.). Often this is done after completing tasks in the game: kill that bad guy, find a clue or drive to a specific location. Tell your story in parts for maximum impact.
Last year for the Olympics I was designing a data visualisation for the BBC allowing users to see how they compared to Olympians in regard to height and weight. I spent a lot of time working on the actual visualisation of the data and realised after user testing that I’d really neglected the introduction to the data. Putting a scattergraph in front of users (especially those not into data or numbers) can be really off-putting.
I took a step back and thought about the story I was trying to tell: that Olympians come in all shapes and sizes. So I took the tallest and lightest Olympians and placed them on the introduction. It was a great way to begin telling this particular story. While users could enter their height and weight and be plotted on the scattergraph amongst the Olympians, they could also just click on either the tallest or shortest athlete and be taken into the dataset that way.
Here Is Today also does a wonderful (and far better) job at cascading information to the user as they explore the visualisation.
Feedback loops and interaction design
Arguably the core of all games (and pretty much anything interactive) is built upon feedback loops. You hit fire on a controller and a rocket launches on screen along with an audible “whoooooosh” sound effect. This cause-and-effect cycle is key to both gaming and interaction design in general. It’s one of many reasons why Candy Crush is so addictive: it’s a constant series of feedback loops accompanied by stimulating colours and sounds.
All the examples listed above feature a lot of polished interaction design (and feedback loops). The web abounds with amazing interaction design, but for a fantastic example of great interaction design/feedback loops on data visualisations, have a look at Periscopic’s Inequality Is (which is also a great example of all the elements discussed here).
In conclusion, gamification and game mechanics can definitely help create wonderful and engaging data visualisations. But the more well-known elements of gamification such as badges and level ups don’t really work here: we need to look at what elements of gaming can help us tell the story that lies at the heart of the data.
For the past few months I’ve been playing around with Commonwealth War Graves Commission data from the First World War. The loss of life during the First World War was massive, as we all know: but I’ve seen little in the way of visualisation of the true cost of all these figures.
So here’s my attempt at trying to visualise the massive death toll of the First World War: the Commonwealth World War One Data Timeline. Limited by the dataset, this only looks at the British, Australian, Canadian, Indian, New Zealand and South African dead during the war: it doesn’t account for the millions lost by the Commonwealth’s allies (such as France and the US) or by the Central Powers.
More importantly, it doesn’t show the number of deaths after Armistice or the death toll from the Spanish flu in 1919.
Regardless, I find it an interesting yet tragic way to explore the First World War. I plan to write up the full story of how I developed the timeline, but for the moment I’m keen to do some iteration and polishing on the actual timeline. So please check out my First World War data visualisation — any feedback gratefully received.
Yesterday I saw an infographic from Greenpeace about the cost of cleaning up Fukushima. On to Twitter it went…
Dubious and dare I say pointless 'infographic' from Greenpeace. No idea what this is telling me? pic.twitter.com/Es9LEZdf3h
— James Offer (@joffley) August 2, 2013
And Twitter’s response?
@joffley mmm. suppose to give you a view of the size of the cost… I don't know.
— Marcel du Preez (@marceldupreez) August 2, 2013
@joffley for one thing, that cleaning up after nuclear energy costs as much as supplying clean infrastructure to replace nuclear.
— Antony Day (@antday) August 2, 2013
That sounds about right — the message Greenpeace wants to communicate is that cleaning up Fukushima is immensely expensive and the same money could be used to provide clean energy. Bear that in mind and look at the infographic again:
I can kinda see it — but a good infographic / data visualisation should not leave me searching for the meaning. The main problem is the use of a treemap. For starters, there’s not enough data (or variation in the size of the data) for the treemap to be useful (check out the Billion Dollar Gram for a far better example of a treemap that works). Secondly, the two key pieces of data — the cost of the clean-up and the cost of green energy — couldn’t be further apart.
Thirdly, why be coy with the story here? Tell me your point, then show me your visualisation.
Here’s my version:
Better? Bar charts might not be the sexiest approach, but they work.
This graphic from the latest RSPB magazine has an alarming headline: Worrying results from world’s biggest birdwatch.
The copy goes on to explain the ongoing decline in bird numbers in the UK:
- Starlings: down by 16%
- House sparrows: down by 17%
- Bullfinches: down by 20%
- Dunnocks: down by 13%
The graphic below this communicates the number of bird sightings perfectly well, but what it doesn’t do is in any way illustrate the tone of urgency in the copy. Furthermore, there are some other issues with this graphic as a data visualisation:
- The blue bars behind the bird silhouettes don’t really function as a bar chart, as the baseline is top-aligned (instead of bottom-aligned)
- The bird silhouettes themselves skew the visual comparison of these bars (I’m looking at you, wood pigeon)
- While the birds distract from the data, they do show a good size comparison between the species
To really illustrate these results, we have to compare them to last year — as was done in the introductory copy. Along with showing that comparison, let’s add in the RSPB’s conservation status for further reference and some illustrations. We lose the bird size comparison, but I think we gain a lot more…
Now, I don’t know about you, but I am really worried about the state of bird numbers in the UK. But the good news that has suddenly emerged from this data? The nimble long-tailed tit appears to be thriving.
One last thing this graphic would need: the visualisations for bullfinches, dunnocks, siskins, fieldfares and jays to demonstrate their growth.
It’s been almost a week now since the inaugural UX Scotland up in sunny Edinburgh. Here’s my round-up of what I saw and what themes came up during the two days of talks and discussions.
Overall I think the most interesting theme I took from the conference was that of context. A lot of this started on day one after a goldfish bowl discussion on the future of broadcasting, and was cemented by Giles Colborne’s keynote on day two, which looked at exactly that: context and what it means for user experience.
Context is a great challenge for user experience designers: get the context right for a user and it’s a wonderful experience, but get it wrong and the experience is awful. Getting it right is the real challenge.
After a quick intro from the organisers (Software Acumen), Jeff Gothelf kicked off the talks with the first keynote: “Better Product Definition with Lean UX & Design Thinking”. This was a great reminder of how products can (and will) fail if you simply make assumptions about your users. The demise of Plancast is a stark reminder of how not really considering your users can lead to disaster.
I was lucky enough to be talking next: my debut presentation of Play & Engage: Practical Ways to Gamify Your Content. (There’s also a fairly comprehensive blog post of my key points available too.) Unfortunately Graham Odds was on at the same time, talking about data visualisation — which I really wanted to see. You should check out his slides if only to admire some masterful and beautiful CSS3.
Next up: Martin Belam took a look at “Designing ‘The Bottom Half of the Internet’”. He took us through the love-it-or-hate-it world of comments and demonstrated some truly staggering douchebaggery in the form of comments left on Holly Brockwell’s blog after her open letter to Hyundai regarding their awful ‘suicide ad’. A key lesson for anyone involved in moderation: comment often. It seems most commenters are not unlike five-year-old children (are you really surprised?), and some grown-up presence seems to help them behave.
Then I sat in for a double-feature of internationalisation and user experience: Chui-Chui Tan gave us some great insight into how different cultures use technology with Your Mobile Experience Is Not Theirs. Chris Rourke followed this up with Cross Cultural UX Research – Best Practices for International Insights that gave some valuable insight into working internationally (and user testing remotely to boot).
After this was a real highlight: the goldfish bowl discussion on broadcasting in a multi-device world. Rhys Nealon from STV kicked off the discussion with several industry figures — and it soon went from being a panel discussion to a general group discussion, which was fantastic. Pretty much everyone attending contributed: it’s amazing how everyone has an opinion on consuming television content.
But the overriding challenge in this multi-device world soon emerged: context. How can Netflix (or any other product) differentiate between me watching Game of Thrones and my children watching Sesame Street — without a myriad of different logins? How can we balance discovery with curation? Not many answers from this discussion, but some very exciting questions.
To end the day Sam Nixon from RBS took us through a look at the future of money, and specifically digital money services. How can we make online banking more useful? He provided some great insights into how useless breaking down your ‘monthly’ spend is, and instead proposed that easier and smarter payment systems (such as Barclays Pingit) will be the real future of digital money (along with a few mentions of — of course — Bitcoin).
That was the end of day one: time to head over to the Voodoo Rooms for some hard-earned drinks (and some very fine curry).
As I’ve already touched on, Giles Colborne added nicely to the context theme with his in-depth talk looking at all facets of context and how it affects user experience.
Following this was an immensely fun and very useful look at “How to Make Your First UX Comic or Storyboard” with Bonny Colville-Hyde. I’ve been sketching here and there for my whole life but this certainly gave me some inspiration to take it much further.
Look! I made a comic!
After another wonderful lunch overlooking the Salisbury Crags, Ian Fenn took us through his experience in “Getting UX Done”, which had a nice element of humour in amongst practical advice on dealing with all manner of challenges. Immediately after, Mike Atherton took a look at “Brand-Driven Design”. A glass of whiskey and some cigar smoke would’ve nicely rounded off his look at advertising from the 60s and how brand is a fundamental part of any experience.
And thus concluded two days of diverse and very interesting looks into UX. Fantastic talks, a great venue and awesome people really made it worth the trip up (not that I ever need much of an excuse to go to Edinburgh). It’ll be great to see what UX Scotland 2014 has to offer next year.
While I’ve worked under various job titles over the years, much of my work has consisted of creating and maintaining the best possible experiences for content-rich websites across a broad range of areas — including arts, news, sport and even transport.
A lot of this experience design has to do with navigation and information architecture. But beyond this, what happens to a user once they’ve found their content? A good experience shouldn’t end there. But even if your content is superbly written, edited and laid out, sometimes there’s a limit to the impact you can have just with the written word. Sometimes we can do more with far less – and this is where we can look to the world of gaming for assistance.
Yes, I’m suggesting we can gamify content for the better. But perhaps not in the way you’re thinking (and possibly dreading).
Bump’s calculations are dependent on this hunk of hardware: the HP ProLiant DL580 G7 — currently the ‘highest-density system’ commercially available. You can read the exact calculations in his article, but he puts the estimated storage space of these behemoths at 21 terabytes. Here’s a size and storage comparison with Blu-rays and DVDs:
Moving forward, Bump proposes the NSA would need 5,600 boxes to store a single year’s worth of PRISM data (9.7 petabytes: a modest 406,847 25 GB Blu-rays). As a data visualisation exercise, this is where things get really interesting: that’s a huge number of boxes. How big is that in comparison to other “big” things?
Let’s start with the simplest comparison: let’s forget gravity and a host of other physical restrictions and stack these boxes one on top of the other. 5,600 × 17.6 centimetres (6.94 inches) comes to a huge 985.6 metres (around 3,234 feet). Compare that below to some of the world’s tallest and most iconic buildings:
That’s an impressive tower of data. But not particularly practical. Bump mentions stacking the boxes:
If you stack those boxes 5 feet high (eight boxes), we’re talking about a total physical footprint of 2,410 square feet.
2,410 square feet (223.9 square metres) is another big number. But let’s forget stacking them and consider how much space you’d need if they were laid flat (no stacking). A single HP ProLiant DL580 has a footprint of 3.63 square feet (0.338 square metres). For a year, based on 5,600 boxes, that gives us 20,380 square feet (1,893.36 square metres). That’s around a quarter of a soccer pitch.
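The back-of-the-envelope figures here can be reproduced in a few lines of Python. All the inputs — the 9.7 PB estimate, the 17.6 cm box height, the 3.63 sq ft footprint and the 5,600-box count — are taken from this post and Bump’s article, so treat the results as rough, rounded approximations:

```python
# Back-of-the-envelope PRISM storage arithmetic (inputs from Bump's article)
BOXES_PER_YEAR = 5_600        # HP ProLiant DL580 G7 boxes for one year of data
BOX_HEIGHT_M = 0.176          # ~17.6 cm per box
BOX_AREA_SQFT = 3.63          # footprint of a single box

# One year of PRISM data (9.7 PB, binary units) expressed in 25 GB Blu-rays
blu_rays = 9.7 * 1024**2 / 25
print(f"{blu_rays:,.0f} Blu-rays")            # ≈ 406,847

# Height of a single (gravity-defying) stack of all the boxes
stack_height_m = BOXES_PER_YEAR * BOX_HEIGHT_M
print(f"{stack_height_m:,.1f} m tall")        # ≈ 985.6 m

# Floor area with the boxes laid flat, one layer deep
flat_area_sqft = BOXES_PER_YEAR * BOX_AREA_SQFT
flat_area_sqm = flat_area_sqft * 0.092903     # sq ft to sq m
print(f"{flat_area_sqft:,.0f} sq ft ({flat_area_sqm:,.0f} sq m)")
```

Small differences from the figures in the text come down to rounding of the inputs.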
Let’s compare it to some actual landmarks, overlaying the area required for PRISM laid flat:
New York / Times Square
London / Trafalgar Square
Paris / The Louvre
Now that we’ve laid the drives out in a square, they are far more compact — which shows that stacking them up in the air gives quite a skewed representation of how much space we’re actually talking about.
That’s a lot of data in not a lot of space — though of course with zero provision for cooling systems, electronics and so on. But not to worry, the NSA has plenty of space: this year in Utah, the NSA will complete a new data centre with an area of 1 million square feet (92,903 square metres).
Rest assured, the NSA will be stacking these servers — and will have PLENTY of space for operations like PRISM.
(I hope these calculations all work — any errors please let me know!)
Update: (12 June) I made some errors sorting out my area calculations that I’ve updated above. Thanks very much to Dr. Bob Campbell for helping me correct this!
Here are the two compared:
The similarities, of course, are not surprising.
Without comparing the datasets directly, it’s hard to find any definitive insights, but a few things that look interesting:
- Social network connections with the Hawaiian islands are stronger than the flight connections (it’s a long flight I guess, but the beaches are good, right?)
- Chinese mainland connections through Facebook are not as pronounced as the air traffic. But Facebook isn’t as big in China as it is in the West
- West Africa has stronger connections through Facebook than through flight paths (again, not surprising)
- Western Europe is heavily interconnected in both maps, but moving east this reduces in density… but Moscow is a large hub in both maps
- Australia and New Zealand prefer to keep in touch through Facebook rather than flying across the Tasman
I’d love to compare these datasets together along with similar stats from Twitter, Baidu/Weibo in China and VK in Russia/Eastern Europe.
Can anyone else see any other interesting similarities (or differences) in here?