Dan Paul Smith

Interface and visualisation developer.

Currently UI Lead at Stratified Medical. Previously freelanced as an interface and visualisation developer for linked data & open data projects. Also currently enrolled in a part-time MRes in Spatial Data Science & Visualisation at CASA, UCL.

If you want to get in touch, use the form below or find me on Twitter.

Summary of freelance work

Contractor for Technology Strategy Board: Oversaw the front-end for LinkedGov. Developed the LinkedGov extension for Google Refine along with a number of other interfaces that helped non-technical users interact with data.

Contractor for The Stationery Office: Worked with teams at The National Archives and data.gov.uk to design and build interactive organisation trees powered by linked and open data.

Other projects have included configuring linked data APIs, modelling ontologies, rapid prototyping of web apps & visualisations and being part of various workshops & panels.

I've spoken and run workshops at SemTechBiz, OpenTech and UKGovCamp - specifically on user interaction and data consumption for linked and open data.

Flood Feeder - Visual aggregation app (#floodhack 2014 winner)

Flood Feeder is a tool to aggregate flood-related datasets (spatial or non-spatial) and spit them out as GeoJSON/JSON (a friendly, easy-to-use data format for developers to work with). This speeds up development, makes the data easier to grasp and increases the chances of useful apps and visualisations being built.
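
As a rough sketch of that aggregation step, assuming hypothetical feed records with simple lat/long fields (not Flood Feeder's actual schema):

```python
# Sketch of merging two hypothetical flood feeds into a single GeoJSON
# FeatureCollection. Field names here are illustrative only.
def to_feature(lon, lat, properties):
    # GeoJSON uses [longitude, latitude] coordinate order
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }

def aggregate(feeds):
    features = []
    for feed in feeds:
        for record in feed:
            features.append(to_feature(record["lon"], record["lat"],
                                       {"source": record["source"],
                                        "reading": record["reading"]}))
    return {"type": "FeatureCollection", "features": features}

# Invented sample records standing in for real feed data
river_gauges = [{"lon": -2.99, "lat": 51.02, "source": "gauges", "reading": 4.2}]
flood_alerts = [{"lon": -0.75, "lat": 51.41, "source": "alerts", "reading": 1.0}]
print(aggregate([river_gauges, flood_alerts]))
```

Emitting one FeatureCollection regardless of how many source feeds went in is what lets a developer point a mapping library at a single URL.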

The idea was conceived and partially implemented at #floodhack - a relief effort to help those affected by the flooding in the UK. Flood Feeder won a hosting package from Google and a development grant from Nominet Trust.

It offers geographic granularity, allowing you to aggregate data for a country, a ward, a town or another type of geographic area.

The feeds available are presented as a list of geographic layers, flood-related data feeds and other types of datasets, each of which can be included in or excluded from your aggregated feed. Feeds that contain geographic data, such as point locations (cellphone masts, river gauges) or shapes (flood areas, cellphone coverage), can be previewed live on the map.

Once you've finished building your feed, you can export it as JSON/GeoJSON.


Built by Dan Smith, Brendan Quinn and Greg Nwosu.


The LinkedGov extension for Google Refine is a plug and play module that semi-automatically cleans up messy, unformatted government data.

The technical documentation can be found here: http://wiki.linkedgov.org/index.php/Google_Refine_extension

As the data is cleaned and formatted, the user is asked a series of questions about their data, which in turn creates mappings that transform the user's imported tabular data into graph data - specifically - RDF (a linked data format).

The extension offers a number of wizards that carry out cleaning, linking or enriching tasks on the data. For example, a cleaning wizard may ask the user if there's a column containing dates - and which format(s) the dates are in (e.g. 02/05/2012, 02-May-2012...). The wizard is then able to process and reformat the dates into a standardised, globally accepted format (ISO 8601).
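
A minimal sketch of what such a date-cleaning step does, assuming a small hand-picked list of input formats (the wizard's real format list is richer):

```python
from datetime import datetime

# Candidate input formats - an illustrative subset, not the wizard's actual list
FORMATS = ["%d/%m/%Y", "%d-%b-%Y", "%d %B %Y"]

def to_iso8601(value):
    """Try each known format and emit an ISO 8601 date string."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # leave unrecognised values for the user to inspect

print(to_iso8601("02/05/2012"))   # day/month/year input
print(to_iso8601("02-May-2012"))  # day-monthname-year input
```

Both inputs above normalise to the same ISO 8601 string, which is the point of the wizard: downstream tools no longer have to guess the date convention.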

The linking wizard is able to perform automatic reconciliation against linked data endpoints on the web, linking values to their online definitions. For example, a column containing department name abbreviations (e.g. MOD, BIS, HMRC...) could be automatically reconciled to their identifiers on data.gov.uk (e.g. http://reference.data.gov.uk/doc/department/bis). By linking the values to their online definitions, whole new paths of information can be accessed for each department.
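
A hand-made sketch of that reconciliation step; only the BIS URI comes from the example above, and the lookup table stands in for the live endpoint query the extension actually performs:

```python
# Stand-in for live reconciliation against a linked data endpoint.
# Only the BIS entry is taken from a real example; treat the table as
# a hypothetical cache of previously reconciled values.
KNOWN_URIS = {
    "BIS": "http://reference.data.gov.uk/doc/department/bis",
}

def reconcile(abbrev):
    """Map a department abbreviation to its data.gov.uk identifier, if known."""
    return KNOWN_URIS.get(abbrev.upper())

for cell in ["BIS", "bis", "Unknown Dept"]:
    print(cell, "->", reconcile(cell))
```

Cells that fail to reconcile return None, which in the real extension would be surfaced to the user for manual matching.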

data.gov.uk, TNA, TSO, Cabinet Office

Click here to view the Cabinet Office's organogram.

Head over to http://data.gov.uk/organogram to view all organograms.

This is an organisational chart (organogram) visualisation of the structure of 'posts' within the UK government. Government departments are composed of units which contain posts, and these posts can be held by one or more people.

This visualisation shows the paths of responsibility, in terms of who reports to whom, for the post in question by including its 'parent' posts and its 'child' posts. Clicking on a post in the visualisation loads its child posts, if there are any.

Each post has an information panel that includes the name of the person(s) holding the post, their contact details, the name of the departmental unit the post belongs to and a description of the post's role. There are also links that take you to the underlying information itself, provided by the Linked Data API.

The source code for this visualisation is available from http://code.google.com/p/linked-data-api/

To view the sources of data the visualisation uses at any time, there's information about all of the API calls made in the bottom right under "Data sources". Here you can grab the data in several different formats and see which parameters have been used to tailor the data for the visualisation.

data.gov.uk, TNA, Cabinet Office

Online demo available.

This is a treemap visualisation of the UK government's department structure using data provided by data.gov.uk. As more of the "reference" linked data is made available, this visualisation will change and grow automatically, as it accesses the data and is built in the browser in real time.

The visualisation lets you drill down from the top-level of the government's structure (departments), down into their units and then through to the lowest-level - the unit posts, which are held by people.

The departments and units are both sized by the number of posts they contain and the posts are sized by the number of posts that report to them (i.e. by a measure of responsibility).

To view the sources of data the visualisation uses at any time, there's information about all of the API calls made in the bottom right under "Data sources". Here you can grab the data in several different formats and see which parameters have been used to tailor the data for the visualisation.


In February I was asked to have a pop at visualising some spending data for some local UK councils. The data was in a linked data format (RDF/Turtle), stored in an RDF store somewhere with the Linked Data API layered on top (the Puelia implementation).

The brief was to build an open-source, interactive, cross-browser dashboard of widgets that would allow the comparison of councils' spending data, say, per month.

After a little time spent researching, I came to the conclusion that RaphaelJS (an open-source JavaScript vector library) would fit the bill for this project nicely. The documentation wasn't great (I'm used to that though, as I've been using the JIT library for previous visualisation work), but it was understandable enough to pick some of the demos apart and get the hang of how things worked within a few days.

The great thing about RaphaelJS is that it's simply a drawing library, so you can create a static SVG image of a banana or you can create an animated, multi-coloured, shape-shifting, real-time rotting banana, thanks to being able to manipulate and listen to events on the SVG DOM elements that form the vector image.

I opted to make use of three of the demos available – two versions of a line chart and a pie chart.

After getting familiar with the innards of the demos, the next stage was to groom and convert the JSON data returned by the Linked Data API into a processable array of numbers to feed into the functions provided by RaphaelJS. Once that was working, customisation could begin, using colours, spacing and familiar chart techniques to help convey the data.
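
As a rough illustration of that grooming step, here's a minimal Python sketch; the field names and figures are invented, and the nesting is an assumption based on the Linked Data API's usual "result" / "items" JSON shape rather than the exact response used in the project:

```python
import json

# Invented sample response mimicking the Linked Data API's JSON envelope
response = json.loads("""{
  "result": {"items": [
    {"month": "2011-01", "amount": 1250.0},
    {"month": "2011-02", "amount": 980.5},
    {"month": "2011-03", "amount": 1410.25}
  ]}
}""")

# Flatten the nested items into plain parallel arrays a charting
# library can consume directly
items = response["result"]["items"]
labels = [item["month"] for item in items]
values = [item["amount"] for item in items]
print(labels)  # x-axis categories
print(values)  # y-values for the line chart
```

The same flattening happens in JavaScript in the browser for the real dashboard; the shape of the transformation is what matters here.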

To emphasise the project backbone – open data – I threw in a sortable and paginated table inside each widget, so the user isn't simply presented with a summarised chart; they can dig down a level and manipulate and scan the actual figures being used to create the visualisations.

Using the Environment Agency’s bathing water quality data provided by DEFRA and UK Location, I’ve whipped up a very quick visualisation of my own – showing off indicators for faecal bacteria at the bathing sites.

The data source:

Notable parameters I used:

  • view = “basic” view
  • properties = lat, long, total coliform count, faecal coliform count, faecal streptococci count
  • page size = 500
  • sort = by faecal streptococci count (descending)
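
The parameters above can be sketched as a tailored API call; the base URL and property names are placeholders, and the underscore-prefixed parameter names follow the Linked Data API convention:

```python
from urllib.parse import urlencode

# Placeholder endpoint - not the real bathing water data URL
base = "http://example.org/doc/bathing-water"

# Parameter names follow the Linked Data API convention; the property
# names are illustrative guesses, not the dataset's actual vocabulary
params = {
    "_view": "basic",
    "_properties": "lat,long,totalColiformCount,faecalColiformCount,faecalStreptococciCount",
    "_pageSize": 500,
    "_sort": "-faecalStreptococciCount",  # leading "-" for descending order
}

# Appending ".csv" selects the CSV representation of the same resource
url = base + ".csv?" + urlencode(params)
print(url)
```

Swapping the ".csv" suffix for ".json" or ".html" is all it takes to change formats, which is what made the Fusion Tables route so quick.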

The steps to create the visualisation:

  • Realised I was quite interested in seeing where the bathing sites were that contained the most faecal bacteria (Blackpool area by the looks of it)
  • Tailored my own API call using instructions from the API documentation.
  • Tried using Yahoo Pipes and other API-connecting tools to see what I could do within about 5-10 minutes.
  • Decided to use Google's Fusion Tables as it can create maps from spreadsheet data.
  • The Environment API offers several formats, so I just changed my API call to include ".csv" instead of ".html".
  • Then, after spending the last half year developing the LinkedGov extension for Google Refine, I immediately thought of Refine as the first go-to tool to shape the data and make it fit for importing into some sort of mapping API.
  • I used Google Refine’s faceting & number range features to decide how to split the bacteria counts into low, medium and high.
  • I exported the data from Refine as CSV to my computer.
  • I uploaded the CSV into Fusion Tables – and all the hard work was done for me!
  • Time taken = 15 minutes (I have experience with the Linked Data API – otherwise it would have taken me a little while longer to tailor the API call I wanted)
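
The banding step above can be sketched as follows; the thresholds here are invented for illustration (the real cut-off points were chosen by eye using Refine's facets):

```python
# Split bacteria counts into low/medium/high bands, as done with
# Refine's numeric range facet. Threshold values are illustrative only.
def band(count, low=100, high=1000):
    if count < low:
        return "low"
    if count < high:
        return "medium"
    return "high"

counts = [40, 350, 5200]  # invented sample counts
print([band(c) for c in counts])
```

Each band then maps to a marker colour (green, yellow, red) in Fusion Tables.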

Please note: I have manually adjusted the banding points for the levels of bacteria so that the visualisation shows a visually pleasing number of red, yellow and green markers. While a site may have a red marker, it could actually be of quite high water quality.

Total Coliform Count

Low-temperature electron micrograph of a cluster of E. coli bacteria, magnified 10,000 times. Each individual bacterium is oblong shaped. (Photo credit: Wikipedia)

Faecal Coliform Count

Faecal Streptococci Count

Gram-stained smear of streptococci (Photo credit: Wikipedia)

SARES, an acronym for Semantic Analysis of complementary RESources, was my final-year university project based on image annotation and retrieval. The motivation is driven by the current lack of (quality) media description.

This project explores the benefits of using an image's complementary text to help describe it using semantic technologies and machine-readable formats. Natural language processing services are used to generate keywords describing the main entities or topics from the complementary text, which when in the right context, can be used as high-level and conceptual descriptions due to their strong correlation with the depictions within the image.

A number of web services and technologies are used to create a single system, capable of running for periods of at least two weeks unaided, automatically annotating existing images on the Web using Linked Data (DBpedia) concepts. Interfaces are used to demonstrate the feasibility of such a system and to demonstrate the benefits of using concepts during retrieval, by seamlessly integrating external galleries accessible through categorical and exploratory information paths.

A test dataset is collected and user-based evaluations are run, since the system requires evaluation through user opinion. The results highlight semantic issues between the image and its complementary text with regard to concept-based annotation, and reveal certain semantic scenarios that can be largely responsible for the success or failure of such a system. The benefits of using external vocabularies of concepts as annotations are evaluated using the outcomes of a scenario, with promising future work proposed.

My original final year project idea (which has been semi-implemented).

PEREL (PEople-RELations) was to be an application that automatically generated Linked Data about a novel – more specifically, the characters and their relations with each other. OpenCalais provides entity and term extraction; the novel input and RDF output handling were coded in Java, but the full application was never built as I decided to change my project to SARES.

Implementation reached the stage where it's possible to point the Java application at a .txt file (usually a novel in .txt format from Project Gutenberg) to be used as the input. The chapter mark-up is then specified by the user in order to let the application know how to split the novel into processable chunks: for example, ascending numbers, Roman numerals in lower or upper case, or prefixes such as "scene" or "chapter" followed by numbers or Roman numerals.
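
A minimal sketch of that chapter-splitting step, assuming a user-supplied heading pattern and an invented sample text:

```python
import re

# Split a novel into chapter chunks using a user-specified heading
# pattern - here, "CHAPTER" followed by an upper-case Roman numeral.
# The pattern and sample text are illustrative only.
def split_chapters(text, heading=r"(?m)^CHAPTER\s+[IVXLC]+\s*$"):
    chunks = [c.strip() for c in re.split(heading, text)]
    return [c for c in chunks if c]  # drop the empty chunk before the first heading

novel = """CHAPTER I
It was the best of times.
CHAPTER II
It was the worst of times."""
print(split_chapters(novel))
```

Each resulting chunk is small enough to send to an entity-extraction API on its own.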

With the novel broken into chunks, each chunk of text is sent to the OpenCalais API, which within a few seconds returns semantic information about that text – most usefully, the character entities, locations and their relations. The RDF files are then uploaded to a Sesame RDF repository named after the novel's .txt file. SPARQL queries can then be executed, picking up on interesting node-edge statistics and relating them to textual statistics such as word/letter frequencies and the size of the input.

Uses tweets to visualise what the world is dreaming about.

Visit Dreamworld online.

People tweet about their dreams, but these tweets get lost or forgotten on servers around the world. Dreamworld is an effort to briefly catch and transform them into interactive and exploratory information for its audience. It is a visualisation in two respects: not only is it a general visualisation of Twitter data, it also allows its audience to visualise and make sense of what people are dreaming about.

Twitter has seen an influx of benign tweets and random blurts, everyone letting everyone know what they had for breakfast. These tweets can be related to the lowest level of Maslow's Hierarchy of Needs, the "physiological" level; they never really amount to anything of interest, but people still feel the need to tweet about them. In an operating space of 140 characters or less, Dreamworld transforms tweeted dreams into topics, image galleries and dream interpretations, presented as an all-encompassing audio and visual interactive.

Categorising the type of interaction exemplified by this visualisation into the two types mentioned in the coursework brief is not so straightforward. While it addresses an interesting aspect of interaction with Twitter, namely what people feel the need to tweet (in this particular case the announcement of one's dream), it also responds to user input; but the "interesting" aspect of the user's input is the content generated by the author of the tweet, as opposed to users interacting with the Twitter data in an "interesting" way. That said, this visualisation does prompt users to interact with it, but the more important interactive aspect is what can be done with the information in a tweet and how it can enable the audience to explore and interact with it.

Twitter itself is almost completely transparent to the visualisation (apparently the @ signs in the tweets are a giveaway), as a result of a choice made about whether to make it clear to the user that the information originated from Twitter. A user's experience is subject to change depending on their familiarity with Twitter: a user who understands that each dream has been produced from a tweet could have a more "see-through" and second-hand experience, compared to a user unfamiliar with Twitter, who may feel like they are absorbing the information in a more personal, first-hand fashion – contributing to the visualisation's engagement aspect.

Presenting "Plinth" - London's new information kiosk?

A communicational/presentational exercise within my spatial analysis master's course - View presentation online.