Dan Paul Smith

Interface and visualisation developer.

I'm Dan Smith, a web developer specialising in interfaces and visualisations for (linked) data projects. Welcome to my personal website, where you can see what I'm up to, where I've worked and how to contact me.

Currently freelancing as a developer and designer for data interfaces and visualisations - if you want to get in touch, feel free to use the contact form below or reach me on Twitter.

I'm also currently a research student at UCL's Centre for Advanced Spatial Analysis, taking a 2-year part-time MRes course.

Since graduating in 2010, I've been working with a number of teams associated with the UK government's transparency movement - designing and developing web-based visualisations and interfaces for data projects, designing APIs, modelling data and doing a considerable amount of data wrangling.

Prior to this, my final year project at university was based on an image annotation and retrieval application powered by linked data. It explored new types of interfaces that deal with semantically enriched data and how information retrieval systems might benefit from the use of semantic data. The project was very well received and led me into my current line of work.

The areas of computing I'm interested in span human-computer interaction, graph networks, audio and vision. Connecting the virtual to the physical is fascinating for me.

LinkedGov

The LinkedGov extension for Google Refine is a plug and play module that semi-automatically cleans up messy, unformatted government data.

The technical documentation can be found here: http://wiki.linkedgov.org/index.php/Google_Refine_extension

As the data is cleaned and formatted, the user is asked to describe their data and answer questions about it, which in turn creates mappings that transform the imported tabular data into graph data - specifically RDF (a linked data format).

The extension offers a number of wizards that carry out cleaning, linking or enriching tasks on the data. For example, a cleaning wizard may ask the user if there's a column containing dates - and which format(s) the dates are in (e.g. 02/05/2012, 02-May-2012...). The wizard is then able to process and reformat the dates into a standardised, globally recognised format (ISO 8601).
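As a sketch of the kind of transformation the date wizard performs - this is an illustration, not the extension's actual code - the two example formats above can be normalised into ISO 8601 like so:

```javascript
// Illustrative sketch only - not the extension's actual code.
// Normalise a couple of known date formats into ISO 8601 (YYYY-MM-DD).
function toIso8601(value) {
  const months = { jan: 1, feb: 2, mar: 3, apr: 4, may: 5, jun: 6,
                   jul: 7, aug: 8, sep: 9, oct: 10, nov: 11, dec: 12 };
  const pad = (n) => String(n).padStart(2, "0");

  // e.g. "02/05/2012" (day-first, as in UK data)
  let m = value.match(/^(\d{1,2})\/(\d{1,2})\/(\d{4})$/);
  if (m) return `${m[3]}-${pad(m[2])}-${pad(m[1])}`;

  // e.g. "02-May-2012"
  m = value.match(/^(\d{1,2})-([A-Za-z]{3})-(\d{4})$/);
  if (m) return `${m[3]}-${pad(months[m[2].toLowerCase()])}-${pad(m[1])}`;

  // Unrecognised format - a real wizard would ask the user at this point.
  return null;
}

console.log(toIso8601("02/05/2012"));  // "2012-05-02"
console.log(toIso8601("02-May-2012")); // "2012-05-02"
```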

The linking wizard performs automatic reconciliation against linked data endpoints on the web - linking values to their online definitions. For example, a column containing department name abbreviations (e.g. MOD, BIS, HMRC...) could be automatically reconciled to their identifiers on data.gov.uk (e.g. http://reference.data.gov.uk/doc/department/bis). By linking the values to their online definitions, whole new paths of information can be accessed for each department.
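The idea can be illustrated with a toy lookup. The BIS URI is the one given above; assuming the MOD and HMRC URIs follow the same pattern is my own guess, and a real reconciliation service queries an endpoint and scores candidate matches rather than using a hard-coded table:

```javascript
// Toy reconciliation sketch. The BIS URI appears in the text above; the
// MOD and HMRC URIs are assumed to follow the same pattern. A real
// reconciliation service queries an endpoint and scores candidate matches.
const departmentIds = {
  bis:  "http://reference.data.gov.uk/doc/department/bis",
  mod:  "http://reference.data.gov.uk/doc/department/mod",  // assumed
  hmrc: "http://reference.data.gov.uk/doc/department/hmrc", // assumed
};

function reconcile(cellValue) {
  const key = cellValue.trim().toLowerCase();
  // null signals "no confident match" - the wizard would then ask the user.
  return departmentIds[key] || null;
}

console.log(reconcile("BIS")); // "http://reference.data.gov.uk/doc/department/bis"
```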

data.gov.uk, TNA, TSO, Cabinet Office

Click here to view the Cabinet Office's organogram.

Head over to http://data.gov.uk/organogram to view all organograms.

This is an organisational chart (organogram) visualisation of the structure of 'posts' within the UK government. Government departments are composed of units, which contain posts, and these posts can be held by one or more people.

This visualisation shows the paths of responsibility - who reports to whom - for the post in question by including its 'parent' posts and its 'child' posts. Clicking on a post in the visualisation loads its child posts, if there are any.

Each post has an information panel that includes the name of the person(s) holding the post, their contact details, the name of the departmental unit the post belongs to and a description of the post's role, along with links that take you to the underlying information itself - provided by the Linked Data API.
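As a rough illustration of the hierarchy the visualisation works with - the field names here are my own, not those of the data.gov.uk API - a post and its lazy child-loading behaviour might be modelled like this:

```javascript
// Rough illustration of the reporting hierarchy - the field names here are
// illustrative, not those of the data.gov.uk API.
const post = {
  id: "example-post",
  label: "Example Post",
  unit: "Example Unit",  // the departmental unit the post belongs to
  holders: [],           // person(s) holding the post
  parent: null,          // null = top of the reporting chain
  children: null,        // null = child posts not yet loaded
};

// Clicking a post would trigger something like this: load the child posts
// once, then reuse the cached result on subsequent clicks.
function loadChildren(p, fetchFromApi) {
  if (p.children === null) p.children = fetchFromApi(p.id);
  return p.children;
}
```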

The source code for this visualisation is available from http://code.google.com/p/linked-data-api/

To view the sources of data the visualisation uses at any time, there's information about all of the API calls made in the bottom right under "Data sources". Here you can grab the data in several different formats and see which parameters have been used to tailor the data for the visualisation.

data.gov.uk, TNA, Cabinet Office

Online demo available.

This is a treemap visualisation of the UK government's department structure using data provided by data.gov.uk. The visualisation is created in the browser in real time, so as more of the "reference" linked data is made available, it will change and grow automatically.

The visualisation lets you drill down from the top-level of the government's structure (departments), down into their units and then through to the lowest-level - the unit posts, which are held by people.

The departments and units are both sized by the number of posts they contain and the posts are sized by the number of posts that report to them (i.e. by a measure of responsibility).

To view the sources of data the visualisation uses at any time, there's information about all of the API calls made in the bottom right under "Data sources". Here you can grab the data in several different formats and see which parameters have been used to tailor the data for the visualisation.

Talis

In February I was asked to have a pop at visualising some spending data for some local UK councils. The data was in linked data format (RDF/Turtle) – stored in an RDF store with the Linked Data API (the Puelia implementation) layered on top.

The brief was to build an open-source, interactive, cross-browser dashboard of widgets that would allow the comparison of councils' spending data, say, per month.

After a little time spent researching, I came to the conclusion that RaphaelJS (an open-source JavaScript vector library) would fit the bill for this project nicely. The documentation wasn't great (I'm used to that though, as I've been using the JIT library for previous visualisation work), but it was understandable enough to pick some of the demos apart and get the hang of how things worked within a few days.

The great thing about RaphaelJS is that it's simply a drawing library, so you can create a static SVG image of a banana or you can create an animated, multi-coloured, shape-shifting, real-time rotting banana, thanks to being able to manipulate and listen to events on the SVG DOM elements that form the vector image.

I opted to make use of three of the demos available – two versions of a line chart and a pie chart.

After getting familiar with the innards of the demos, the next stage was to groom and convert the JSON data returned by the Linked Data API into a processable array of numbers to feed into the functions provided by RaphaelJS. Once that was working, customisation could begin – letting colours, spacing and familiar chart techniques help convey the data.
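As an illustration of that grooming step - the shape below imitates the Linked Data API's JSON result list, but the property names are assumptions for illustration - flattening the results into an array of numbers might look like:

```javascript
// Sketch of grooming the API's JSON into plain numbers. The shape below
// imitates the Linked Data API's result list; the property names are
// assumptions for illustration.
const apiResponse = {
  result: {
    items: [
      { payee: "Supplier A", amount: { _value: "1250.50" } },
      { payee: "Supplier B", amount: { _value: "980.00" } },
      { payee: "Supplier C", amount: { _value: "3100.25" } },
    ],
  },
};

// Flatten the items into an array of numbers a charting function can use.
function amounts(response) {
  return response.result.items.map((item) => parseFloat(item.amount._value));
}

console.log(amounts(apiResponse)); // [ 1250.5, 980, 3100.25 ]
```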

To emphasise the project backbone – open data – I threw in a sortable and paginated table inside each widget, so the user isn't simply presented with a summarised chart: they can dig down a level and manipulate and scan the actual figures being used to create the visualisations.

Using the Environment Agency’s bathing water quality data provided by DEFRA and UK Location, I’ve whipped up a very quick visualisation of my own – showing off indicators for faecal bacteria at the bathing sites.

The data source:
http://environment.data.gov.uk/doc/bathing-water.html?_view=basic&_properties=samplingPoint.lat,samplingPoint.long,latestSampleAssessment.faecalColiformCount,latestSampleAssessment.faecalStreptococciCount,latestSampleAssessment.totalColiformCount&_page=0&_sort=-latestSampleAssessment.faecalStreptococciCount

Notable parameters I used:

  • view = “basic” view
  • properties = lat, long, total coliform count, faecal coliform count, faecal streptococci count
  • page size = 500
  • sort = by faecal streptococci count (descending)
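Assembling a call from the parameters above can be sketched in code like this (the `_view`, `_properties`, `_pageSize` and `_sort` parameter names are the Linked Data API's conventions; treat the snippet as an illustration rather than the exact call I made):

```javascript
// Sketch of assembling the API call from the parameters listed above.
const base = "http://environment.data.gov.uk/doc/bathing-water";

const params = new URLSearchParams({
  _view: "basic",
  _properties: [
    "samplingPoint.lat",
    "samplingPoint.long",
    "latestSampleAssessment.totalColiformCount",
    "latestSampleAssessment.faecalColiformCount",
    "latestSampleAssessment.faecalStreptococciCount",
  ].join(","),
  _pageSize: "500",
  // A leading "-" sorts in descending order.
  _sort: "-latestSampleAssessment.faecalStreptococciCount",
});

// Swapping ".html" for ".csv" asks the API for CSV output instead.
const url = `${base}.csv?${params}`;
console.log(url);
```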

The steps to create the visualisation:

  • Realised I was quite interested in seeing where the bathing sites were that contained the most faecal bacteria (Blackpool area by the looks of it)
  • Tailored my own API call using instructions from the API documentation.
  • Tried using Yahoo Pipes and other connecting API tools to see what I could do within about 5-10 minutes.
  • Decided to use Google's Fusion Tables as it can create maps from spreadsheet data.
  • The Environment API offers several formats, so I just changed my API call to include ".csv" instead of ".html".
  • Then, having spent the last half year developing the LinkedGov extension for Google Refine, I immediately thought of Refine as the first go-to tool to shape the data and make it fit for importing into some sort of mapping API.
  • I used Google Refine’s faceting & number range features to decide how to split the bacteria counts into low, medium and high.
  • I exported the data from Refine as CSV to my computer.
  • I uploaded the CSV into Fusion Tables – and all the hard work was done for me!
  • Time taken = 15 minutes (I have experience with the Linked Data API – otherwise it would have taken me a little while longer to tailor the API call I wanted)
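The banding step can be sketched as a simple threshold function - the thresholds here are invented for illustration; the actual banding points were chosen by eye, as the note below explains:

```javascript
// Illustrative banding function - the thresholds passed in are invented;
// the actual banding points were chosen by eye in Google Refine.
function band(count, lowMax, mediumMax) {
  if (count <= lowMax) return "low";
  if (count <= mediumMax) return "medium";
  return "high";
}

console.log(band(150, 500, 2000));  // "low"
console.log(band(1200, 500, 2000)); // "medium"
console.log(band(9000, 500, 2000)); // "high"
```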

Please note: I have manually adjusted the banding points for the levels of bacteria so the visualisation shows a visually pleasing number of red, yellow and green markers. While a site may have a red marker, it could actually be of quite high water quality.

Total Coliform Count

Low-temperature electron micrograph of a cluster of E. coli bacteria, magnified 10,000 times. Each individual bacterium is oblong shaped. (Photo credit: Wikipedia)

Faecal Coliform Count

Faecal Streptococci Count

Gram-stained smear of streptococci (Photo credit: Wikipedia)

SARES

An acronym for Semantic Analysis of complementary RESources – this was my final year university project, based on image annotation and retrieval. The motivation is driven by the current lack of (quality) media description.

This project explores the benefits of using an image's complementary text to help describe it using semantic technologies and machine-readable formats. Natural language processing services are used to generate keywords describing the main entities or topics from the complementary text, which, when in the right context, can be used as high-level and conceptual descriptions due to their strong correlation with the depictions within the image.

A number of web services and technologies are used to create a single system, capable of running for periods of at least two weeks unaided, automatically annotating existing images on the Web using Linked Data (DBpedia) concepts. Interfaces are used to demonstrate the feasibility of such a system and also the benefits of using concepts during retrieval, by seamlessly integrating external galleries accessible through categorical and exploratory information paths.

A test dataset is collected and user-based evaluations are run for a system that requires evaluation through user opinion. The results highlight semantic issues between the image and its complementary text with regard to concept-based annotation, and also reveal certain semantic scenarios that can be largely responsible for the success or failure of such a system. The benefits of using external vocabularies of concepts as annotations are evaluated using the outcomes of a scenario, with promising future work proposed.

PEREL

My original final year project idea (which has been semi-implemented).

PEREL (PEople-RELations) was to be an application that automatically generated Linked Data about a novel – more specifically, about the characters and their relations with each other. OpenCalais provides the entity and term extraction; the novel input and RDF output handling were coded in Java, but the full application was never built, as I decided to change my project to SARES.

The implementation reached the stage where it's possible to point the Java application at a .txt file (usually a novel in .txt format from Project Gutenberg) to be used as the input. The chapter mark-up is then specified by the user in order to let the application know how to split the novel into processable chunks – for example, ascending numbers, Roman numerals in lower or upper case, or prefixes such as "scene" or "chapter" followed by numbers or Roman numerals.
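A minimal sketch of that chapter-splitting step - illustrative only, since PEREL itself was written in Java - showing the idea with a regular expression that handles "Chapter" followed by a number or a Roman numeral:

```javascript
// Illustrative only - PEREL itself was written in Java. This shows the same
// chapter-splitting idea: break the text on "Chapter" headings followed by
// a number or a Roman numeral.
function splitChapters(text) {
  return text
    .split(/^\s*Chapter\s+(?:\d+|[IVXLCivxlc]+)\s*$/m)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

const novel = "Chapter 1\nIt was a dark night.\nChapter II\nMorning came.";
console.log(splitChapters(novel)); // [ "It was a dark night.", "Morning came." ]
```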

With the novel broken into chunks, each chunk of text is sent off to the OpenCalais API, which within a few seconds returns semantic information about that text – most usefully, the character entities, locations and their relations. The RDF files are then uploaded to a Sesame RDF repository with the same name as the novel's .txt file. SPARQL queries can then be executed, picking up on interesting node-edge statistics that relate to textual statistics such as word/letter frequencies and size of input.
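The kind of query involved might look like the hypothetical sketch below - counting the relations attached to each character. The predicate URI is a made-up placeholder, not the vocabulary OpenCalais actually produces:

```javascript
// Hypothetical example of the kind of SPARQL query involved: count the
// relations attached to each character. The predicate URI is a made-up
// placeholder, not the vocabulary OpenCalais actually produces.
const query = `
  SELECT ?character (COUNT(?other) AS ?relations)
  WHERE {
    ?character <http://example.org/relatedTo> ?other .
  }
  GROUP BY ?character
  ORDER BY DESC(?relations)
`;

console.log(query);
```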

Dreamworld

Uses tweets to visualise what the world is dreaming about.

Visit Dreamworld online.

People tweet about their dreams, but the tweets get lost or forgotten on servers around the world. Dreamworld is an effort to briefly catch them and transform them into interactive and exploratory information for its audience. It is a visualisation in two respects: not only is it a general visualisation of Twitter data, it also allows its audience to visualise and make sense of what people are dreaming about.

Twitter has seen an influx of benign tweets and random blurts – everyone letting everyone know what they had for breakfast. These tweets can be related to the lowest level of Maslow's Hierarchy of Needs – the "physiological" level – and never really amount to anything of interest, but people still feel the need to tweet them. In an operating space of 140 characters or fewer, Dreamworld transforms tweeted dreams into topics, image galleries and dream interpretations, presented as an all-encompassing audio and visual interactive.

Categorising the type of interaction exemplified by this visualisation into the two types mentioned in the coursework brief is not so straightforward. While it addresses an interesting aspect of interaction with Twitter – what people feel the need to tweet, in this case the announcement of one's dreams – it also responds to user input. However, the "interesting" aspect of the user's input is the content generated by the author of the tweet, as opposed to users interacting with the Twitter data in an "interesting" way. That said, this visualisation does prompt users to interact with it; the more important interactive aspect is what can be done with the information in a tweet and how it can enable the audience to explore and interact with it.

Twitter itself is almost completely transparent to the visualisation (apparently the @ signs in the tweets are a giveaway), as a result of a choice about whether to make it clear to the user that the information originated from Twitter. A user's experience is subject to change depending on their familiarity with Twitter: a user who understands that each dream has been produced from a tweet could have a more "see-through" and second-hand experience, compared to a user who is unfamiliar with Twitter and may feel like they are absorbing the information in a more personal, first-hand fashion – contributing to the visualisation's engagement.

In the last few years I've worked on open data projects for the UK government with a number of teams - including the Cabinet Office, data.gov.uk, The National Archives, The Stationery Office, Talis and Epimorphics.

I've also spoken at a number of conferences and events about the work done in these years - specifically on user interaction and data consumption for linked open data.

Specialties

Interfaces, visualisations, web applications, linked data, semantic technologies.

JavaScript & JS graphic libraries, JSON, AJAX, linked data formats, Java, Google Refine & extensions.

Front-end lead

LinkedGov

August 2011 – April 2012 (9 months)

Oversaw the front-end of the LinkedGov project - a community project funded by the government - aiming to clean up, enrich and link together government data. I was responsible for developing the LinkedGov extension for Google Refine.

Data interface & visualisation developer

Freelance

June 2010 – Present

Projects I've worked on have included:
- developing interfaces to help clean messy data
- representing government organisational data to provide an interactive, cross-government organisation chart
- web-based dynamic infographics that help convey obscure data
- interactive widgets and charts that allow comparison of spending data

All of these projects have been powered by open, linked data and have more or less all come about as a result of the UK Government's transparency movement.

Teaching Assistant

Queen Mary, University of London

September 2009 – January 2010 (5 months)

Provided assistance to help teach and examine students in their second year taking the Algorithms and Data Structures module.

Implementation Team

Squiz

March 2009 – October 2009 (8 months)

Created areas and features of sites for The Actuarial Profession, Digital UK, Drinkaware, Mining Journal, Mark Warner Holidays, Magdalen College (Oxford), Paint Research Association and University of Westminster.

Support Team

Squiz

August 2008 – February 2009 (7 months)

Supported clients on a range of issues concerning content publishing, web page implementation and MySource Matrix queries. Clients included Boots, Brighton University, The Electoral Commission, EMAP, Informa, Office Products International, Redwood/NSPCC, Oxford University, Royal Society of Arts, Shelter, Westminster Abbey and the World Health Organisation.

Appearances & Publications

Session:
UK Gov Camp 2012 – Microsoft (Victoria), January 20 & 21 2012

Held sessions on both days – one to present the work I've been doing for LinkedGov on the LinkedGov extension for Google Refine, the other as a feedback session for the extension – with live user testing.

Speaker:
SemTech 2011 – Hotel Russell, September 27 2011

Speaker for the 'Government Organograms as Linked Data' track with Jeni Tennison and John Sheridan.

Speaker:
OpenTech 2010 – ULU, September 11 2010

Speaker for the 'data.gov.uk – process and properties' seminar with Jeni Tennison and John Sheridan.

Speaker:
Open Government Data Camp – ULU, November 19 2010

Speaker for the 'Convenient APIs for Linked Data' seminar with Dave Reynolds.

Publication:
Submission for The International Semantic Web Conference 2010, Shanghai, China

'org: Publishing government organizational information as linked data' – Dave Reynolds, Jeni Tennison, Dan Smith, John Sheridan

Education

University College London

MRes, Advanced Spatial Analysis & Visualisation

2012 – Present

Currently studying at the Centre for Advanced Spatial Analysis at UCL.

Queen Mary, University of London

BSc, Multimedia Computing

2006 – 2010

Received a First Class Honours and the Principal's Prize for outstanding achievement.