OpenART – Some Wins and Fails

Win! Researcher on hand to explain the data and answer questions.

Win! Openness to being open with the data.

Win! Increased understanding of open data and ontologies for the domain.

Win! Ready-made expertise on the team.

Win! Google Refine as a quick way of experimenting with spreadsheet data and getting RDF out of spreadsheets.

Win! Indexing in SINDICE should be a quick win.

Fail! The ontology took longer to create than we anticipated.

Fail! The data is complex and still a work in progress; the spreadsheets are memory-hungry and in need of some cleanup and post-processing.

Fail! Lots of re-visiting and round-tripping slows things down.

Fail! There is a gap between the precision needed for an ontology and the working spreadsheets of a researcher.


OpenART – Costs and Benefits

One of the blog posts required for the OpenART project is around ‘costs and benefits’: “This should be a very rough estimation of how much it has cost you in terms of time and resources to make your data openly available. What do you expect the benefits to be? And how do these 2 assessments balance out against each other?”.

Ontology Development

It has taken around 10 days to develop the ontology to its current state, bearing in mind that it was created by someone with a high level of knowledge and expertise in ontologies, related technologies and tooling, someone who already knew of existing relevant ontologies and could rapidly prototype an approach. A quicker and simpler approach would have been to ‘mix and match’ by selecting properties from a range of existing ontologies, without going so far as to develop a dedicated ontology. A longer and more complex approach would have been to experiment with different ontology approaches to establish the best and richest solution, e.g. descriptions and situations. Given the time available we opted for the middle ground, but this has to some extent delayed other work and has made for a more complex process of ‘understanding’.
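As a rough illustration of that ‘mix and match’ route, the sketch below describes a single record using properties borrowed from Dublin Core Terms and FOAF rather than a dedicated ontology. It uses Python and rdflib purely for illustration; the identifiers, titles and choice of library are assumptions, not the project’s actual setup.

```python
# A minimal 'mix and match' sketch: reuse properties from existing
# vocabularies (Dublin Core Terms, FOAF) instead of coining new ones.
# All URIs and values below are made up for illustration.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import DCTERMS, FOAF, RDF

EX = Namespace("http://example.org/openart/")   # placeholder base URI

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("foaf", FOAF)

artwork = EX["artwork/landscape-with-ruins"]    # hypothetical artwork identifier
artist = EX["person/unknown-artist"]            # hypothetical artist identifier

g.add((artwork, DCTERMS.title, Literal("Landscape with Ruins")))
g.add((artwork, DCTERMS.creator, artist))
g.add((artist, RDF.type, FOAF.Person))
g.add((artist, FOAF.name, Literal("Unknown artist")))

print(g.serialize(format="turtle"))
```

The trade-off, as noted above, is that borrowed properties are quick to assemble but don’t describe the domain as precisely or richly as a dedicated ontology.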

Data Analysis and Manipulation

Analysis of the spreadsheets, importing and exporting the data, building web views of the data, and data cleanup I’d put at around 25 days.

Generating the Data

Generating RDF instance data from the spreadsheets, post-implementation, would take around 10 days to set up and produce a sub-set of the data. It would take longer if we wanted to process all of the data, which we aren’t doing right now.
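To give a sense of what that generation step involves, here is a hedged sketch of turning spreadsheet rows (exported as CSV) into RDF instance data with Python and rdflib. The file name, column names, namespaces and property names are all stand-ins for illustration, not the project’s actual ontology terms or data.

```python
# Sketch: convert rows of a hypothetical spreadsheet export into RDF triples.
import csv
from rdflib import Graph, Namespace, Literal, RDF

OA = Namespace("http://example.org/openart/ontology#")   # placeholder ontology namespace
ID = Namespace("http://example.org/openart/id/")         # placeholder identifier base

g = Graph()
g.bind("oa", OA)

# "sales.csv" and its columns (sale_id, date, price, title) are hypothetical.
with open("sales.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        sale = ID["sale/" + row["sale_id"]]
        g.add((sale, RDF.type, OA.Sale))
        g.add((sale, OA.saleDate, Literal(row["date"])))
        g.add((sale, OA.price, Literal(row["price"])))
        g.add((sale, OA.lotTitle, Literal(row["title"])))

g.serialize(destination="sales.ttl", format="turtle")
```

Serialising to Turtle keeps the output human-readable, which helps when checking the generated data against the ontology.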

Technical Implementation

Prototyping and implementing a simple approach to resolving identifiers and content negotiation will come in around 10 days.
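As an illustration of that piece of work, here is a minimal sketch of resolving an identifier with a 303 redirect and simple content negotiation. It assumes a small Python/Flask front end and a directory of pre-generated Turtle files; the routes, paths and stack are hypothetical, not a description of the implementation being built.

```python
# Sketch: resolve /id/... identifiers with a 303 redirect, choosing an RDF or
# HTML view based on the Accept header. Routes and file layout are hypothetical.
from flask import Flask, request, redirect, send_from_directory

app = Flask(__name__)

@app.route("/id/<path:thing>")
def resolve(thing):
    # A fuller implementation would weigh q-values and more media types;
    # this simply looks for a Turtle request.
    accept = request.headers.get("Accept", "")
    if "text/turtle" in accept:
        return redirect(f"/data/{thing}.ttl", code=303)   # machine-readable view
    return redirect(f"/page/{thing}", code=303)           # human-readable view

@app.route("/data/<path:doc>")
def data(doc):
    # Serve pre-generated Turtle documents from a local "data" directory.
    return send_from_directory("data", doc, mimetype="text/turtle")

if __name__ == "__main__":
    app.run()
```

The 303 redirect is the usual way of distinguishing a thing (an artwork, a person) from documents about it, which is why a simple approach like this can be prototyped quickly.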

Building Understanding

Developing an understanding of the ontology, of linked data principles and of how to construct RDF, I’d estimate at around 25 days, including background reading and research, digging into the ontology, working through examples and selecting the right tools. This also includes time spent on understanding the data itself, working through the spreadsheets and talking with the content creator.

Resources

In terms of people involved, there are a number, including: consultant partners (expert in the semantic aspects and technologies), developer (for implementation and data processing), researcher (for understanding the data), Tate web manager and curator (for exploring the Tate use case and validation), Digital Library core staff (for future sustainability).

Summary

This gives us a grand total of around 80 days, or 16 weeks, around 4 months of work involving a range of people. This is starting out with complex data, gaps in understanding of the data and in how to ‘do’ open data, along with a myriad of potential mechanisms and methods.

What this doesn’t include, though, is the ongoing cost of maintaining and building on the project outcomes, rolling out the generation of data to the full dataset and working with additions and changes to the dataset, nor of extending the work to do more ‘linking’, assessing the scalability of this approach, experimenting with different approaches and promoting the content.

Benefits

Given that it is still early days for ‘linked data’ and for our content, benefits are harder to quantify. The immediate benefits are engagement and seeing our researcher and Tate partners really begin to think about what open and linked data might offer them. Another immediate benefit is a new class of content for the Digital Library and the potential for extending our reach into storing and serving ‘data’ from York Digital Library, enabling us to make progress in opening our data.

Looking forward, what OpenART and open data promise are greater visibility and usability of information, making research richer and easier and reducing the amount of information that needs to be stored locally and constructed manually. For cultural institutions like Tate, open and linked data offer opportunities for increasing web traffic and usage, by offering new ways of exposing and consuming data, greater possibilities for dynamic links between artworks, artists, places and events, and richer visualisations of data. For researchers, and other consumers, open data offers mechanisms to follow different paths through information, tracing art history from creation, through the art market and into galleries, telling interlinked and often unexpected stories along the way. Whilst OpenART deals with only a fragment of the art market, it offers a possible model that, if extended, could open up art history to much more detailed analysis.

How do these balance out?

Personally, I think that the cost is worth it, but only if we start to see real use of the data, and more data being opened up. Given that the latter can’t happen without investment in the former, we need to make a persuasive case to continue the work, engage in the community and build tools to aggregate disparate content for the non-technical user. One lesson learnt, though, is that not all data is made equal, and ours is still ‘under construction’, which adds a layer of complexity when working with a moving target.
