Bring out your dead!

Yesterday I attended the Digital Preservation Coalition’s ‘Bring out your dead!’ day (subtitled more soberly: ‘Collaborative approaches to managing file formats – a day of action’. Monty Python reference appreciated, though). The point of the day was to discuss the problems that file formats present to digital preservation (and also to bring along our own problem files). These obviously include dealing with very unusual files and files of unknown type. There are tools, such as DROID, which can help with identifying unknown files. The surprise for me was how much of the discussion focussed on older versions of very well-known file types, particularly Microsoft Office files. Chris Rusbridge, in his opening presentation, talked about his experience of trying to convert his old PowerPoint files, created years ago on an old Mac, to a current format (he found a company that could do it) and also his open letter to Microsoft about publishing the specifications for old versions of their file formats. The latter had surprising results (Microsoft willing to help, but they don’t have the specs).


Much of the discussion also centred on the importance of collaboration in solving the file format problem. A key part of this is contributing information to file format registry projects such as PRONOM and CRISP, which suffer from a lack of detail in their records (though TNA’s David Clipsham explained that the focus for PRONOM had been on populating it with file signatures rather than other details). David also presented a session on how to produce file signatures which, to a non-techy like me, was an eye-opener. It was especially interesting to see how easy it is. Essentially, a file signature is a string of binary data which always appears within a file of a particular sort, and which can therefore be used by a tool such as DROID to diagnose the file type. To spot these strings, all one has to do is open a number of examples of the file type in a hexadecimal editor and flick through the open tabs quickly (a bit like a flicker book) to see which bits of the file stay the same (usually it’s the beginning). Submitting file signatures to PRONOM helps to build up the registry and increase its usefulness.
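
Out of curiosity, the same flicker-book trick can be approximated in a few lines of code. The sketch below is only a rough illustration (the sample file names are hypothetical, and real PRONOM signatures are specified more carefully, with offsets and wildcards): it reads the first few bytes of some sample files and reports the longest prefix they all share, which is where a candidate signature usually lives.

```python
# Rough sketch: find the longest shared leading byte sequence across sample files.
# This mimics flicking through files in a hex editor; real PRONOM signatures are
# specified more carefully (offsets, wildcards), so treat this as a starting point.
from pathlib import Path

def common_prefix(paths, window=16):
    """Return the longest leading byte sequence shared by all the sample files."""
    heads = [Path(p).read_bytes()[:window] for p in paths]
    prefix = bytearray()
    for position, byte in enumerate(heads[0]):
        if all(len(h) > position and h[position] == byte for h in heads[1:]):
            prefix.append(byte)
        else:
            break
    return bytes(prefix)

if __name__ == "__main__":
    # Hypothetical sample files, all of the same (unidentified) format.
    samples = ["sample1.dat", "sample2.dat", "sample3.dat"]
    print("Candidate signature:", common_prefix(samples).hex(" ").upper())
```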


Other ways that we need to collaborate as a sector include better coordination between projects and coordinated lobbying of software companies (and government), of the sort pioneered by Chris Rusbridge. There are several projects which have attempted to create file format registries and the consensus seemed to be that it isn’t necessarily a bad thing to have more than one registry, but that data needs to be shared systematically (preferably automatically) between them. The same body of data shared in several places isn’t bad, but data split between several places is. In the lively final discussion a big topic was the need for someone to take a strong leadership or coordination role in carrying forward the lobbying of software companies to release file specifications. Chris Rusbridge rather theatrically put William Kilbride of the DPC on the spot and William deftly turned this round as a question for the DPC’s members as to whether the DPC was the organisation to take a lead in this.


If I have a criticism of the day, it is that it was billed (unless I got the wrong end of the stick, which commonly happens) as a workshop for dealing with delegates’ problem files (the ‘dead’ of ‘Bring out your dead’). The discussions and workshops were interesting, presenting a number of tools for characterising files and creating signatures. However, it was less hands-on than I expected. I came down with a slightly different problem to that of unknown or obsolete file formats: we have some TIFF files which cause problems when converted to JP2 and streamed, and we don’t know why. I didn’t get an answer from the workshop itself, but I was at least able to discuss it with other people during the break and find a possible source of help. Which was useful!


Digital Library Improvements for the autumn term

Over the summer we have been working with a student on a usability study of York Digital Library. The resulting dissertation has given us valuable data for the ongoing improvement of the Digital Library. As a first step, we assembled a list of priority issues for the start of this academic year and have pushed forward with a ‘usability sprint’. The following changes are mostly small, but together we feel they will help students and other users navigate the Digital Library and get more out of our collections.

All of these changes will be visible early on Sunday 7th October.

Release Notes – Digital Library – Version YODL 2 2012-09

Usability and Display Issues

  • Artist names do not display in the metadata detail page if there is no role specified (e.g. sculptor)
  • Attribution field does not display in image metadata detail
  • Reorganise label information for location image detail
  • Add a hyphen between earliestDate and latestDate in image metadata detail
  • Problems with display of rights info for images
  • Display multiple titles in image detail page
  • Make order of fields in ‘home’ detail view consistent
  • Truncate long titles in search results display
  • Make description display for collections in collections detail page
  • Make sort order of browse tree and results alphabetical by default
  • Fix indexing issue with DC where some fields were not searchable; allow all fields to be searched
  • Change label for location in image detail page, for clarity
  • Correct bug where creator appears twice in home detail page
  • Correct bug where type displays twice on collection detail view
  • Add which collection(s) a resource belongs to for the resource detail page
  • Offer sort options for browse/search
  • Display date with search results on Exam Papers
  • Breadcrumb trail for search results
  • Change sort order in browse to alphabetical
  • Exclude OpenART results from search, as users reported these as confusing
  • Remove thumbnail from the download list on all objects, as users reported this as confusing
  • Make “about the image” open by default in image metadata detail
  • Refresh featured items on homepage
  • Improve introductory information and help on homepage
  • Add a general help page
  • Add a help pop-up by the download button
  • Fix bug in sort by in search
  • Add a link to the help page from ‘need help’ section on homepage
  • Re-order collection metadata display: add subject before rights in collection
  • Enhance collection metadata

Access and security improvements

  • YODL login / session management improvements, to prevent users from being ‘thrown out’ of their logged in session
  • Make Masters theses metadata public; restrict files to York users
  • Make top level and department exam papers collections metadata public to aid navigation


York Digital Library Survey & Focus Groups

Have you ever searched online for University of York resources? If you have, it’s likely that you’ve used the University of York Digital Library, or YODL for short. YODL is where past exam papers are stored, as well as a huge number of History of Art images and other resources.

The library is currently looking for feedback from students and staff on YODL, so if you’ve ever used it, we’d love to hear your opinions via our short questionnaire; the link is on the front page of the site.  If you haven’t used YODL before, perhaps you would be interested in taking a look around the site at http://dlib.york.ac.uk/ before completing the questionnaire. All responses will be anonymous, but if you do want to leave an email address, you’ll be entered into a prize draw to win a £25 Amazon voucher.

We are also looking for participants for a focus group to give us more detailed feedback on the site; participants will be paid £10 for approximately 30–60 minutes of their time. Again, no previous experience of the site is needed; just have a look around to form some opinions before taking part. Even if you think you don’t have much to say, we want to hear it!

There are two focus groups, one for staff and one for students:

The staff focus group will take place this Thursday, 28th June at 4pm in LFA/205 (top floor of the Harry Fairhurst Building).

The student focus group will take place on Wednesday 4th July at 11.30am in LFA/205 (top floor of the Harry Fairhurst Building).

Please email mgedwards1@sheffield.ac.uk if you’d be interested in taking part.


Digital Futures Academy

Last week I attended the Digital Futures Academy, a course on digitisation run by King’s Digital Consultancy Service at the British Library in London. I had won a scholarship to attend this from the Digital Preservation Coalition, to whom many thanks. I thought I‘d jot down a few key thoughts that I took away, before they get swept away in the stream of everyday this-and-that, like when you go on holiday and the memory of sitting on the beach stays with you for about a day and a half and then your brain files the memories away in a box somewhere labelled ‘things that happened ages ago – possibly for deletion’.

A key thought that kept recurring in the week was the importance of thinking in terms of people, rather than data. Data is the trace of human activity and aspiration. Data fulfils human needs and provides opportunities. This came through particularly when we were discussing the planning and pitching of digitisation projects. The questions we need to ask are: who will benefit? what need does the project fulfil? what will be the outcomes, in terms of people using the data? what is your narrative?

William Kilbride of the Digital Preservation Coalition, one of the speakers, presented an interesting idea, that of the ‘attention economy’ rather than the much-vaunted ‘knowledge economy’. Knowledge and data are not what is scarce, but time. The most valuable thing in this economy is the time people take to engage with your content.

Alastair Dunning, of the European Library, another one of the speakers, presented a nifty analogy of the boutique versus the shopping mall. He presented a photograph of a quaint shop, stuffed with wares and full of character, and said that very often the services we create are like this: beautiful presentation of wares, but hidden away and nobody goes there. The next image was of a shopping mall: soulless and impersonal, but everybody goes there. This stands for the aggregators and mass audience sites (such as Flickr and Facebook). The boutique is fine, but it is important to get our content in the shopping mall, where it will be seen. The final image was a cautionary one: a boarded-up and derelict shop!

Overall, I found the course stimulating and thought-provoking. It raised as many questions as it answered – as it is intended to (there is often no universally correct answer with digitisation projects, but it helps to start by asking the right questions).

Matthew Herring


Radically Open Cultural Heritage Data on the Web

I was extremely fortunate to be part of a panel session at SXSW Interactive 2012, held this month as always in the amazing City of Austin, Texas. The panel, Radically Open Cultural Heritage Data on the Web, was put together by Jon Voss, Strategic Partnerships Director at Historypin, leader of Linked Open Data in Libraries, Archives and Museums (LODLAM) activity and a fantastic voice in the LODLAM space. Joining Jon on the panel were Adrian Stevenson, Senior Technical Innovations Coordinator at Mimas and Project Manager of the excellent LOCAH project, and Rachel Frick, Director of the Digital Library Federation and heavily involved in the Digital Public Library of America (DPLA).

Details (and audio) of the panel session can be found on the SXSW site, and my slides are available on slideshare. Adrian has also made his slides available.

Between the four of us, we gave an overview of what linked data is, why it matters to libraries, archives and museums, how people have put data out there already, how others are beginning to consume that data, and how people might get involved. My presentation focussed on York’s OpenART project where we have put almost 40,000 linked data documents on the web. This represents a huge success for our project, which was run on a shoestring budget and timescale. Our approach to exposing data is certainly not perfect, but it’s a significant step for us towards opening our data up for others to work with.

The linked and open data area is certainly growing and beginning to be recognised within the semantic web community. I attended Libraries, Media & The Semantic Web, hosted by the BBC on 28th March 2012, where Jon Voss and Adrian spoke on the same platform as speakers from the New York Times, the BBC and Google. It was particularly encouraging to hear that the BBC has invested a huge amount (20% of its digital budget) in linked data for the Olympics coverage, and also from Dan Brickley (Google), who confirmed the forthcoming support for RDFa in schema.org. Video from that event will be made available on the BBC Academy YouTube Channel, and it’s worth watching all of the speakers. JISC have also published an interesting article on linked data in the latest issue of JISC Inform.

I’m encouraged by the signs of open relationships and willingness to work together to make standards and approaches work, and by the increasing efforts to open up data. I feel that York Digital Library (and perhaps the University more widely) should continue to invest effort into linked open data.


York Cause Papers, another Digital Library project

This morning sees the launch of the York Cause Papers images, the result of a JISC-funded rapid digitisation project which ran through to the summer of 2011. The Digital Library has been responsible for taking the scanned images and ingesting them into a Fedora Commons repository, using a configured version of our YODL interface. The HRI in Sheffield host and manage the searchable database of the Cause Papers, and have added links from this database to the image repository, hosted here in York.

The Cause Papers are a fascinating resource, with originals held in the Borthwick Institute for Archives. Further information about the papers and the project can be found in the University’s press release.

To search the York Cause Papers database and find page images, visit www.hrionline.ac.uk/causepapers. Images may also be accessed directly from http://dlibcausepapers.york.ac.uk.



OpenART Final Report

Made by OpenART:

  • The OpenART ontology, an event-driven ontology produced to describe the ‘artworld’ dataset. The ontology is split into a number of parts to allow greater re-usability. It should be considered ‘work in progress’, although the version published is complete for the OpenART project: dlib.york.ac.uk/ontologies/
  • An ontology browser version of the ontologies can be found at dlib.york.ac.uk/ontologies/openart – for easier reading!
  • Sample data for each ‘primary’ entity, in the form of an RDF/XML document and a Turtle document, is available: dlib.york.ac.uk/data
  • VoID description for the dataset: dlib.york.ac.uk/data (a minimal sketch of what such a description can look like follows this list)
  • A script for creating all of the documents and ingesting them into the Digital Library Fedora repository.
  • RDFa embedded in the pages of artworld.york.ac.uk (forthcoming)
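
For a flavour of what a VoID description contains, here is a minimal sketch built with rdflib (my choice of tool for the illustration, not necessarily what we used). The dataset URI, title and example resource are invented; the published description at dlib.york.ac.uk/data is the authoritative version.

```python
# Minimal sketch of a VoID description built with rdflib; the URIs and title
# are invented, and the published description at dlib.york.ac.uk/data differs.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

VOID = Namespace("http://rdfs.org/ns/void#")
DCTERMS = Namespace("http://purl.org/dc/terms/")

g = Graph()
g.bind("void", VOID)
g.bind("dcterms", DCTERMS)

dataset = URIRef("http://dlib.york.ac.uk/data/example#dataset")  # invented URI
g.add((dataset, RDF.type, VOID.Dataset))
g.add((dataset, DCTERMS.title, Literal("OpenART artworld dataset (example)")))
g.add((dataset, VOID.exampleResource,
       URIRef("http://dlib.york.ac.uk/data/example/artwork/1")))  # invented

print(g.serialize(format="turtle"))
```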

Next steps for OpenART and the University of York:

For the University of York, there is some work to complete in order to get the full (current) dataset into our Fedora repository, mainly in setting up the URL rewriting and content negotiation rules.
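
The exact rules are still to be written, but the principle is easy to sketch. The toy handler below uses Flask purely for illustration (the real setup is intended to sit behind Apache with mod_rewrite, as described in the technology choices post); it inspects the Accept header and redirects a request for an entity URI to an HTML page or an RDF document. All of the paths are invented.

```python
# Toy content-negotiation sketch using Flask, for illustration only; the real
# rules are planned as Apache mod_rewrite configuration in front of the
# repository, and all of the paths below are invented.
from flask import Flask, redirect, request

app = Flask(__name__)

@app.route("/id/artwork/<entity_id>")
def negotiate(entity_id):
    # Pick a representation based on the client's Accept header.
    best = request.accept_mimetypes.best_match(
        ["text/html", "text/turtle", "application/rdf+xml"]
    )
    if best == "text/turtle":
        return redirect(f"/data/artwork/{entity_id}.ttl", code=303)
    if best == "application/rdf+xml":
        return redirect(f"/data/artwork/{entity_id}.rdf", code=303)
    return redirect(f"/page/artwork/{entity_id}", code=303)

if __name__ == "__main__":
    app.run(debug=True)
```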

After that, we would ideally like to apply the same linked data principles to other Digital Library content, particularly some of the rich image content that we have. This would involve mapping and modelling work, for example VRA image metadata to linked data, and automating the generation of RDF.

Some thoughts

The approach taken in OpenART was twofold, prototyping with a variety of tools (summarised in the technical approaches post) which could be further explored in future work.

The dataset which drives OpenART was released as a web application in October at http://artworld.york.ac.uk, to meet the requirements of the separate AHRC-funded project out of which it was created. This site has been developed as a database-driven application, an approach chosen as the best fit for the time available. The site, always envisaged as a human-user end point, was not initially designed for linked data. Indeed, one might argue that databases and ontologies do not make happy bedfellows. However, what we have found in the project is that it was relatively straightforward to (1) create a script to extract open data documents from the database and ingest them into our Fedora Digital Library, and (2) add RDFa tagging to the web site itself.
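
To give a rough flavour of step (1), the sketch below reads rows from a hypothetical artwork table with Python’s sqlite3 module and writes one Turtle document per row using rdflib. The database schema, ontology terms and file names are all invented for the illustration, and the actual ingest into Fedora (via its REST API) is left out.

```python
# Illustrative only: pull rows from a hypothetical 'artwork' table and write
# one Turtle document per row, ready for ingest into a repository. The schema,
# ontology terms and file names are invented; the Fedora ingest step is omitted.
import sqlite3
from rdflib import Graph, Literal, Namespace, RDF, URIRef

OA = Namespace("http://dlib.york.ac.uk/ontologies/openart/example#")  # invented

def export_artworks(db_path="artworld.db"):
    conn = sqlite3.connect(db_path)
    for row_id, title in conn.execute("SELECT id, title FROM artwork"):
        g = Graph()
        g.bind("oa", OA)
        subject = URIRef(f"http://dlib.york.ac.uk/data/example/artwork/{row_id}")
        g.add((subject, RDF.type, OA.Artwork))
        g.add((subject, OA.title, Literal(title)))
        g.serialize(destination=f"artwork_{row_id}.ttl", format="turtle")
    conn.close()

if __name__ == "__main__":
    export_artworks()
```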

One of the benefits of this approach is that it provides us with a non-proprietary back-up and preservation routine for the database, playing to one of Fedora’s strengths. It also demonstrates how Fedora can be used in place of a simple file structure to serve up linked data documents, bringing with it the advantages of data management, indexing and version control.

What this rather document-centric approach does not provide is a fully indexed RDF store with a SPARQL endpoint. Although Fedora has these as part of its stack, they are internal tools for Fedora, not designed for indexing anything other than the core Fedora datastreams. Future work to enable Fedora’s ‘semantic’ capacity for external content would be extremely useful. The European Interactive Knowledge Stack (IKS) project is doing interesting work in this area (http://www.iks-project.eu/).

Opportunities

OpenART was always focussed on a narrow rich seam of data, rather than a broad simpler dataset. There is an opportunity here to see how these two approaches can co-exist. Good ontology modelling will allow rich drilled-down terms to be mapped back to broader concepts for greater findability of content, whilst allowing much finer-grained analysis of the detail captured by the ontology. Where there may be a gap is in the tools which query, visualise and analyse the data sources.

Extending existing applications to better support open data is another opportunity, allowing standard repository platforms such as EPrints, DSpace and Fedora Commons to offer standard linked data endpoints, with options for configuring the data exposed.

Google Refine has come out strongly in OpenART as an extremely useful tool for manipulating datasets. It is particularly well suited to people who do not have in-depth programming skills but want to get RDF out of semi-structured documents. There is an opportunity to dispel some of the mystique around creating open data and RDF, which can be quite simple to do.

Evidence of reuse

Our data has not been re-used, although we have had interesting discussions with a range of stakeholders from Tate, to be summarised in a blog post on how others could follow in OpenART’s footsteps.

Skills

“What skills were used in your project? Did you already have these skills in your team or did you need to develop them or bring in external experts? Are the processes you have developed embedded in your institutional practice now or are there plans to embed them? Do you plan to develop these skills further?”

OpenART did have a range of skills in the team, in Java programming, databases, Fedora Commons and metadata. External experts brought ontology modelling and RDF expertise, along with extra Fedora Commons expertise. These were essential for the project. OpenART has seen members of the project team gain a much deeper knowledge of RDF and ontologies, which we hope to embed at York through further projects around linked open data.

Lessons

Lesson 1

Ontology modelling is complex, so allow plenty of time for it. Take time to consider the data model and the best approach; a simpler ‘mix and match’ of schema terms might be suitable. For OpenART, where the data is very specific, an ontology was considered the best approach.

Lesson 2

Use Turtle during development phases and get familiar with validation and inspection tools. Turtle is a simple notation for RDF and is very easy to write and to understand. It can be exported directly out of Google Refine and validated with common tools (e.g. http://www.rdfabout.com/). Any23 (http://any23.org/) can be used to generate other formats, such as RDF/XML or RDFa. Sindice’s Inspector tool (http://inspector.sindice.com/) is useful for viewing the relationships in RDF and checking that documents are not just valid but also correct. Google Refine can be used as a relatively rapid application for generating RDF samples.
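
As a tiny worked example of the Turtle-first workflow (using rdflib here as a local stand-in for the converters mentioned above, and with invented terms), the snippet below parses a hand-written Turtle document and re-serialises it as RDF/XML:

```python
# Tiny round-trip: parse a hand-written Turtle snippet with rdflib and
# re-serialise it as RDF/XML, locally approximating what the converters above
# do. The prefix and terms are invented for the example.
from rdflib import Graph

turtle_doc = """
@prefix oa: <http://dlib.york.ac.uk/ontologies/openart/example#> .

<http://dlib.york.ac.uk/data/example/artwork/1>
    a oa:Artwork ;
    oa:title "A hypothetical seascape" .
"""

g = Graph()
g.parse(data=turtle_doc, format="turtle")
print(len(g), "triples parsed")     # quick sanity check
print(g.serialize(format="xml"))    # the same data as RDF/XML
```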

Lesson 3

Come up with use cases. What questions do users want to answer with your data? What links do they want to follow? Understanding the uses and potential uses of the data can help both with modelling but also with making the case for doing linked data in the first place.


OpenART Technology Choices

During the course of the OpenART project we’ve come across a number of different technologies and standards that we could use. In this post we aim to go into the ones we’ve used and found useful.

We aim to answer the following series of questions:

What technologies, frameworks, standards are you using in your project? What are your impressions of them? What difficulties have you encountered? What approaches and techniques have worked well? What advice would you give to others engaging with them?

“What technologies, frameworks, standards are you using in your project?”

Data preparation and mapping/triplification/lifting

The source information was supplied as a set of Excel spreadsheets containing semi-structured data. These spreadsheets were provided by the researcher and were the “live” data capture mechanism for the transcribed source data. As such they were subject to structural changes during the lifetime of the project. Various tools and approaches for manipulation of this source data were explored.

  • Relational Database (RDBMS)
    • Early in the project an approach of migrating the data to an RDBMS was explored
    • This resulted in a cleaner version of the source information, but the approach was abandoned as the spreadsheets were a live tool
    • The RDBMS ontology mapping plugin for Protege was explored for transfer of RDBMS data to RDF/OWL
  • Excel Spreadsheet manipulation
    • Google Refine was used for very quickly visualising and manipulating source information
    • RDF Extension for Google Refine was used for exporting cleaned-up data held in Refine to test versions of RDF linked data, and for matching nodes to validate and connect parts of the data (via the reconciliation feature)
    • GNU SED (Stream Editor) is a powerful general-purpose text manipulation tool. This hits the cleaning spots the developers can’t reach (quickly) in the visual Refine environment (a small sketch of this kind of batch clean-up follows this list).
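
As promised above, here is a small sketch of the kind of batch clean-up sed was used for, written in Python for illustration; the file name and patterns are invented and would need adapting to the real export.

```python
# Sketch of the kind of batch clean-up sed was used for; the file name and
# patterns are invented and would need adapting to the real export.
import re
from pathlib import Path

raw = Path("artworld_export.csv").read_text(encoding="utf-8")

cleaned = raw.replace("\u00a0", " ")              # non-breaking spaces -> spaces
cleaned = re.sub(r"[ \t]+", " ", cleaned)          # collapse runs of spaces/tabs
cleaned = re.sub(r" *; *", ";", cleaned)           # tidy spacing around separators
cleaned = re.sub(r"(?m)[ \t]+$", "", cleaned)      # strip trailing whitespace per line

Path("artworld_export_clean.csv").write_text(cleaned, encoding="utf-8")
```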

Ontology development

  • Protege version 4.1 was used for ontology development. Protege is a powerful and fully featured ontology IDE (integrated development environment)

Linked Data manipulation and production

  • Rapper (part of the Raptor RDF Parser toolkit) was used for translating between various RDF serialisations. Rapper is a parsing and serialising utility for RDF.
  • SPARQL (a standard query language for RDF) for querying and checking data (a small query sketch follows this list)
    • Rasqal via the online service at Triplr. Rasqal provides RDF querying capabilities, including SPARQL queries
    • ARQ, a SPARQL query engine for Jena
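
And the promised query sketch: a minimal SPARQL check run locally with rdflib (standing in here for Rasqal/Triplr or ARQ), against one of the hypothetical sample documents from earlier. The file name, prefix and terms are invented for the example.

```python
# Minimal SPARQL check with rdflib against one of the sample documents
# sketched earlier; the file name, prefix and terms are invented.
from rdflib import Graph

g = Graph()
g.parse("artwork_1.ttl", format="turtle")  # hypothetical local sample document

query = """
PREFIX oa: <http://dlib.york.ac.uk/ontologies/openart/example#>
SELECT ?artwork ?title
WHERE {
    ?artwork a oa:Artwork ;
             oa:title ?title .
}
"""

for artwork, title in g.query(query):
    print(artwork, title)
```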

Hosting and base systems

  • openSUSE Linux, a major Linux distribution, was used as a base operating system
  • SuseStudio was used to build and deploy bespoke virtual machines to Amazon EC2. SuseStudio allows you to choose packages to build the virtual machine and will directly deploy to an Amazon web service account all within a web-based interface
  • Amazon Web Services (AWS) was used for hosting, particularly the Elastic Compute Cloud (EC2) service for virtualisation
  • The Apache web server with mod_rewrite was used for web hosting of ontologies and data, with content negotiation
  • OntologyBrowser was used for storing, viewing, manipulating and accessing the ontology, using a web front end and REST API

“What difficulties have you encountered?”

Source data manipulation and conversion

  • The source information was being actively worked upon during the lifetime of the project. This presented difficulties as working on the ontology and triplification to RDF concurrently with changes in the source information made decision-making and coordination difficult
  • The source information was provided as Excel spreadsheets. These were used as an organisational environment for capturing data by the researcher, and provided free-text search, categorisation and a basic naming scheme for entities. Although a degree of rigour was applied, the environment could not be described as “structured data” such as that provided by a database, and this presented challenges in interpreting and modelling the information.
  • Originally the project assumed that hosting of the information as part of the Court, Country, City project would provide a structured source for the information; however the requirements from this project were sufficiently divergent from the OpenART project’s requirements for this not to be the case. As a result the project did not consider alternative automated processes to assist in source data manipulation and triplification until late in the project.

Ontology development tooling

  • Making bulk changes to the ontology in Protege was difficult to do efficiently
  • Synchronisation between environments was a challenge with new versions of ontologies

Ontology development informational issues

  • Difficulties in extending the combined ontologies – particularly when trying to move from earlier versions expressing “simplifications” to a more complex framework later in the project. It would have been easier to start with a more complex framework initially rather than trying to extend the earlier version.
  • Naming conventions became very cumbersome, for example when trying to create inverse properties that sometimes had no natural or simple English language fit

“What are your impressions of them?”

  • Protege and the RDF tools used locally were found to be very mature. Protege is now mature enough to be usable by newcomers, and also provides a route to further development as it is based on the OWL API. However, it was difficult trying to reason with anything other than a small set of data points.
  • Google Refine was surprisingly efficient and workable. Additional functionality such as the reuse of RDF mapping templates was useful.
  • Cloud services (Amazon EC2) were relatively easy to use; and the combination of openSUSE and SuseStudio provided an efficient and repeatable mechanism for deploying virtual machines
  • OntologyBrowser presents a nice interface for ontology browsing in HTML. Its user-friendly syntax for ontology axioms means that you don’t need to be an OWL or logic expert. However, it is very “data-centric” and not an end-user environment.

“What approaches and techniques have worked well?”

  • Google Refine is good for rapid development when a non-trivial ontology requires that mappings and ontology need to be developed in tandem
  • Use of Protege (see above)
  • Use of cloud services (see above)

“What advice would you give to others engaging with them?”

  • Investigate using a collaborative ontology environment, such as knoodl
  • Invest a small amount of time in collaborative environments; this can lead to a big payoff. Cloud services are relatively easy to use
  • Start with an ontology framework complex-enough to represent the modelling required; it can be difficult moving from a simple to a more complex version
Author: Martin Dow


OpenART – Some Wins and Fails

Win! Researcher on hand to explain the data and answer questions.

Win! Openness to being open with the data.

Win! Increased understanding of open data and ontologies for the domain.

Win! Ready-made expertise on the team.

Win! Google Refine as a quick way of experimenting with spreadsheet data and getting RDF out of spreadsheets.

Win! Indexing in SINDICE should be a quick win.

Fail! The ontology took longer to create than we anticipated.

Fail! The data is complex and still a work in progress, the spreadsheets memory-hungry and in need of some cleanup and post-processing.

Fail! Lots of re-visiting and round-tripping slows things down.

Fail! There is a gap between the precision needed for an ontology and the working spreadsheets of a researcher.


OpenART – Costs and Benefits


One of the blog posts required for the OpenART project is around ‘costs and benefits’: “This should be a very rough estimation of how much it has cost you in terms of time and resources to make your data openly available. What do you expect the benefits to be? And how do these 2 assessments balance out against each other?”.

Ontology Development

It has taken around 10 days to develop the ontology to its current state. Bear in mind that it was created by someone with a high level of knowledge and expertise in ontologies, related technologies and tooling, who already knew of existing relevant ontologies and could rapidly prototype an approach. A quicker and simpler approach would have been to ‘mix and match’ by selecting properties from a range of existing ontologies, without going so far as to develop a dedicated ontology. A longer and more complex approach would have been to experiment with different ontology approaches to establish the best and richest solution, e.g. descriptions and situations. Given the time available we opted for the middle ground, but this has to some extent delayed other work and has made for a more complex process of ‘understanding’.

Data Analysis and Manipulation

Analysis of the spreadsheets, importing and exporting the data, building web views of the data and data cleanup, I’d put at 25 days.

Generating the Data

Generating RDF instance data from the spreadsheets, post-implementation, would take around 10 days to set up and produce a subset of the data; longer if we want to process all of the data, which we aren’t doing right now.

Technical Implementation

Prototyping and implementing a simple approach to resolving identifiers and content negotiation will come in around 10 days.

Building Understanding

Developing an understanding of the ontology, of linked data principles and of how to construct RDF, I’d estimate at taking up around 25 days, including background reading and research, digging into the ontology, working through examples and selecting the right tools. This also includes time spent on understanding the data itself, working through the spreadsheets, talking with the content creator.

Resources

In terms of people involved, there are a number, including: consultant partners (expert in the semantic aspects and technologies), developer (for implementation and data processing), researcher (for understanding the data), Tate web manager and curator (for exploring the Tate use case and validation), Digital Library core staff (for future sustainability).

Summary

This gives us a grand total of around 80 days, or 16 weeks, around 4 months of work involving a range of people. This is starting out with complex data, gaps in understanding of the data and in how to ‘do’ open data, along with a myriad of potential mechanisms and methods.

What this doesn’t include, though, is the ongoing cost of maintaining and building on the project outcomes, rolling out the generation of data to the full dataset and working with additions and changes to the dataset; nor does it include extending the work to do more ‘linking’, assessing the scalability of this approach, experimenting with different approaches and promoting the content.

Benefits

Given that it is still early days for ‘linked data’ and for our content, the benefits are harder to quantify. The immediate benefits are engagement, and seeing our researcher and Tate partners really begin to think about what open and linked data might offer them. Another immediate benefit is a new class of content for the Digital Library, with the potential to extend our reach into storing and serving ‘data’ from York Digital Library and to enable us to make progress in opening up our data.

Looking forward, what OpenART and open data promise is greater visibility and usability of information, making research richer and easier and reducing the amount of information that needs to be stored locally and constructed manually. For cultural institutions like Tate, open and linked data offer opportunities for increasing web traffic and usage, by offering new ways of exposing and consuming data, greater possibilities for dynamic links between artworks, artists, places and events, and richer visualisations of data. For researchers and other consumers, open data offers mechanisms to follow different paths through information, tracing art history from creation, through the art market and into galleries, telling interlinked and often unexpected stories along the way. Whilst OpenART deals with only a fragment of the art market, it offers a possible model that, if extended, could open up art history to much more detailed analysis.

How do these balance out?

Personally, I think that the cost is worth it, but only if we start to see real use of the data, and more data being opened up. Given that the latter can’t happen without investment in the former, we need to make a persuasive case to continue the work, engage in the community and build tools to aggregate disparate content for the non-technical user. One lesson learnt, though, is that not all data is made equal, and ours is still ‘under construction’, which adds a layer of complexity when working with a moving target.

