Archive for repositories

Repository & the Cloud event

I attended this event on 23/02/2010. It focused on the issues of cloud storage in a repository context. People from leading repositories presented their views on cloud technologies and the development work they have done with them.

Michele Kimpton from DuraSpace

DuraCloud (http://www.duraspace.org/duracloud.php) has been in a pilot phase since the beginning of Fall 2009 and will be released as a service of the DuraSpace not-for-profit organization in the fall of 2010.

Key advantages of cloud:

  • Scalability
  • Remote off-campus ‘storage of digital assets’
  • Ease of implementation
  • Flexibility
  • Don’t have to staff locally
  • Cost
  • Elasticity

Major challenges of cloud:

  • Trusting third party to manage critical assets
  • Long-term reliability of solution
  • Data security
  • Performance and bandwidth concerns
  • Loss of control
  • Administrative burden of SLAs
  • Transparency of solution
  • Data lock in
  • Less customizable

Institutional needs: managing digital collections

Service areas:

  • Remote secondary storage of digital collections
  • Preservation support
    • Ability to replicate content to multiple providers and locations
    • Ability to synchronize backup
    • Access to content
  • Intra-institution shared collections
  • Inter-institution shared collections
  • Compute services
  • Online primary storage
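The replication and backup items in the list above could be sketched roughly as follows. This is only an illustration in Python and not DuraCloud's API: the `InMemoryStore` class and the `replicate` function are hypothetical stand-ins for real cloud providers, and comparing MD5 checksums is an assumed way of verifying each copy.

```python
import hashlib


class InMemoryStore:
    """Hypothetical stand-in for one cloud storage provider."""

    def __init__(self, name):
        self.name = name
        self.objects = {}

    def put(self, key, data):
        # Keep the bytes in a dict instead of sending them anywhere.
        self.objects[key] = data

    def checksum(self, key):
        # Checksum of the copy as held by this provider.
        return hashlib.md5(self.objects[key]).hexdigest()


def replicate(key, data, stores):
    """Copy data to every store and report which copies verify."""
    expected = hashlib.md5(data).hexdigest()
    results = {}
    for store in stores:
        store.put(key, data)
        results[store.name] = store.checksum(key) == expected
    return results


stores = [InMemoryStore("provider-a"), InMemoryStore("provider-b")]
report = replicate("images/0001.tif", b"example image bytes", stores)
print(report)
```

A real setup would replace the in-memory stores with clients for the actual providers, but the idea of writing to several places and checking each copy would stay the same.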

DuraCloud services in the cloud for durable digital content

  • Admin
  • Service manage
  • Storage manage
    • Amazon S3
    • Rackspace Cloud Files
    • EMC Atmos
    • Other clouds
    • Services

Pilot projects

  • NYPL pilot: Digital Gallery collection (http://digitalgallery.nypl.org/nypldigital/explore/dgexplore.cfm)
    • Backup copy all TIFF images (10 TB)
    • Transform TIFF to JPG using ImageMagick
    • Run J2K image server in Cloud
    • Push JPEG 2000 back into Fedora
  • BHL pilot: Find the most cost-competitive solution for keeping multiple copies in multiple geographic locations. They have an entire backup in the cloud for 40 TB of TIFF/JPG.
  • WGBH media library and archives for videos.

Future work of DuraCloud

  • Distributed
  • Collaborative
  • Web oriented
  • Open
  • Interoperable

Alex D. Wade from Microsoft

‘Moving to a world where all data is linked and can be stored / analyzed in the Cloud.’

Windows Azure (http://www.microsoft.com/windowsazure/) is Microsoft's cloud platform.

Zentity Cloud Storage (http://research.microsoft.com/en-us/projects/zentity/)

OGDI SDK: ‘OGDI is open source starter kit written using C# and the .NET Framework that uses the Windows Azure Platform to expose data in Windows Azure Tables as a readonly RESTful service using the Open Data Protocol (OData) via an ASP.NET based Windows Azure web role.’ (http://ogdi.codeplex.com/)

EntityCube is a research prototype for exploring object-level search technologies, which automatically summarizes the Web for entities (such as people, locations and organizations) with a modest web presence. (http://research.microsoft.com/en-us/projects/entitycube/)

Les Carr from EPrints

EPrints 3.2 gets cloud storage support. An abstract storage layer called the EPrints Storage Controller has been implemented to utilize new ‘storage plug-ins’ to store files in different places:

  • local storage
  • cloud storage
  • hybrid storage

An XSL-like language is used to describe the ‘where to store’ logic.
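To make the idea of a storage controller with plug-ins more concrete, here is a minimal sketch in Python. It is not EPrints code and does not use the XSL-like rule language mentioned above: the `MemoryPlugin` and `StorageController` classes, the rule format and the file names are all hypothetical, chosen only to show how ‘where to store’ rules might route a file to local, cloud or both (hybrid) storage.

```python
class MemoryPlugin:
    """Hypothetical storage plug-in that keeps files in a dict."""

    def __init__(self, name):
        self.name = name
        self.files = {}

    def store(self, filename, data):
        self.files[filename] = data


class StorageController:
    """Routes each file to one or more plug-ins based on simple rules."""

    def __init__(self, plugins, rules, default):
        self.plugins = {plugin.name: plugin for plugin in plugins}
        self.rules = rules
        self.default = default

    def choose(self, filename, size):
        # The first matching rule wins; otherwise fall back to the default.
        for matches, targets in self.rules:
            if matches(filename, size):
                return targets
        return [self.default]

    def store(self, filename, data):
        targets = self.choose(filename, len(data))
        for target in targets:
            self.plugins[target].store(filename, data)
        return targets


local = MemoryPlugin("local")
cloud = MemoryPlugin("cloud")
rules = [
    # Assumed policy: master images go to the cloud only.
    (lambda name, size: name.endswith(".tif"), ["cloud"]),
    # Assumed policy: papers are kept in both places (hybrid).
    (lambda name, size: name.endswith(".pdf"), ["local", "cloud"]),
]
controller = StorageController([local, cloud], rules, default="local")

tif_targets = controller.store("scan.tif", b"image bytes")
pdf_targets = controller.store("paper.pdf", b"pdf bytes")
txt_targets = controller.store("notes.txt", b"plain text")
print(tif_targets, pdf_targets, txt_targets)
```

Keeping the rules separate from the plug-ins means a new storage location can be added without touching the routing logic, which seems to be the point of the abstract storage layer.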

A presentation describing EPrints and the cloud is available here: http://www.slideshare.net/lescarr/eprints-and-the-cloud

Spreading the word about YODL

After a year of developing and implementing the pilot phase of YODL, the York online multimedia repository, we feel we have useful experience to share with other people working in this area. In July, Julie Allinson and Elizabeth Harbord published an article in Ariadne, a Web magazine aimed at information professionals in archives, libraries and museums. The article outlined the progress of the project from its inception through to its implementation (http://www.ariadne.ac.uk/issue60/allinson-harbord/). To accompany this, Julie suggested that Frank Feng and I write a companion article from a technical perspective. I found this a slightly daunting but very interesting challenge, and the article we jointly wrote is now online at http://www.ariadne.ac.uk/issue61/stracchino-feng/

York-Leeds Exchange of Experience

Notes from an exchange of experience meeting, hosted by the University of York Library and Archives, 24 April 2009, involving York University, Leeds University and York Saint John University.

Present: Jonathan Ainsworth (Leeds), Julie Allinson (York), Michael Emly (Leeds), Frank Feng (York), Matthew Herring (York), Rachel Proudfoot (WRRO), Lauren Shipley (York Saint John), Peri Stracchino (York).

YODL (York Digital Library) – Peri Stracchino

Peri demonstrated YODL, including search and browse functions, resource displays and the resource submission workflow. YODL runs on Fedora software, which is extremely flexible and powerful, but only comes with a very basic user interface. YODL is currently using Muradora as its interface software. This was developed especially as a ‘front end’ to Fedora, and has been a useful way of getting a user interface implemented in the short term.

YODL is now in a semi-live beta testing phase, with 3750 images in it (mostly from the History of Art Department). The team has built a tool (based on XForms) to create and submit VRA Core 4 image metadata, with auto-suggest functionality or drop-down menus for most fields. LCSH and Getty vocabularies are used. Before the system can go fully live, sufficient access control mechanisms need to be built to satisfy the terms of the CLA license (under which much of the material is created), which is a major challenge. The team has recently carried out the first phase of user testing with academics from the History of Art Department.

York Saint John – Lauren Shipley

YSJ has a one-year-old program to create an open access multimedia repository to store material created as part of university projects and to support teaching. Lauren is the only member of staff working on the project and she works one day a week on it. The first one-year phase was funded by JISC and has focussed on multimedia content, especially from a project called C4C (‘Collaborating for Creativity’). The next phase will be to include etheses and research publications.

The project is live and uses ArchivalWare software. Lauren demonstrated both the public interface (including video clips of student performances) and the metadata creation workflow of ArchivalWare (using Dublin Core). The lack of time dedicated to the project is a barrier (especially as Lauren does not have time to create and ingest metadata records herself and relies on content contributors to do this), and much of her time is spent doing advocacy work for the project. Nevertheless, a lot has been achieved for a project with so little staff time.

LUDOS (Leeds University) – Jonathan Ainsworth

New developments:

• Thousands of digital images and metadata from the library’s special collections added. Some material is restricted as it is in copyright (it was originally created under fair dealing for individual academics and can now only be viewed and managed by LUDOS staff).
• Handle system implemented to create unique permanent IDs for objects, which can be resolved with a global resolution service.
• EAD finding aids added to LUDOS for archival collections
• Forms for user self-submission introduced
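As a small illustration of how a Handle identifier resolves through the global service, the sketch below turns a handle of the form prefix/suffix into a URL on the public hdl.handle.net resolver. The handle `12345/ludos-0001` is made up for the example and is not a real LUDOS identifier, and the `handle_url` helper is hypothetical.

```python
from urllib.parse import quote

# Public proxy for the global Handle resolution service.
RESOLVER = "https://hdl.handle.net/"


def handle_url(handle):
    """Build a resolver URL from a 'prefix/suffix' handle string."""
    prefix, separator, suffix = handle.partition("/")
    if not prefix or not separator or not suffix:
        raise ValueError("expected a handle of the form prefix/suffix")
    # Percent-encode each part so unusual characters survive in a URL.
    return RESOLVER + quote(prefix) + "/" + quote(suffix)


# Made-up handle, used only to show the shape of the result.
example_url = handle_url("12345/ludos-0001")
print(example_url)
```

Because the identifier is resolved by the service and not tied to a server path, the object can move to a new location without the published link changing.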

Jonathan also demonstrated some features of DigiTool, including how objects can appear in more than one collection, based on search terms added to the metadata.

WRRO – Rachel Proudfoot

• Survey of researchers reported that awareness of open access is low and that the idea of institutional repositories is not a meaningful one within academic departments. Researchers do not understand the challenge of cultural changes.
• Advocacy – address awareness with PhD and early career researchers.
• REF is relevant – there is an increasing interest in citation impact (though the evidence is mixed regarding the impact of open access).
• Workflow – capturing the correct version of a paper is an issue. The optimum is at the point of publication, but also good to capture early ‘from the desktop’.
• Symplectic software used to harvest metadata about Leeds publications from abstracting and indexing services – this can later be matched up with papers deposited to WRRO earlier in their publication processes.
• How to capture grant data – Symplectic?
• Promoting self-archiving (this is more scalable than having dedicated staff to do ingesting/metadata creation). This is a big challenge, however. Centralised copyright checking will continue (some academics are worried about breaking copyright; others are not concerned enough).
• Investigating ways of importing metadata from various sources (e.g. from departmental and personal websites), but this is of variable quality. However, it is better than having no metadata.
• Access control – this is difficult as the three institutions (Leeds, York, Sheffield) all use different systems.
• Etheses – new project to add these to WRRO.

SWORD – Julie Allinson

SWORD is a standard for submitting content to repositories, which itself uses the Atom Publishing Protocol with an extension for metadata. The basic concept is to allow content creators to submit content to repositories easily, and potentially to more than one repository with a single operation. There are both web and desktop clients for doing this. Both are based on the concept of a transaction in which the client requests a service document from the repository and receives a receipt after its submission.
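The first half of that transaction, reading the service document, can be sketched in Python as below. The sample document is invented for illustration (the repository.example.org URLs and collection names are hypothetical), and the sketch only relies on the standard Atom Publishing Protocol structure of service, workspace and collection elements. The deposit itself, a POST to the chosen collection URL, is not shown.

```python
import xml.etree.ElementTree as ET

# Namespaces used by Atom Publishing Protocol service documents.
NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
}

# Invented service document, shaped like one a repository might return.
SAMPLE_SERVICE_DOCUMENT = """
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Example Repository</atom:title>
    <collection href="http://repository.example.org/sword/images">
      <atom:title>Images</atom:title>
      <accept>application/zip</accept>
    </collection>
    <collection href="http://repository.example.org/sword/theses">
      <atom:title>Theses</atom:title>
      <accept>application/zip</accept>
    </collection>
  </workspace>
</service>
"""


def list_collections(service_xml):
    """Return (title, deposit URL) pairs for each collection on offer."""
    root = ET.fromstring(service_xml)
    collections = []
    for collection in root.findall("app:workspace/app:collection", NS):
        title = collection.findtext("atom:title", default="", namespaces=NS)
        collections.append((title, collection.get("href")))
    return collections


collections = list_collections(SAMPLE_SERVICE_DOCUMENT)
for title, url in collections:
    print(title, url)
```

A client would show these collections to the user, post the package to the selected URL, and then read the Atom entry returned by the repository as its receipt.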

Discussion

Topics discussed included:
• Metadata – the problems of describing complex resources, such as the Michael Nyman archive acquired by Leeds, using standards such as Dublin Core. YODL’s use of VRA and the flexibility of open source software, such as Fedora, to build bespoke metadata creation workflows provides one solution, but with the downside of development costs.
• Advocacy – achieving institution-wide awareness is very difficult, but crucial to success. Need to work to build basic awareness as widely as possible and to collaborate closely with particular individuals who have content to contribute.

YODL available again

The upgrade on YODL is now complete, and the system is available for use again.

YODL downtime Thurs 16th April 2009

The digital repository will be going offline on Thursday afternoon for essential maintenance work. It may reappear for brief periods during the afternoon to permit administrator testing. Please do not try to use it during this period; the system should be available again on Friday. There will be an update to this blog to confirm when work is complete.

1) “lost and found” – more images now on YODL, 2) upgrade planned for Thursday

We now have almost 3000 images uploaded to YODL (YOrk Digital Library). Approximately 300 newly scanned images have been added to the existing images in the History of Art collection. We have also just completed an exercise to locate and upload another three hundred or so image files which could not be included in the initial bulk upload due to mistyped file names in the original records. These have now been hand edited, corrected and uploaded to YODL. Painstaking and fiddly work, but satisfying to get it done! We have also carried out testing of a necessary upgrade (from Fedora 2.2.3 to Fedora 2.2.4) which should make the system more robust as well as fixing a known bug, and plan to apply this to the live system later this week.
