Archive for February 24, 2010

Repository & the Cloud event

I attended this event on 23/02/2010. It focused on the issues surrounding cloud storage in a repository context. People from leading repositories presented their views on cloud technologies and the development work they have done with them.

Michele Kimpton from DuraSpace

DuraCloud (http://www.duraspace.org/duracloud.php) has been in a pilot phase since the beginning of Fall 2009 and will be released as a service of the DuraSpace not-for-profit organization in the fall of 2010.

Key advantages of cloud:

  • Scalability
  • Remote off-campus ‘storage of digital assets’
  • Ease of implementation
  • Flexibility
  • Don’t have to staff locally
  • Cost
  • Elasticity

Major challenges of cloud:

  • Trusting third party to manage critical assets
  • Long-term reliability of solution
  • Data security
  • Performance and bandwidth concerns
  • Loss of control
  • Administrative burden of SLAs
  • Transparency of solution
  • Data lock-in
  • Less customizable

Institutional needs: managing digital collections

Service areas:

  • Remote secondary storage of digital collections
  • Preservation support
    • Ability to replicate content to multiple providers and locations
    • Ability to synchronize backup
    • Access to content
  • Intra-institution shared collections
  • Inter-institution shared collections
  • Compute services
  • Online primary storage

DuraCloud: services in the cloud for durable digital content

  • Admin
  • Service manager
  • Storage manager
    • Amazon S3
    • Rackspace Cloud Files
    • EMC Atmos
    • Other clouds
    • Services
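
The storage manager idea above, one interface sitting in front of several cloud providers, can be sketched roughly as follows. This is not DuraCloud's actual API: the class names (StorageProvider, InMemoryProvider, StorageManager) are hypothetical, and in-memory dictionaries stand in for real back ends such as Amazon S3 or Rackspace Cloud Files. The sketch replicates each item to every provider and uses a checksum to confirm that the copies still match.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Dict, Iterable


class StorageProvider(ABC):
    """Hypothetical common interface that each cloud back end would implement."""

    name: str

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> bytes:
        ...


class InMemoryProvider(StorageProvider):
    """Stand-in for a real provider; keeps content in a dictionary."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class StorageManager:
    """Replicates content to every configured provider (an assumed policy)."""

    def __init__(self, providers: Iterable[StorageProvider]) -> None:
        self.providers = list(providers)

    def store(self, key: str, data: bytes) -> str:
        """Write the data to all providers and return its SHA-256 checksum."""
        checksum = hashlib.sha256(data).hexdigest()
        for provider in self.providers:
            provider.put(key, data)
        return checksum

    def verify(self, key: str, checksum: str) -> Dict[str, bool]:
        """Report, per provider, whether the stored copy matches the checksum."""
        return {
            provider.name: hashlib.sha256(provider.get(key)).hexdigest() == checksum
            for provider in self.providers
        }
```

A caller would keep the checksum returned by store() and run verify() periodically, which is one simple way to address the "long-term reliability" concern listed among the challenges.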

Pilot projects

  • NYPL pilot: Digital Gallery collection (http://digitalgallery.nypl.org/nypldigital/explore/dgexplore.cfm)
    • Backup copy all TIFF images (10 TB)
    • Transform TIFF to JPG using ImageMagick
    • Run J2K image server in Cloud
    • Push JPEG 2000 back into Fedora
  • BHL pilot: Find the most cost-competitive solution for keeping multiple copies in multiple geographic locations. They keep an entire backup of 40 TB of TIFF/JPG files in the cloud.
  • WGBH media library and archives for videos.

Future work of DuraCloud

  • Distributed
  • Collaborative
  • Web oriented
  • Open
  • Interoperable

Alex D. Wade from Microsoft

‘Moving to a world where all data is linked and can be stored / analyzed in the Cloud.’

Windows Azure (http://www.microsoft.com/windowsazure/) is Microsoft's cloud platform.

Zentity Cloud Storage (http://research.microsoft.com/en-us/projects/zentity/)

OGDI SDK: ‘OGDI is open source starter kit written using C# and the .NET Framework that uses the Windows Azure Platform to expose data in Windows Azure Tables as a readonly RESTful service using the Open Data Protocol (OData) via an ASP.NET based Windows Azure web role.’ (http://ogdi.codeplex.com/)
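
Because OGDI exposes its data through OData, a client can query it with ordinary HTTP GET requests. The following is a minimal sketch of building such a query URL. The service root and the entity set name are placeholders, not a real OGDI deployment, and the helper build_odata_url is hypothetical; $filter and $top are standard OData query options.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_odata_url(
    service_root: str,
    entity_set: str,
    filter_expr: Optional[str] = None,
    top: Optional[int] = None,
) -> str:
    """Build a read-only OData query URL for one entity set."""
    options = {}
    if filter_expr is not None:
        options["$filter"] = filter_expr
    if top is not None:
        options["$top"] = str(top)

    url = f"{service_root.rstrip('/')}/{quote(entity_set)}"
    if options:
        # Keep '$' literal in option names; percent-encode spaces and quotes.
        url += "?" + urlencode(options, quote_via=quote, safe="$")
    return url


if __name__ == "__main__":
    # Placeholder endpoint and entity set, for illustration only.
    print(build_odata_url(
        "https://ogdi.example.org/v1/dc",
        "Libraries",
        filter_expr="name eq 'Central'",
        top=5,
    ))
```

The resulting URL can be fetched with any HTTP client; since the service is read-only, GET is the only verb a consumer needs.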

EntityCube is a research prototype for exploring object-level search technologies, which automatically summarizes the Web for entities (such as people, locations and organizations) with a modest web presence. (http://research.microsoft.com/en-us/projects/entitycube/)

Les Carr from EPrints

EPrints 3.2 gets cloud storage support. An abstract storage layer called the EPrints Storage Controller has been implemented to utilize new ‘storage plug-ins’ to store files in different places:

  • local storage
  • cloud storage
  • hybrid storage

An XSL-like language is used to describe the ‘where to store’ logic.
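
To illustrate the idea of a storage controller that picks plug-ins from ‘where to store’ rules, here is a rough sketch. EPrints itself is written in Perl and expresses these rules in its own XSL-like language, so everything below (StoragePlugin, StorageController, the rule format and the example thresholds) is hypothetical and only meant to show the shape of the logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class StoredFile:
    name: str
    mime_type: str
    size_bytes: int


class StoragePlugin:
    """Hypothetical storage back end; records files in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: Dict[str, StoredFile] = {}

    def store(self, file: StoredFile) -> None:
        self.files[file.name] = file


# A rule pairs a condition on the file with the plug-ins that should hold it.
Rule = Tuple[Callable[[StoredFile], bool], List[str]]


class StorageController:
    """Sends each file to the plug-ins named by the first matching rule."""

    def __init__(self, plugins: List[StoragePlugin], rules: List[Rule],
                 default: List[str]) -> None:
        self.plugins = {plugin.name: plugin for plugin in plugins}
        self.rules = rules
        self.default = default

    def choose(self, file: StoredFile) -> List[str]:
        for condition, targets in self.rules:
            if condition(file):
                return targets
        return self.default

    def store(self, file: StoredFile) -> List[str]:
        targets = self.choose(file)
        for target in targets:
            self.plugins[target].store(file)
        return targets


if __name__ == "__main__":
    local, cloud = StoragePlugin("local"), StoragePlugin("cloud")
    rules: List[Rule] = [
        # Assumed policy: very large files go to the cloud only.
        (lambda f: f.size_bytes > 100 * 1024 * 1024, ["cloud"]),
        # Assumed policy: PDFs are kept in both places (hybrid storage).
        (lambda f: f.mime_type == "application/pdf", ["local", "cloud"]),
    ]
    controller = StorageController([local, cloud], rules, default=["local"])
    print(controller.store(StoredFile("thesis.pdf", "application/pdf", 2_000_000)))
```

Local, cloud and hybrid storage then differ only in which plug-in names a rule returns, which is why a declarative rule language is enough to describe the policy.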

A presentation describing EPrints and the cloud is available here: http://www.slideshare.net/lescarr/eprints-and-the-cloud
