I attended this event on 23/02/2010. It focused on cloud storage issues in a repository context. Representatives of leading repository platforms presented their views on cloud technologies and their development work with them.
Michele Kimpton from DuraSpace
DuraCloud (http://www.duraspace.org/duracloud.php) has been in a pilot phase since the beginning of fall 2009 and will be released as a service of the not-for-profit DuraSpace organization in fall 2010.
Key advantages of the cloud:
- Scalability
- Remote, off-campus storage of digital assets
- Ease of implementation
- Flexibility
- No need for local staff
- Cost
- Elasticity
Major challenges of the cloud:
- Trusting third party to manage critical assets
- Long-term reliability of solution
- Data security
- Performance and bandwidth concerns
- Loss of control
- Administrative burden of SLAs
- Transparency of solution
- Data lock-in
- Less customizable
Institutional needs: managing digital collections
Service areas:
- Remote secondary storage of digital collections
- Preservation support
- Ability to replicate content to multi providers and locations
- Ability to synchronize backups
- Access to content
- Intra-institution shared collections
- Inter-institution shared collections
- Compute services
- Online primary storage
DuraCloud: services in the cloud for durable digital content
- Admin
- Service manager
- Storage manager
- Amazon S3
- Rackspace Cloud Files
- EMC Atmos
- Other clouds
- Services
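The storage manager sits in front of several providers, and two of the service areas listed earlier are replicating content to multiple providers and keeping backups in sync. The following is a minimal sketch of that idea, assuming a hypothetical provider interface with `put`/`get` methods; the in-memory classes merely stand in for Amazon S3, Rackspace Cloud Files or EMC Atmos and are not DuraCloud's actual API. Each item is written to every provider, and MD5 checksums are compared at write time and again during a later audit.

```python
import hashlib
from typing import Dict, List


class InMemoryProvider:
    """Hypothetical stand-in for one cloud back end (e.g. S3, Rackspace, Atmos)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> str:
        """Store a copy and return the MD5 of what was actually stored."""
        self._objects[key] = data
        return hashlib.md5(self._objects[key]).hexdigest()

    def get(self, key: str) -> bytes:
        return self._objects[key]


class ReplicatingStorageManager:
    """Hypothetical manager that writes every item to all configured providers."""

    def __init__(self, providers: List[InMemoryProvider]) -> None:
        self.providers = providers

    def store(self, key: str, data: bytes) -> Dict[str, bool]:
        """Replicate `data` and report, per provider, whether its checksum matched."""
        expected = hashlib.md5(data).hexdigest()
        return {p.name: p.put(key, data) == expected for p in self.providers}

    def audit(self, key: str, expected_md5: str) -> Dict[str, bool]:
        """Re-read each copy later and check that it still has the expected MD5."""
        return {
            p.name: hashlib.md5(p.get(key)).hexdigest() == expected_md5
            for p in self.providers
        }


providers = [InMemoryProvider(n) for n in ("s3", "rackspace", "atmos")]
manager = ReplicatingStorageManager(providers)
data = b"placeholder TIFF bytes"
print(manager.store("collection/image-0001.tif", data))
# {'s3': True, 'rackspace': True, 'atmos': True}
```

Running `audit` periodically is what turns plain replication into preservation support: a copy that has silently changed on one provider shows up as `False` for that provider only, and it can then be restored from one of the others.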
Pilot projects
- NYPL pilot: Digital Gallery collection (http://digitalgallery.nypl.org/nypldigital/explore/dgexplore.cfm)
- Back up all TIFF images (10 TB)
- Transform TIFF to JPG using ImageMagick
- Run a JPEG 2000 (J2K) image server in the cloud
- Push the JPEG 2000 images back into Fedora
- BHL (Biodiversity Heritage Library) pilot: find the most cost-competitive solution for keeping multiple copies in multiple geographic locations. BHL keeps an entire backup of its 40 TB of TIFF/JPG images in the cloud.
- WGBH Media Library and Archives: video collections.
Future work on DuraCloud
- Distributed
- Collaborative
- Web oriented
- Open
- Interoperable
Alex D. Wade from Microsoft
‘Moving to a world where all data is linked and can be stored / analyzed in the Cloud.’
Windows Azure (http://www.microsoft.com/windowsazure/) is Microsoft's cloud platform.
Zentity Cloud Storage (http://research.microsoft.com/en-us/projects/zentity/)
OGDI (Open Government Data Initiative) SDK: ‘OGDI is open source starter kit written using C# and the .NET Framework that uses the Windows Azure Platform to expose data in Windows Azure Tables as a readonly RESTful service using the Open Data Protocol (OData) via an ASP.NET based Windows Azure web role.’ (http://ogdi.codeplex.com/)
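Because the data is exposed as a read-only OData service, clients retrieve it with plain HTTP GET requests and narrow the results using OData's standard system query options such as `$filter`, `$top` and `$select`. Below is a minimal Python sketch that only builds such a query URL. The service root `https://example.org/v1/hypothetical` and the entity set `BuildingPermits` are made-up placeholders, not real OGDI endpoints.

```python
from typing import Tuple
from urllib.parse import quote, urlencode


def build_odata_query(service_root: str, entity_set: str, *,
                      filter_expr: str = "", top: int = 0,
                      select: Tuple[str, ...] = ()) -> str:
    """Build a GET URL for an OData entity set using standard system query options."""
    params = {}
    if filter_expr:
        params["$filter"] = filter_expr  # e.g. "status eq 'Issued'"
    if top:
        params["$top"] = str(top)  # maximum number of entries to return
    if select:
        params["$select"] = ",".join(select)  # restrict the returned properties
    url = f"{service_root.rstrip('/')}/{entity_set}"
    if params:
        # quote() encodes spaces as %20; '$', quotes and commas are left readable.
        url += "?" + urlencode(params, quote_via=quote, safe="$',")
    return url


print(build_odata_query("https://example.org/v1/hypothetical", "BuildingPermits",
                        filter_expr="status eq 'Issued'", top=10))
# https://example.org/v1/hypothetical/BuildingPermits?$filter=status%20eq%20'Issued'&$top=10
```

The resulting URL can be fetched with any HTTP client. Since the service is read-only, the query options are the only way a client shapes what it receives.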
EntityCube is a research prototype for exploring object-level search technologies, which automatically summarizes the Web for entities (such as people, locations and organizations) with a modest web presence. (http://research.microsoft.com/en-us/projects/entitycube/)
Les Carr from EPrints
EPrints 3.2 adds cloud storage support. An abstract storage layer, the EPrints Storage Controller, uses new ‘storage plug-ins’ to store files in different places:
- local storage
- cloud storage
- hybrid storage
An XSL-like language is used to describe the ‘where to store’ logic.
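The controller/plug-in split can be illustrated with a small routing sketch in Python. This is not EPrints code. `StorageController`, `StoragePlugin`, `FileInfo`, the plug-in names `local` and `cloud`, and the rules are all hypothetical. Ordered Python predicates replace the XSL-like policy language, so this only conveys the general pattern: rules decide where a file goes, and sending a file to both plug-ins corresponds to hybrid storage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class FileInfo:
    filename: str
    mime_type: str
    size: int  # bytes


class StoragePlugin:
    """Hypothetical plug-in: one place where file content can live."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: Dict[str, bytes] = {}

    def store(self, info: FileInfo, data: bytes) -> None:
        self.files[info.filename] = data


Rule = Tuple[Callable[[FileInfo], bool], Sequence[str]]


class StorageController:
    """Routes each file to the plug-ins named by the first matching rule.

    Python predicates stand in here for EPrints' XSL-like policy language.
    """

    def __init__(self, plugins: List[StoragePlugin], rules: List[Rule],
                 default: Sequence[str]) -> None:
        self.plugins = {p.name: p for p in plugins}
        self.rules = rules
        self.default = default

    def select(self, info: FileInfo) -> List[str]:
        for predicate, targets in self.rules:
            if predicate(info):
                return list(targets)
        return list(self.default)  # no rule matched

    def store(self, info: FileInfo, data: bytes) -> List[str]:
        targets = self.select(info)
        for name in targets:
            self.plugins[name].store(info, data)
        return targets


controller = StorageController(
    plugins=[StoragePlugin("local"), StoragePlugin("cloud")],
    rules=[
        # Large video files go to cloud storage only.
        (lambda f: f.mime_type.startswith("video/") and f.size > 100_000_000,
         ["cloud"]),
        # PDFs are kept both locally and in the cloud (hybrid storage).
        (lambda f: f.mime_type == "application/pdf", ["local", "cloud"]),
    ],
    default=["local"],
)

print(controller.store(FileInfo("paper.pdf", "application/pdf", 500_000), b"%PDF-"))
# ['local', 'cloud']
```

Keeping the placement rules apart from the plug-ins means a repository could change its policy, for example moving all video to the cloud, without touching the code that talks to each storage back end. That separation is the point of an abstract storage layer.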
A presentation on EPrints and the cloud is available here: http://www.slideshare.net/lescarr/eprints-and-the-cloud