Archive for February, 2010

Repository & the Cloud event

I attended this event on 23/02/2010. It focused on issues of cloud storage in a repository context. People from leading repositories presented their views and development work on cloud technologies.

Michele Kimpton from DuraSpace

DuraCloud (http://www.duraspace.org/duracloud.php) has been in a pilot phase since the beginning of Fall 2009 and will be released as a service of the DuraSpace not-for-profit organization in the fall of 2010.

Key advantages of cloud:

  • Scalability
  • Remote, off-campus storage of digital assets
  • Ease of implementation
  • Flexibility
  • Don’t have to staff locally
  • Cost
  • Elasticity

Major challenges of cloud:

  • Trusting third party to manage critical assets
  • Long-term reliability of solution
  • Data security
  • Performance and bandwidth concerns
  • Loss of control
  • Administrative burden of SLAs
  • Transparency of solution
  • Data lock-in
  • Less customizable

Institutional needs: managing digital collections

Service areas:

  • Remote secondary storage of digital collections
  • Preservation support
    • Ability to replicate content to multiple providers and locations
    • Ability to synchronize backup
    • Access to content
  • Intra-institution shared collections
  • Inter-institution shared collections
  • Compute services
  • Online primary storage

DuraCloud: services in the cloud for durable digital content

  • Admin
  • Service manager
  • Storage manager, mediating multiple storage back ends (sketched below):
    • Amazon S3
    • Rackspace Cloud Files
    • EMC Atmos
    • Other clouds and services
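
As a rough illustration of what the storage manager layer does – this is not DuraCloud's actual API (DuraCloud is a Java web application, and the provider classes here are hypothetical stand-ins) – a minimal Python sketch of one write fanning out to several providers:

    import os

    class StorageProvider:
        """Common interface each storage back end implements."""
        def put(self, content_id: str, data: bytes) -> None:
            raise NotImplementedError

    class LocalProvider(StorageProvider):
        """Hypothetical stand-in for a real cloud provider; writes to a local directory."""
        def __init__(self, root: str) -> None:
            self.root = root
            os.makedirs(root, exist_ok=True)

        def put(self, content_id: str, data: bytes) -> None:
            with open(os.path.join(self.root, content_id), "wb") as f:
                f.write(data)

    class StorageManager:
        """Replicates every write to all configured providers –
        the 'multiple providers and locations' idea from the service areas above."""
        def __init__(self, providers) -> None:
            self.providers = providers

        def put(self, content_id: str, data: bytes) -> None:
            for provider in self.providers:
                provider.put(content_id, data)

    manager = StorageManager([LocalProvider("copy-a"), LocalProvider("copy-b")])
    manager.put("page-001.tif", b"...image bytes...")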

Pilot projects

  • NYPL pilot: Digital Gallery collection (http://digitalgallery.nypl.org/nypldigital/explore/dgexplore.cfm)
    • Back up all TIFF images (10 TB)
    • Transform TIFF to JPEG 2000 using ImageMagick (a scripted sketch of this step follows this list)
    • Run a J2K image server in the cloud
    • Push the JPEG 2000 derivatives back into Fedora
  • BHL pilot: find the most cost-competitive solution for keeping multiple copies in multiple geographic locations. Their entire backup, 40 TB of TIFF/JPG files, is held in the cloud.
  • WGBH pilot: the WGBH Media Library and Archives, for video content.
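
The TIFF-to-JPEG 2000 step above is the sort of job that is easy to script. A minimal sketch, assuming ImageMagick's convert command is installed with JPEG 2000 support (the directory names are made up):

    import subprocess
    from pathlib import Path

    def tiff_to_jp2(src: Path, dest_dir: Path) -> Path:
        """Convert one TIFF master to JPEG 2000 by shelling out to ImageMagick."""
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / (src.stem + ".jp2")
        # 'convert' infers the output format from the .jp2 extension.
        subprocess.run(["convert", str(src), str(dest)], check=True)
        return dest

    # Hypothetical directory names; a real run would point at the collection.
    for tiff in Path("masters").glob("*.tif"):
        tiff_to_jp2(tiff, Path("derivatives"))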

Future work of DuraCloud

  • Distributed
  • Collaborative
  • Web-oriented
  • Open
  • Interoperable

Alex D. Wade from Microsoft

‘Moving to a world where all data is linked and can be stored / analyzed in the Cloud.’

Windows Azure (http://www.microsoft.com/windowsazure/) is Microsoft's cloud platform.

Zentity Cloud Storage (http://research.microsoft.com/en-us/projects/zentity/)

OGDI SDK: ‘OGDI is open source starter kit written using C# and the .NET Framework that uses the Windows Azure Platform to expose data in Windows Azure Tables as a readonly RESTful service using the Open Data Protocol (OData) via an ASP.NET based Windows Azure web role.’ (http://ogdi.codeplex.com/)
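
Because OGDI exposes its data through standard OData conventions, any HTTP client can consume a feed. A minimal Python sketch, with a hypothetical feed URL (each OGDI deployment publishes its own endpoints):

    import requests  # third-party HTTP client (pip install requests)

    # Hypothetical OGDI-style endpoint; real deployments publish their own URLs.
    FEED = "https://example.cloudapp.net/v1/SampleContainer/SomeDataSet"

    # Standard OData query options: $top caps the number of rows returned,
    # $format=json asks for JSON instead of the default Atom feed.
    resp = requests.get(FEED, params={"$top": "5", "$format": "json"})
    resp.raise_for_status()

    # Early OData versions wrap results in a "d" member, sometimes with a
    # further "results" list inside it; handle both shapes.
    payload = resp.json()["d"]
    rows = payload["results"] if isinstance(payload, dict) else payload
    for row in rows:
        print(row)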

EntityCube is a research prototype for exploring object-level search technologies, which automatically summarizes the Web for entities (such as people, locations and organizations) with a modest web presence. (http://research.microsoft.com/en-us/projects/entitycube/)

Les Carr from EPrints

EPrints 3.2 gains cloud storage support. An abstract storage layer, the EPrints Storage Controller, has been implemented; it uses new ‘storage plug-ins’ to store files in different places:

  • local storage
  • cloud storage
  • hybrid storage

An XSL-like language is used to describe the ‘where to store’ logic.
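
Purely to illustrate that routing idea (EPrints itself is written in Perl and expresses these rules declaratively; the thresholds and plug-in names below are invented), a toy version in Python:

    def choose_plugin(filename: str, size_bytes: int) -> str:
        """Toy routing rule in the spirit of the Storage Controller:
        each condition picks a storage plug-in for the incoming file."""
        if size_bytes < 10 * 1024 * 1024:        # small files stay on campus
            return "local"
        if filename.endswith((".tif", ".wav")):  # large masters go to the cloud
            return "cloud"
        return "hybrid"                          # everything else: both copies

    print(choose_plugin("thesis.pdf", 2_000_000))    # -> local
    print(choose_plugin("master.tif", 50_000_000))   # -> cloud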

A presentation describing EPrints and the cloud is available here: http://www.slideshare.net/lescarr/eprints-and-the-cloud


Digitisation of Analogue Audio – JISC Digital Media training course

By Matthew Herring

I attended this course on 2nd Feb 2010. It was an introductory overview of digitising analogue audio material, with a particular focus on cassettes (although other media were discussed to some extent). I would recommend the course as a very good introduction, but anyone wanting to actually carry out a substantial audio digitisation project might want to also attend some of the more advanced/focused courses which JISC Digital Media also run (e.g. Digitising Grooved Discs and Open Reel Audio). The following is a brief summary of the main areas covered.

  • Theory of analogue audio and analogue-to-digital conversion, covering basic properties of sound waves; how the analogue signal is captured and stored on magnetic tape and other analogue media; and digital audio concepts such as sampling, bit depth and compression CODECs.
  • Analogue audio formats. Various examples of more-or-less obscure audio carriers were brought out for us to identify and look at. They included a wax cylinder (sadly in pieces), a shellac record, mini- and microcassettes, a LaserDisc, recording wire, a DAT tape and various large-format cassettes.
  • The analogue cassette tape, covering more specifics about the cassette, including strengths and weaknesses of the format and types of problems encountered with cassettes.
    • Causes of damage include: poor storage (temperature and humidity), age (gradual signal loss), rough handling, poor recording technique and use in a machine which has not had its playback head demagnetised.
    • Types of damage include: sticky shed syndrome and vinegar syndrome (two types of chemical degradation due to poor storage), backcoat shedding (separation of the magnetic layer from the plastic tape), mould, print-through (where the magnetic field from the tape ‘prints’ itself on the bit of tape wound next to it, resulting in an audible echo), signal loss (inevitable over time, resulting in a quieter signal) and damage from playback equipment.
    • One useful tip: print-through can be avoided (or lessened) by winding the cassette to the end for storage.
    • This section ended with an exercise to identify various types of damage on sample audio cassettes.
  • Noise reduction systems – it is important to understand these and set up playback equipment accordingly (the cassette often says whether they have been used, and playback equipment has settings for them).
  • Threats to analogue audio collections were briefly listed, including disaster, accidental erasure, theft, obsolescence, lack of institutional support/funding and inappropriate storage. Planning for these sorts of threats needs to take place.
  • Hardware systems for digitisation. Three possible systems:
    • analogue playback directly into a portable digital recorder – good for smaller projects where portability of equipment is an advantage, but too time-consuming for larger projects;
    • a high-end digitisation system, with multiple analogue-to-digital converters recording directly to server storage – expensive to install and needing expert users, but giving very high quality; good for large, important projects;
    • a PC-based system utilising a single analogue-to-digital converter recording to the PC – the system we used for the classroom exercises.
    • We were also shown how to maintain tape-playing equipment (head cleaning, adjusting azimuth, etc.).
  • Digital capture settings. These are:

Sample rate. Needs to be at least twice the highest frequency in the recording (the Nyquist criterion), but 48 kHz is considered a minimum for archiving. Higher rates can help reduce noise etc. (Tape has a maximum frequency range of 30 Hz–20 kHz.)

Bit depth. The range of human hearing is about 140 dB; the largest dynamic range of professional tape is 110 dB. 24-bit recording gives a range of 144 dB, so it is recommended.

File format. Always capture uncompressed. The archival standard is BWAV (Broadcast WAV).

Recording level. Use the peak level meter in the recording software to make sure the volume never goes above the range of the digital system, to avoid ‘clipping’ of the signal, which results in audible distortion.
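
Those settings hang together arithmetically: the Nyquist criterion gives the ‘at least twice the frequency’ rule, each bit of depth adds roughly 6 dB of dynamic range, and sample rate × bit depth × channels fixes the uncompressed file size. A quick sanity check in Python:

    import math

    # Nyquist: the sample rate must be at least twice the highest frequency.
    tape_max_hz = 20_000
    print(2 * tape_max_hz)             # 40000 -- so 48 kHz clears it comfortably

    # Dynamic range of n-bit linear PCM is roughly 6.02 dB per bit.
    bits = 24
    print(20 * math.log10(2 ** bits))  # ~144.5 dB, above tape's ~110 dB

    # Uncompressed size of one hour of 48 kHz / 24-bit stereo audio.
    seconds, rate, channels = 3600, 48_000, 2
    size_bytes = seconds * rate * channels * (bits // 8)
    print(size_bytes / 1e9)            # ~1.04 GB per hour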

  • Digitisation. We did some digitisation of cassettes, using WaveLab software.
  • Problems with digital files. Digital files can be very high quality and capture analogue recordings accurately, but degradation of storage media can be more damaging to digital than to analogue recordings. We listened to some examples of damage to digital files. Common problems include: physical degradation of media, incompatibility with playback/editing software, and a required CODEC not being installed.
  • Metadata. This was discussed at a basic level.
  • Preservation. The necessity of a preservation policy was discussed. Recommendations included: adequate file backup, migration/refreshment, and adherence to existing standards (OAIS, PREMIS). Quality control standards (IASA-TC04 Guidelines on the Production and Preservation of Digital Audio Objects, 2nd edn. (2009); Sound Directions – Best Practices for Audio Preservation (2008)) were briefly discussed.
  • Delivery, covering the production of compressed surrogate copies for downloading. Lossy versus lossless compression was discussed. Factors for successful delivery include: file size, format and effective metadata. The recommended maximum compression for spoken word delivered over the web is 128 kbps.
  • Digitisation within a project structure – the different stages of a digitisation project:

    • assessing collection condition
    • investigating rights issues
    • user needs
    • institutional support/funding
    • feasibility study
    • in-house or outsourced?
    • project specifications
    • digitisation
    • delivery
    • preservation

  • We were also introduced to some useful pieces of software for investigating the properties of digital audio files and creating metadata (MP3 ID3 tags) – I think most/all of these are open source or free on the internet:

    • MD5Checker – generating/checking checksums (a sketch of the underlying operation follows this list)
    • GSpot – audio analysis; useful for checking file types/CODECs used
    • MediaInfo – audio analysis and metadata
    • Diamond MD5 – audio analysis/verifying file integrity
    • iTunes – metadata
    • Foobar 2000 – metadata
    • Switch – converting WAV to MP3
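
Checksum tools like MD5Checker do nothing exotic; the underlying operation is a few lines in most languages. A minimal sketch using Python's standard library (the file name and recorded value are made up):

    import hashlib
    from pathlib import Path

    def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
        """Stream the file in 1 MB chunks so large audio masters
        never need to fit in memory."""
        digest = hashlib.md5()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # Verify a file against a previously recorded checksum (value made up).
    recorded = "9e107d9d372bb6826bd81d3542a419d6"
    print(md5_of(Path("master_side_a.wav")) == recorded)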
