Digital preservation and sustainability

Over the past few months, there have been a couple of interesting events in the realm of digital preservation. The first was the publication of the new UNESCO digital preservation guidelines – PERSIST (although UNESCO uses the term sustainability rather than preservation). The second was the updated Digital Preservation Coalition Handbook.

PERSIST (Platform to enhance the sustainability of the information society transglobally) looks at guidelines for selecting digital materials – it’s necessarily rather broad and full of good intentions and motherhood statements. The guidelines look at national institutions, such as archives, museums and libraries, and suggest that where legislation exists regarding the deposit of materials, it should be broadened, if required, to include digital content. Both national and international bodies should be engaged in setting standards for the collection and maintenance of these materials. Copyright and digital rights management are briefly addressed in the next section on the legislative environment.

The next three sections look at libraries, archives and museum collections from the ‘think global: act local’ perspective. The first section, Thinking globally, suggests that libraries, faced with the ubiquity of social media, websites and internet content, will need to manage their legal deposit and selection criteria for ephemera carefully. It also suggests that libraries may need to focus on user requirements for maintaining content, rather than continually acquiring new content with a view to preservation. Museums and galleries are flagged as needing to think about metadata for digital and digitised content and also for records about the collection. Archives, like libraries, face problems with shifting formats and systems. Libraries have the luxury of many copies, but archives may lose content that is not ‘born archival’ but which garners significance over time, simply because the formats in which the items are created are themselves ephemeral. Although specific issues are identified for each institutional type, the guidelines stress that many of these concerns cross collection boundaries.

The second section, Act locally, provides a range of selection techniques and criteria which are probably already familiar to institutions looking at collection policies and processes: comprehensive collections, focused on a region, time or person/organisation; representative sampling; criteria-based selection – format, topic, and so on. It also suggests that there can be delayed appraisal in some circumstances: collect now, select later.

In addition, the guidelines provide a simple decision tree (sadly, not illustrated) which suggests institutions consider the following:

  • Identify
  • Legislative framework
  • Select
    • significance
    • sustainability
    • availability
  • Decide

Possibly of more interest and utility are the appendices – the first looks at metadata for digital preservation, and manages to do so without using the PREMIS acronym. Three types of metadata are identified as useful for digital preservation: structural, descriptive and administrative. The second appendix provides useful terms and definitions.
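To make the distinction a little more concrete, here is a rough sketch (entirely my own, not drawn from the guidelines or from PREMIS) of how those three types of metadata might sit around a single digitised item. All of the field names are invented for illustration.

```python
# Purely illustrative: field names are invented, not taken from the PERSIST
# guidelines or from any formal schema such as PREMIS.
preservation_record = {
    "descriptive": {       # what the object is, so people can find it
        "title": "Annual report, 1987",
        "creator": "Example Historical Society",
        "date": "1987",
    },
    "structural": {        # how the parts of the object hang together
        "files": ["report_p01.tif", "report_p02.tif"],
        "page_order": ["report_p01.tif", "report_p02.tif"],
    },
    "administrative": {    # how the object is managed over time
        "rights": "Copyright undetermined",
        "fixity_md5": "0123456789abcdef0123456789abcdef",
        "preservation_action": "Migrated TIFF to PDF/A, 2016-03-01",
    },
}
```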

The Digital Preservation Coalition Handbook is an online document (which can also be saved and printed as a PDF), designed for managers and executives who are either new to the concepts of digital preservation or, through the handbook and other learning, feel that they have a good grasp of the essentials but are by no means experts. Each section states the level of experience it is aimed at, and provides some clear, simple discussion before going on to more nuts-and-bolts information like choosing providers, identifying formats, working through digitisation processes and decisions, and more. This is a far more detailed and practical work than the Guidelines, but the two work well together.

Use the Guidelines to promote the importance of digital management and then follow up with the Handbook.




Archives New Zealand – 2057

The National Archives New Zealand have just released their new long-term vision (which seems appropriate for an archives) for comment. It starts with a stirring quote from Sir Arthur Doughty about the value of archives for posterity, which can be taken as something of a trumpet call in these fiscally challenging times.

You, and I, have until 4 November to respond (the day after we are blowing up Parliament, after all).

A draft standard, Egad!

ICA 2016 is about to begin and, as noted in my post a few months ago, a new draft descriptive standard is now available for review –  Looking forward to hearing more about it, and to all the presentations, via the #ICASeoul16 hashtag. (My very rudimentary French and Italian are getting a workout already!)

Stop press: a new email list has been set up for comment on the standard –


Reflecting on #govHack

A fortnight ago, I gave up a little bit of time to see if I could engage hackers in using cultural heritage data, either to enhance a project or to be the basis for one.

This year’s #govHackWA was held in a new space, and included a link to a regional centre, Geraldton. After four years, it has become far more slick and professional, which was needed with the large number of entrants, but it meant that some of the more social components of the weekend had gone by the wayside (the introduction and welcome from the central committee sounded more like phoning a government organisation with a long phone menu than the somewhat quirky presentations by @pia_waugh of earlier years). We shared information via Slack, an internet relay chat system with pretensions of grandeur, and the data sets needed to be on the various data portals a week ahead of the competition (rather than on a thumb drive or hard drive brought in at the last minute).

The Slack channels worked well, enabling information, advice and requests to be shared with a large or small group as required. I have some concerns about these sorts of channels for more formal communication, particularly from a government recordkeeping perspective, but it was an effective tool for a specific project. There was a specific channel for project ideas, so I was able to suggest a few things, one of which, I think, was incorporated into the ihero project on facial recognition of WWI photographs.

The data portals are clearly identified on the various government websites, with a link to each state portal from the Commonwealth portal, which shows how data can be connected across jurisdictions. However, I found the quality of the datasets to be variable, and I do wonder how many of them have longevity or usefulness, either because of the specificity of the data collected or the format in which the data is presented (but that is a discussion for another day). Nevertheless, by searching keywords in the data portals I was able to identify a range of useful datasets, as well as links to databases, which provide more complex data. I collated some datasets and sources and also printed off my previous post on some #govHack tools.
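For the keyword searching, most of the Australian portals are built on CKAN, so a few lines of Python can do the trawling for you. This is just a sketch against the standard CKAN package_search action, assuming data.gov.au still exposes it at this base URL; swap in a state portal’s address as needed, and treat the search term as a placeholder.

```python
# Sketch only: assumes a CKAN-based portal exposing the standard
# package_search action at this base URL (true for data.gov.au at the time).
import requests

BASE_URL = "https://data.gov.au/api/3/action/package_search"

def search_datasets(keyword, rows=5):
    response = requests.get(BASE_URL, params={"q": keyword, "rows": rows})
    response.raise_for_status()
    for dataset in response.json()["result"]["results"]:
        print(dataset["title"])
        for resource in dataset.get("resources", []):
            print("   -", resource.get("format"), resource.get("url"))

search_datasets("shipwrecks")
```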

I was able to help two groups with identifying data and suggesting some ways of working with the data that they had – colourfulpast and ihero. I had more involvement with the colourfulpast team, because they had worked with cultural data in the past and they included a colleague from the State Library of WA, but it was great to see how both projects evolved over the course of the weekend. I was able to promote both projects via Twitter and on relevant Facebook groups after the event, so that the target audience could identify and work with the projects and, hopefully, provide feedback and vote!

That said, there are some things that I would do differently next time. The WA Fisheries Department were there all weekend, with just one dataset – their shark data. Their ability to work with multiple groups and to provide both data and technical expertise meant that three groups elected to work primarily with their data. Had I been more switched on, I could have had a look at the WA Museum and SRO trial discovery layer which Andrew brought to the weekend and identified additional shark data. Similarly, working with Trove to develop some complementary data might also have been useful for them. The teams are time poor, so helping by providing some easily used and pre-collated data is worth considering. And I would work to have some specific datasets that I was really familiar with identified in the portal.

Next year, I hope to return to GovHack with a fully working SROWA catalogue and some datasets derived from the collection. I’ll also have a look at the other data provided by cultural organisations, and work on identifying projects and problems with them.  Having specific datasets and clearly identified projects is of benefit to both the organisations and the hackers.

Survey for Volunteers in Australian Archives

A great initiative! Well done to the team.

Personal Recordkeeping, Identity and Archives

You can all stop holding your breath, as we are now launching the survey which was proposed at the 2015 ASA conference! *collective intake of breath*

We can’t wait for you to pass it on to your volunteers, to gain a better insight into their collective experiences and motivations – knowledge that will help to improve your volunteer program.

Volunteers within the Australian archival/records sector are invited to complete the following survey:

take the survey

Please note: Volunteers are not asked to identify which institution they volunteer for, and their involvement and responses will be kept private and anonymous.

More details

This study will address the following research questions:

1. Who are our volunteers?
2. What motivates our volunteers?
3. What type of experiences/support does the Australian archival community offer?

An understanding of the above will assist in improving the experience of volunteers within the archives, resulting in the creation of…


Sticky fingers, or: do we need to revisit the gloves debate?

For quite some time, archivists, conservators and special collections staff have been telling people that they don’t need gloves to handle paper records. The wonderful Rebecca Goldman (@derangedescribe) even did a handy (pun intended) flow chart.

Last night, at the Australian Society of Archivists WA Branch AGM, we learnt that the matter needs a lot more research. Professor Simon Lewis, of Curtin University, and his research students are involved in forensic chemistry research, and are looking more closely at paper and fingerprints, to see what they can determine.

Paper’s porosity makes it ideal for capturing traces of some drugs; American dollar bills show up cocaine traces quite well, apparently. Paper is also evidence itself, or rather a carrier of different sorts of information, as archivists well know. In the forensics field, there has been a concentration on the authenticity of documents used to prove identity – a passport may well be authentic, but there may be questions about the documents used to obtain it, for example. Paper is also used as a carrier for some cost-effective medical analytical tests. Because of this, there is an increasing focus on paper as an area of research. Can they date paper, for example, to say when a document was created? (It turns out the answer is, umm, not really – or at least it’s quite tricky.)

Paper responds to particular events in interesting ways. Bleaching and laser ablation to remove stains or colour lead to weakening of paper fibres. Light also changes paper, as we know. But there may be other things going on within the paper. An Indiana-based art museum identified a set of artworks created by Gustave Baumann, which they have in their collection. Baumann is known to have used turquoise inks to sign prints and artwork. Because they took photos of their collection when it was accessioned, they knew they had some turquoise signatures. However, when they went to retrieve the art for a display, despite it being stored in the dark for a significant number of years, the ink signatures had disappeared. Something in the paper may have been interacting with the ink.

There’s been a bit of research into rag based papers and even early wood pulp papers, but not a lot, for example, on recycled papers. Simon and his team have recently received paper samples from the Shoalhaven paper mill when it closed, going back 50 years. The paper is well described and its storage conditions are known. This means that they can start looking at some different experiments with paper.

But they also need to find out about the things that interact with paper, like the turquoise inks, and those fingerprints. While they could find quite a lot of research on fingerprints, they discovered a bit of a gap in the literature – the way in which fingerprints interact with, affect and are affected by paper. Indeed, when they started to look into it, they found that most of the material on the issue had been written by archivists, librarians and conservators, and was about handling issues for cultural heritage materials. Suddenly, their research took on a whole new aspect.

Professor Lewis dates the gloves controversy to a 2005 paper by Baker and Silverman, Misperceptions about White Gloves. In the paper, it was argued that the majority of fingerprint residue is water, so little amino acid or fat remains to contaminate the paper. But it turns out that is not strictly accurate.

National Archives of Australia senior conservator Prue McKay wrote about her experiments with paper and gloves in 2008. She found that bare hands did leave residue, but there was some doubt as to the effect of the marks, particularly on older papers. More recently, Terry Kent, a UK-based forensics analyst, reported on the water content of fingerprints, again confirming that there are sufficient amino acids and fats to make a deposit. Apparently, tests conducted at Curtin show that amino acids migrate into the paper substrate and then bind to the paper. They are doing continuing work to see how long fingerprint amino acids remain and, eventually, will try to find out what the effects are on the paper. They also looked at the fats, which can come both from secretions and from things like soap, gels and hand creams.

The main result from the experiments so far shows that fats and acids return to the skin very quickly, within around five minutes of hand washing. However, the jury is still out on whether or not gloves should be worn. Based on the research to date, I’m sticking with the no-gloves policy until the alternatives are fully investigated, although, if I know someone is a head scratcher or finger licker, I may reconsider.



Getting ready for #govHack 2 – tools, other data sources and examples

In this post, I’m going to point to some of the tools that I know from digital humanities and the like. They are mostly used in the cultural sphere, but that is not to say that they aren’t useful for exposing and manipulating other sorts of data. I’ll also try and provide some examples of the way data has been used for some simple and not so simple projects. GovHack is all about getting something up and running in 24 hours so, like a thesis, the parameters of time, space and subject need to be clearly defined. However, also like a thesis, the project should show some potential for further work, research and avenues for publication.

I’ve already provided a link to the TROVE API, and to some of the blogs that discuss using it. The API has been acknowledged as a source of inspiration for the Europeana and Digital Public Library of America (DPLA) APIs, too (a good way of incorporating some international data). Library cataloguing data, including from Australian libraries, can be found on WorldCat, while archival and manuscript collections can be found via ArchiveGrid.
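If you haven’t used the TROVE API before, a query is just an HTTP GET with your key and a search term. The sketch below reflects the version 2 endpoint and JSON layout as I understand them; the key and search term are placeholders, and you should check the current NLA documentation before relying on the parameter names.

```python
# Hedged sketch of a Trove API v2 search; the key is a placeholder and the
# endpoint/parameters may have changed since this was written.
import requests

TROVE_KEY = "YOUR_TROVE_API_KEY"  # request a free key from the NLA
BASE_URL = "https://api.trove.nla.gov.au/v2/result"

def search_trove(query, zone="newspaper", n=5):
    params = {"key": TROVE_KEY, "q": query, "zone": zone,
              "n": n, "encoding": "json"}
    response = requests.get(BASE_URL, params=params)
    response.raise_for_status()
    return response.json()

results = search_trove("Geraldton regatta")
# v2 responses nest records under response -> zone -> records
for zone in results["response"]["zone"]:
    print(zone["name"], "total:", zone["records"].get("total"))
```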

Libraries and some archives use a format called MARC (MAchine-Readable Cataloging) to describe resources. It’s a standard developed by the Library of Congress, and about halfway down their MARC documentation page you’ll find a list of crosswalks and mappings to other formats, including Dublin Core (developed by OCLC, the people who run WorldCat) and geospatial data –
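If you end up with raw MARC records rather than a friendly export, the pymarc library will read them for you. A minimal sketch, assuming a hypothetical file of records and the convenience methods available in the pymarc releases current at the time:

```python
# Minimal sketch using pymarc (pip install pymarc); the filename is
# hypothetical and the convenience methods reflect pymarc as of ~2016.
from pymarc import MARCReader

with open("sample_records.mrc", "rb") as marc_file:
    for record in MARCReader(marc_file):
        # A very rough Dublin Core-style mapping of a few MARC fields
        dc = {
            "title": record.title(),        # from field 245
            "creator": record.author(),     # from 100/110/111
            "identifier": record.isbn(),    # from 020
        }
        print(dc)
```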

Other archives use Encoded Archival Description (EAD) and Encoded Archival Context (EAC) to create and share descriptions. Although these were developed independently, the Library of Congress also maintains documentation to support them, and again provides some crosswalks. EAC is used by the SNAC Project and the eScholarship Research Centre at the University of Melbourne (which is a data provider for ANDS) to create connections between organisations and individuals –
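EAD finding aids are plain XML, so you can pull useful fields out with nothing more than the standard library. A rough sketch, with a hypothetical filename, that copes (crudely) with the EAD 2002 namespace:

```python
# Rough sketch: extract unit titles from an EAD 2002 finding aid. The filename
# is hypothetical; namespaces are stripped rather than handled properly.
import xml.etree.ElementTree as ET

def local_name(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]

tree = ET.parse("finding_aid.xml")
for element in tree.iter():
    if local_name(element.tag) == "unittitle":
        text = "".join(element.itertext()).strip()
        if text:
            print(text)
```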

Beyond the world of library and archives description (and you just wanted some simple headers to capture data, right?), there is Zotero, open-source citation management software developed by the Roy Rosenzweig Center for History and New Media (CHNM). Zotero comes with some nice tools, including a simple timeline, and is also something I’d like to play with to open up referencing from archival sources. The CHNM spends a lot of time creating neat tools for historians and cultural curation, so they also have Omeka, an online exhibition tool, and Scripto for transcription purposes –
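Zotero also has a web API, and the pyzotero wrapper makes it easy to pull citations back out again, which is part of what makes it interesting for referencing archival sources. A hedged sketch, with placeholder credentials:

```python
# Hedged sketch using pyzotero (pip install pyzotero); the library ID and
# API key are placeholders, generated from your own Zotero account settings.
from pyzotero import zotero

LIBRARY_ID = "123456"          # placeholder
API_KEY = "YOUR_ZOTERO_KEY"    # placeholder

zot = zotero.Zotero(LIBRARY_ID, "user", API_KEY)
for item in zot.top(limit=5):
    data = item["data"]
    print(data.get("itemType"), "-", data.get("title"))
```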

You can also use the open source project Blacklight (including Spotlight) to play with library-described data – (it turns out Blacklight, Spotlight and other delights are the work of Stanford University Libraries).

There are some good tutorials on Zotero and other tools on the Programming Historian site –

The ever fabulous and creative Tim Sherratt has a whole host of tools, and examples of how to use them, on his wraggelabs site. The focus is on TROVE and the National Archives of Australia – e.g.

Finally, I’d like to point to some interesting uses of cultural data, both as part of govHack and more generally.

Not open source, but fun, there are HistoryPin and NowandThen; Pixstory, from the 2013 GovHack, explored some of these ideas –

As part of the WW1 centenary project, the RSL teamed up with a local TAFE to create a virtual ‘Digger’ app –

Last year, at least two projects used cultural data for GovHack –

And there are all those geospatial projects, e.g.