Getting ready for #govHack 2 – tools, other data sources and examples

In this post, I’m going to point to some of the tools that I know from digital humanities and the like. They are mostly used in the cultural sphere, but that is not to say that they aren’t useful for exposing and manipulating other sorts of data. I’ll also try and provide some examples of the way data has been used for some simple and not so simple projects. GovHack is all about getting something up and running in 24 hours so, like a thesis, the parameters of time, space and subject need to be clearly defined. However, also like a thesis, the project should show some potential for further work, research and avenues for publication.

I’ve already provided a link to the TROVE API, and to some of the blogs that discuss using it. The API has been acknowledged as a source of inspiration for the Europeana and Digital Public Library of America (DPLA) APIs, too (a good way of incorporating some international data). Library cataloguing data, including data from Australian libraries, can be found on WorldCat, while archival and manuscript collections can be found via ArchivesGrid.

Libraries and some archives use a format called MARC (MAchine-Readable Cataloging) to describe resources. It’s a standard developed by the Library of Congress, and about halfway down their MARC documentation page you’ll find a list of crosswalks and mappings to other formats, including Dublin Core (developed by OCLC, the people who run WorldCat) and geospatial data.
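As a rough illustration of what a crosswalk does, here is a minimal Python sketch that maps a handful of MARC fields onto Dublin Core elements. The mapping table is a hand-picked, simplified subset and is an assumption for illustration only, not the full Library of Congress crosswalk; the flat dictionary used for the record is also a made-up convenience, not a real MARC serialisation.

```python
# Simplified, illustrative MARC field -> Dublin Core element mapping.
# Assumption: a hand-picked subset, NOT the full Library of Congress crosswalk.
MARC_TO_DC = {
    "100": "creator",      # main entry, personal name (simplified)
    "245": "title",        # title statement
    "260b": "publisher",   # publication: name of publisher
    "260c": "date",        # publication: date
    "520": "description",  # summary note
    "650": "subject",      # topical subject heading
}


def marc_to_dublin_core(record):
    """Convert a flat {marc_field: [values]} dict into {dc_element: [values]}."""
    dc = {}
    for field, values in record.items():
        element = MARC_TO_DC.get(field)
        if element is None:
            continue  # no equivalent in this simplified table, so skip it
        dc.setdefault(element, []).extend(values)
    return dc


# Placeholder record; the values are invented for the example.
sample = {
    "245": ["An example title"],
    "100": ["Example, Author"],
    "650": ["Archives", "Libraries"],
    "999": ["local field with no mapping"],
}

print(marc_to_dublin_core(sample))
```

Anything without an entry in the table (like the local 999 field here) is silently dropped, which is exactly the sort of loss to watch for when moving data between schemas in a hurry.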

Other archives use Encoded Archival Description (EAD) and Encoded Archival Context (EAC) to create and share descriptions. Although these standards were developed independently, the Library of Congress also maintains documentation to support them, and again has some crosswalks. EAC is used by the SNAC Project and the eScholarship Research Centre at the University of Melbourne (which is a data provider for ANDS) to create connections between organisations and individuals.

Beyond the world of library and archives description (and you just wanted some simple headers to capture data, right?), there is Zotero, open source citation software developed by the Roy Rosenzweig Centre for History and New Media (CHNM). Zotero comes with some nice tools, including a simple timeline, and is also something I’d like to play with to open up referencing from archival sources. The CHNM spends a lot of time creating neat tools for historians and cultural curation, so they also have Omeka, an online exhibition tool, and Scripto for transcription purposes.

You can also use the open source project Blacklight (including Spotlight) to play with library-described data. (Turns out Blacklight, Spotlight and other delights are the work of Stanford University Libraries.)

There are some good tutorials on Zotero and other tools on the Programming Historian site.

The ever fabulous and creative Tim Sherratt has a whole host of tools, and examples of how to use them, on his WraggeLabs site. The focus is on TROVE and the National Archives of Australia.

Finally, I’d like to point to some interesting uses of cultural data, both as part of govHack and more generally.

Not open source, but fun, are HistoryPin and NowandThen. Pixstory, from the 2013 GovHack, explored some of these ideas.

As part of the WW1 centenary project, the RSL teamed up with a local TAFE to create a virtual ‘Digger’ app –

Last year, at least two projects used cultural data for govhack –

And, there are all those geospatial projects, e.g.

Getting ready for #govHack – cultural datasets

Next week, the largest hackathon in the world, GovHack, takes place in Australia and New Zealand. There are govhack sites at universities and regional centres, and in all the major centres. Each site has participants, who make the things, and mentors who provide advice and guidance on tools and datasets. There’s even a specific cultural hack node in Canberra, run by Tim Sherratt.

This year, I’ve signed up as a data mentor for WA, which has a central city site and a regional node at Geraldton. This will be my third year as a data mentor, and my first year as a general mentor, talking cultural data generally (mostly archival, of course) rather than representing the SROWA. It’s a lot more organised than I anticipated, and people are already asking for more information to help them prepare. To this end, I’m going to use this post to talk about some datasets.  Participants need to use at least one official data set, but can then also look for other data that they can mash together or reuse. This way, I can print off the page as a guide, and provide a link to it for #govhackwa participants. I’ll do an additional post or two on some tools for analysing them and provide some examples of how data has been used.

Official cultural datasets

These datasets are taken from the various government data portals.

Searching the portal reveals 144 datasets for the keyword ‘library’, 117 for ‘archive’, 52 for ‘museum’ and 129 for ‘cultural’. The latter includes some GIS datasets, including the “coarse cultural topographical data” showing where major population centres are, and the CPI. In addition to collection links and collection subsets, State and National Libraries have contributed statistical datasets relating to the location of libraries, user statistics and so on.

My top picks, outside of TROVE (from the National Library) and ANDS (Australian National Data Service) are –

The National Portrait Gallery

The Antarctic artefacts bibliography and Commonwealth Bay artefacts survey data –

Indigenous protected areas

The National Archives of Australia “Memory of a nation” – digitised content from online exhibitions – and the Commonwealth Agencies dataset, which provides a comprehensive set of federal government departments, ministries, offices and so on. Because of the way archives link data, some state and local government agencies are also included. This dataset was last updated in April 2016.

The State Records Office of New South Wales has a number of indexes available as CSV files in the NSW data portal, including convicts, soldier settlement indexes and wills and probate, not to mention their Flickr dataset. SRNSW collection information can also be searched via their online catalogue. Queensland State Archives has 55 datasets in the portal. State Records South Australia has 5 datasets.
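If you haven’t worked with index files like these before, a minimal Python sketch of loading a CSV index and filtering it by surname might look like the following. The column names (Surname, FirstName, Year) and the sample rows are made up for illustration; the real indexes will have their own headers, so check the first line of the file you download.

```python
import csv
import io

# Made-up sample rows standing in for a downloaded index file.
# Assumption: the column names here are hypothetical, not those of any real index.
SAMPLE_CSV = """Surname,FirstName,Year
Smith,Ann,1820
Jones,Tom,1831
Smith,William,1835
"""


def find_by_surname(csv_file, surname):
    """Return every row whose Surname column matches, ignoring case and padding."""
    reader = csv.DictReader(csv_file)
    wanted = surname.strip().lower()
    return [row for row in reader if row["Surname"].strip().lower() == wanted]


# For a real download, swap io.StringIO(...) for
# open("index.csv", newline="", encoding="utf-8").
matches = find_by_surname(io.StringIO(SAMPLE_CSV), "smith")
for row in matches:
    print(row["FirstName"], row["Surname"], row["Year"])
```

Because each row comes back as a dictionary keyed by the header line, the same function works on any of the indexes once the column name is adjusted.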

The Powerhouse Museum API

Other museum datasets include the gorgeous Scott Sisters collection from the Australian Museum, itself the subject of a remix competition in 2013/2014

There’s a plethora of WW1 related datasets – searching for ‘World War’ returns 24 datasets, of which only two are not clearly related, and the majority of which are from State Libraries.

It’s worth remembering that data in TROVE is harvested from all public libraries, and includes data from museums and archives. The content can be filtered via the TROVE API. The Public Record Office Victoria and the Australian National University and Noel Butlin Archives have both contributed data to TROVE. The State Library of Queensland not only has data in TROVE, but also contributed over 50,000 photographs to Wikipedia.
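To give a feel for what filtering via the API involves, here is a minimal Python sketch that only builds a query URL (it makes no network call). The endpoint and the parameter names (key, zone, q, encoding, n) are assumptions based on my understanding of version 2 of the TROVE API and may differ in the version you are using; the API key is a placeholder, and the official documentation is the thing to trust.

```python
from urllib.parse import urlencode

# Assumption: endpoint and parameter names are taken from version 2 of the
# TROVE API as I understand it; check the current documentation before relying on them.
TROVE_BASE = "https://api.trove.nla.gov.au/v2/result"


def build_trove_query(api_key, search_terms, zone="picture", max_results=20):
    """Build (but do not send) a search URL limited to one zone."""
    params = {
        "key": api_key,       # your personal API key (placeholder below)
        "zone": zone,         # which part of the collection to search
        "q": search_terms,    # free-text query
        "encoding": "json",   # ask for JSON rather than XML
        "n": max_results,     # number of results to return
    }
    return TROVE_BASE + "?" + urlencode(params)


url = build_trove_query("YOUR_API_KEY", "Geraldton jetty", zone="newspaper")
print(url)
```

Keeping the URL construction separate from the request itself makes it easy to paste the result into a browser first and see what comes back before writing any parsing code.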

TROVE has some useful examples and help sheets –

The Australian National Data Service is similarly rich and complex. Again, the Public Record Office Victoria (PROV) has contributed data to ANDS, along with the State Records Office of NSW. The PROV’s semantic wiki is available as an XML-formatted download.

Postscript – I’ve just been advised that the Curtin Library has made weather observation data from Jon Sanders’ 1986–1988 circumnavigation available through ANDS –!/rows=15/sort=list_title_sort%20asc/class=collection/q=jon%20sanders/p=1/group=Curtin%20University/. There’s also a nice blog, and you can follow along on Twitter #triplesolo #noonsummary.

Weather aficionados may also be interested in the digitised daily observations from colonial Perth, now in the NAA collection –


Finally, in the WA datasets, you will find a range of historical maps and plans, taken from the State Records Office digitised collection – each map links to the series at the top, but there are some older links to the previous catalogue. For better searching and exporting of data, it’s best to go straight to the new catalogue

WA theme parks – taken from the Landgate “locate the 80s” site –

State Heritage Office datasets –

WA Maritime Archaeology datasets, provided by the WA Museum


In which there are too many hashtags, again!

Barely had the American Library Association (#alaac2016) conference finished, when I became aware of a groundswell of European conferences and workshops.

The first to pop up was #DAMEU for, obviously, Digital Asset Management. This conference is of interest because of its focus both on how to manage current digital content (whether or not it is a copy of an analogue record) and on long term management and preservation. Formats, platforms, repositories – all the buzz words are there.

The second is #LIBER16, taking place in Helsinki, for European research libraries. There’s quite a bit of overlap with the #DAMEU conference, but unlike the ones in Hobart last week, I don’t think participants could run from one to the other. Once again, open access, data repositories, and the management and maintenance of data are being discussed.

And finally, for today at least, is #eu2016nl (#eunl2016), a conference held in the final week of the Dutch Presidency of the EU, and focusing on the ‘digitalisation’ of cultural heritage (their words and spelling, not mine). This looks at digital platforms, but also at digitisation programs and linked content through the mighty Europeana. The best quote so far is that it is time to focus on quality not quantity.



Serendipity or design? #dha2016

Over in Tasmania, at the Digital Humanities conference today, there was a panel discussion on GLAM and humanities research and access to collections. @mikejonesmelb and others tweeted some of the content, and I’d love to see some of the papers and presentations.

The focus was, of course, the relationship between GLAM bodies and academia, with some suggestions for collaboration, such as the McCoy Project between University of Melbourne and Victoria Collections, and having LIS students help with digital humanities projects. It was noted that libraries and archives are not generally identified as research institutions (although with the changes to ARC funding a few years ago, I think the larger institutions can now partner with academics?), and that funding is generally not available for research within collections as part of the institutions’ roles.

Digitisation was also discussed with mixed feelings. It’s one way of providing data, but as Janet Carding, one of the panellists said, “the role for GLAM institutions isn’t to shut themselves in a room with a flatbed scanner for the next 20 years …”.  It was also suggested that APIs for collections need to be made more open and accessible for users. I think there may be some more general discussion that needs to occur vis a vis collections data and the ‘ordinary punter’ as one of the panellists put it. The discussion appeared to range over the ways in which libraries and archives make information available about their collections (which is their raison d’etre) while galleries and museums have been much slower to enable access to collection databases. There are also the dichotomies between science and cultural heritage collections to be considered.

Mike Jones then spoke about context and connections, suggesting a web of knowledge lies within archival descriptions, and considering ways in which meaning can be layered over time. Deb Verhoeven followed up with a discussion of HuNI and serendipity, to which she later provided a three minute summary link. Although it is aimed at academic researchers, it still leaves lots to think about with regards to the ways in which we make connections across collections for all researchers.

Humanities research, data repositories and archives

A few months ago, in “In which there are too many hashtags”, I included a link to the new archives and records group at the Research Data Alliance. Research data is primarily found in universities and research organisations like CSIRO, research hospitals and so on. Librarians have been actively involved in the space, because often the research outputs wind up in their scholars centres and so on, so it has clearly made sense to involve them in the creation of the data as well (the library continuum?).

But these data sets are also records, evidence of actions and transactions. So the archives and records profession needs to be involved too. However, our voices are often lacking.

Today, at the Digital Humanities conference in Tasmania, two documents were established that look at how humanities research and research data can be supported, managed and accessed – and

I’ve added my little bit, and I’d like to encourage you to engage with the ideas presented as well.

Some of the challenges for archivists are identified in a great blog from the University of York.

(and I’ll rant about ‘grey literature’ another day, because the cats are about to eat me).

Collections and connections

Today I learnt that the State Library of Victoria is attempting to raise $100,000 to buy the diary of Lt. Dabney M. Scales, a member of the US Confederate Navy, who later went on to become a member of the Tennessee legislature and also to serve with the US Navy in the Spanish American War. Scales’ diary is of interest to the Library because he was serving on the CSS Shenandoah, the last Confederate ship to surrender in the American Civil War, when it visited Melbourne during a year of piracy and predation on the New England whaling fleet. I have not been able to find the official records of the Shenandoah itself, but I’d expect that they would exist in the National Archives and Records Administration.

Not only does Scales’ diary mention Melbourne, but it also provides details of the Shenandoah’s activities as a privateer, and he writes movingly of the depression and confusion they felt when it was confirmed that the war was over:

Upon arriving at Liverpool in November 1865, Dabney Scales writes on November 6th – “The (British) pilot boarded us in the mid watch this morning. His news confirms that given us by the “Barracouta” – the downfall of the Southern Confederacy. The war, he said had been over so long that people had forgotten all about it” (Case Antiques, lot 176)

While the link with Melbourne is an interesting one, and Scales’s diary apparently provides details not evident in the newspapers of the day, the desire to purchase it for Melbourne leads me to ask questions about appraisal and provenance. One of the exercises I set for my students is to look at the way collections have been obtained by archives and libraries. I get them to consider the method of acquisition – whether by donation, sale, transfer or some other mechanism – and also ask them to think about whether that library or archive is the best place for the collection, whether there are any other collections which might be appropriate, and whether there are ways of sharing or connecting.

From a manuscript collection perspective, I can understand the attraction. It’s an outsider’s view of the city, under very difficult circumstances. The government of the day had qualms about allowing the vessel to dock, and the officers and crew enjoyed a certain notoriety while in Melbourne. 18 men deserted, but a further 40 men were taken on board and became members of the crew.

From the archival perspective, I feel uncomfortable. The Shenandoah played a significant role in the war, and also occupies a space as the last official Confederate military unit to surrender. The diary was found in Tennessee, where Scales was a well respected and influential figure. Other records from Scales from this period are held in collections in the US, including his diary from the Atlanta, which is at Duke University, and an account of his exploits on the Arkansas. From the perspective of provenance, and of respect des fonds, would it not be better to keep these diaries together? Or, does the potential of linked data mean that it does not matter where the physical record is located so long as the metadata enables the records to be brought together in a virtual sense?

I wish the Library well in its fundraising, and hope, wherever the diary winds up, that it does indeed become a key part of the broader Civil War story, while still providing a fresh perspective on both Melbourne and on the activities of one of Tennessee’s beloved sons.

Links, distributions

It’s been a very disjointed sort of day. I sat down yesterday to answer a query on AIPs, SIPs and DIPs by Chris Hurley over on the Archives and Records Google group, and woke up this morning thinking about the need to clarify one of my points. Then it was over to Twitter to catch up on the ACA conference I mentioned yesterday, where Peter van Garderen was the keynote speaker, talking about access to archives, which included a link to his paper on decentralised collections. This seemed to me to be relevant to the ideas that Chris is discussing, so I was able to add it in to my response (told you I would cheat. Be grateful I didn’t just provide the link and publish. I also argued with people about copyright, but we won’t go there).

It’s my day off, the one for big thinking, but my mind is full of little links. One of the things that I’ve been thinking about is based on the blog by Petra Dumbell, a PhD student at Curtin, for whom I am an associate supervisor. Petra, too, was thinking about collections, but her idea was to create a link to people. At first I thought she was thinking about the People and Organisation entities in the Resource Description and Access protocols being used by entities like the National Library of Australia (and which have a lot of similarities to archival authority records), and then I realised that she was talking about something like a persistent HumanLibrary program. I’m not sure I like the idea of being booked out for a coffee chat to talk about blogging, for example, but the HumanLibrary idea hung around.

We can and should do more about providing insights into the people in the archive, the ones who use the collections and the staff who provide the access, develop appraisal programs and choose material. It’s about shaking up stereotypes and expectations. There’s my colleague, Meg Travers, who is recreating the Trautonium, for example. However, there is another and perhaps more important issue here. Professor of Digital History, Tim Hitchcock, was talking on Twitter about the role of archivists in helping historians develop and write a story. Archivists and librarians are being silenced, he feels; left out of the methodology and historiography that goes into a work of history. On the other hand, we as archivists struggle with ideas of agency and of objective and subjective appraisal.

As I looked at the feed for the ACA conference, I could see a theme developing about access to archives and to the stories within them. It’s also National Reconciliation Week here in Australia, and I thought, what if we had a HumanArchive program? It would require a great deal of sensitivity, but how empowering and enlightening would it be to talk to the subject of an archive file? To ask how they felt about being documented, about what it was like to find their record or that of their family? What struggles and barriers did they overcome to get access to that record?