Next week, the largest hackathon in the world, GovHack, takes place in Australia and New Zealand. There are GovHack sites at universities and in regional centres, as well as in all the major centres. Each site has participants, who make the things, and mentors, who provide advice and guidance on tools and datasets. There’s even a specific cultural hack node in Canberra, run by Tim Sherratt.
This year, I’ve signed up as a data mentor for WA, which has a central city site and a regional node at Geraldton. This will be my third year as a data mentor, and my first year as a general mentor, talking cultural data generally (mostly archival, of course) rather than representing the SROWA. It’s a lot more organised than I anticipated, and people are already asking for more information to help them prepare. To this end, I’m going to use this post to talk about some datasets. Participants need to use at least one official dataset, but can then also look for other data that they can mash together or reuse. This way, I can print off the page as a guide, and provide a link to it for #govhackwa participants. I’ll do an additional post or two on some tools for analysing the datasets and provide some examples of how data has been used.
Official cultural datasets
These datasets are taken from the various government data portals.
Searching the data.gov.au portal returns 144 datasets for the keyword ‘library’, 117 for ‘archive’, 52 for ‘museum’ and 129 for ‘cultural’. The last of these includes some GIS datasets, including the “coarse cultural topographical data”, showing where major population centres are, and the CPI. In addition to collection links and collection subsets, State and National Libraries have contributed statistical datasets relating to location of libraries, user statistics and so on.
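For anyone who would rather run those keyword counts from a script, here is a minimal sketch. It assumes the portal exposes the standard CKAN Action API (the package_search action) and that the base URL below is correct; both are assumptions on my part, so check the portal’s own API documentation first. The function names are mine, purely for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL for a CKAN Action API; confirm it against the portal's documentation.
CKAN_BASE = "https://data.gov.au/api/3/action"


def build_search_url(keyword, rows=0, base=CKAN_BASE):
    """Return a package_search URL for one keyword (rows=0 asks for the count only)."""
    return f"{base}/package_search?" + urlencode({"q": keyword, "rows": rows})


def count_from_response(payload):
    """Pull the total number of matching datasets out of a CKAN search response."""
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful search")
    return payload["result"]["count"]


def count_datasets(keyword):
    """Ask the portal how many datasets match a keyword (needs network access)."""
    with urlopen(build_search_url(keyword)) as response:
        return count_from_response(json.load(response))
```

Calling count_datasets("museum") would then give the current count for that keyword, which will drift from the numbers quoted here as the portal is updated.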
My top picks, outside of TROVE (from the National Library) and ANDS (Australian National Data Service) are –
The National Archives of Australia – “Memory of a nation” – digitised content from online exhibitions – https://data.gov.au/dataset/memory-of-a-nation-data and the Commonwealth Agencies dataset, which provides a comprehensive set of federal government departments, ministries, offices and so on. Because of the way archives link data, some state and local government agencies are also included. This dataset was last updated in April 2016 – https://data.gov.au/dataset/commonwealth-agencies.
The State Records Office of New South Wales has a number of indexes available as CSV files in the NSW data portal – including convicts, soldier settlement indexes and wills and probate, not to mention their Flickr dataset. SRNSW collection information can also be searched via their online catalogue. Queensland State Archives has 55 datasets in the data.qld.gov.au portal – https://data.qld.gov.au/dataset?q=archive&tags=Queensland+State+Archives&groups=historical. State Records South Australia has 5 datasets – http://data.sa.gov.au/data/organization/state-records.
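Once you have downloaded one of these CSV indexes, a few lines of Python are enough to start poking at it. The sketch below filters rows by a partial, case-insensitive match on one column. The column names and the sample rows are made up for illustration; the real indexes will have their own headers, so open the file and check before reusing this.

```python
import csv
import io


def find_rows(csv_file, column, needle):
    """Return rows whose value in `column` contains `needle`, ignoring case."""
    reader = csv.DictReader(csv_file)
    needle = needle.casefold()
    return [row for row in reader if needle in (row.get(column) or "").casefold()]


# Made-up sample standing in for a downloaded index file (hypothetical headers and rows).
SAMPLE = io.StringIO(
    "Surname,FirstName,Year\n"
    "Smith,Mary,1834\n"
    "Jones,Thomas,1829\n"
    "Smithers,John,1841\n"
)

matches = find_rows(SAMPLE, "Surname", "smith")
for row in matches:
    print(row["Surname"], row["FirstName"], row["Year"])
```

For a real download, replace SAMPLE with something like open("index.csv", newline="", encoding="utf-8-sig"), where the file name is whatever you saved the index as.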
Other museum datasets include the gorgeous Scott Sisters collection from the Australian Museum, itself the subject of a remix competition in 2013/2014 – http://data.nsw.gov.au/data/dataset/4e57d134-79e9-42ad-a0a9-83fc91e1091c
There’s a plethora of WW1-related datasets – searching for ‘World War’ returns 24 datasets, of which only two are not clearly related, and the majority of which are from State Libraries.
It’s worth remembering that data in TROVE is harvested from all public libraries, and includes data from museums and archives. The content can be filtered via the TROVE API. The Public Records Office of Victoria and the Australian National University and Noel Butlin Archives have both contributed data to TROVE. The State Library of Queensland not only has data in TROVE, but also contributed over 50,000 photographs to Wikipedia.
TROVE has some useful examples and help sheets – http://help.nla.gov.au/trove/building-with-trove/examples
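To give a feel for what filtering through the TROVE API can look like, here is a rough sketch that builds a search URL restricted to one zone (pictures, in this case). It assumes the version 2 style endpoint and its key, zone, q, encoding and n parameters; the API has changed over time, so treat the endpoint and parameter names as assumptions and confirm them against the current TROVE documentation. You will also need your own API key, and the function names are mine.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed version 2 style endpoint; confirm against the current Trove API documentation.
TROVE_ENDPOINT = "https://api.trove.nla.gov.au/v2/result"


def build_trove_url(api_key, query, zone="picture", max_results=20):
    """Build a search URL limited to a single zone, asking for a JSON response."""
    params = {
        "key": api_key,        # your personal API key
        "zone": zone,          # restrict the search to one kind of content
        "q": query,            # free-text search terms
        "encoding": "json",    # ask for JSON rather than XML
        "n": max_results,      # number of records to return
    }
    return TROVE_ENDPOINT + "?" + urlencode(params)


def search_trove(api_key, query, zone="picture"):
    """Run the search and return the parsed JSON (needs network access and a valid key)."""
    with urlopen(build_trove_url(api_key, query, zone)) as response:
        return json.load(response)
```

Building the URL separately from fetching it makes it easy to paste the URL into a browser first and look at what comes back before writing any parsing code.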
The Australian National Data Service is similarly rich and complex. Again, the Public Records Office of Victoria (PROV) has contributed data to ANDS, along with the State Records Office of NSW. The PROV’s semantic wiki is available as an XML-formatted download – https://www.data.vic.gov.au/data/dataset/public-record-office-victoria-semantic-wiki.
Postscript – I’ve just been advised that the Curtin Library has made weather observation data from Jon Sanders’ 1986–1988 circumnavigation available through ANDS – https://researchdata.ands.org.au/search/#!/rows=15/sort=list_title_sort%20asc/class=collection/q=jon%20sanders/p=1/group=Curtin%20University/. There’s also a nice blog – http://triplesolo.library.curtin.edu.au/ – and you can follow along on Twitter #triplesolo #noonsummary.
Weather aficionados may also be interested in the digitised daily observations from colonial Perth, now in the NAA collection – http://recordsearch.naa.gov.au/SearchNRetrieve/Interface/ListingReports/ItemsListing.aspx?series=PP430/1
Finally, in the WA datasets, you will find a range of historical maps and plans, taken from the State Records Office digitised collection – each map links to the series at the top, but there are some older links to the previous catalogue. For better searching and exporting of data, it’s best to go straight to the new catalogue – https://archive.sro.wa.gov.au/
WA theme parks – taken from the Landgate “locate the 80s” site – http://catalogue.beta.data.wa.gov.au/dataset/wa-theme-parks