In January 2022, the Natural History Museum reached an important milestone: 5 million digital records were now available via its data portal. Since 2015, these records have been downloaded 32 billion times across more than half a million download events. This clearly shows the value of making digital records freely available to the public.
At the MuseumNext Digital Summit, Steen Dupont, programme manager of RECODE, and Paul Ward, Technical Solutions Architect, both from the Natural History Museum, London, shared their journey to 5 million digital records and the challenges it presented to the wider museum IT infrastructure.
Infrastructure is hugely important when digitising collections. Here, it refers to networks, servers, storage, backup, disaster recovery and the collections management system itself. Without this infrastructure, collection media files are likely to sit on local hard drives, and collection data in individual spreadsheets and databases.
For example, once a digital copy of a specimen has been made, it is likely to be copied up to nine times across the museum's IT systems and onto backup tape, a practice known as storage tiering. Items are moved to tape once they have been inactive for a while, and retrieving an item from tape can take up to 90 seconds. That is fine for individual requests, but not ideal for large-scale data downloads.
Given the huge amount of data the museum holds, recovering it after a malicious attack or disaster could take up to six months. It was against this background that the museum sought a new approach to its infrastructure and took forward a set of new infrastructure initiatives.
FLOWS is the first of the infrastructure initiatives the museum created in response to these challenges. It tracks files and workstations, runs server processes, and provides automatic copy clear-down and issue reporting. This allows the team to automatically remove ingest artefacts at the beginning of the copying process.
To stop bit rot in data stored on tape and to speed up retrieval, a whole new architecture was required. Valentine is the second infrastructure initiative the museum implemented to achieve this. It stores data in a cloud object storage system instead of on tape. When a file is saved to Valentine, it is written to identical storage systems in two different data centres. Within 15 minutes, the file is erasure-coded with redundant data and split into three parts, each of which goes to a separate data centre.
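The article does not say which erasure code Valentine uses, so the following is only a minimal sketch of the general idea, assuming the simplest possible scheme: two data fragments plus one XOR parity fragment (a "2+1" layout). Because each of the three parts would sit in a different data centre, losing any one site still leaves enough to rebuild the file. The function names `encode` and `decode` are hypothetical.

```python
from typing import List, Optional


def encode(data: bytes) -> List[bytes]:
    # Hypothetical 2+1 scheme: two data fragments plus one XOR parity fragment.
    half = (len(data) + 1) // 2
    first = data[:half]
    second = data[half:].ljust(half, b"\x00")  # pad so both halves match in length
    parity = bytes(x ^ y for x, y in zip(first, second))
    return [first, second, parity]


def decode(fragments: List[Optional[bytes]], length: int) -> bytes:
    # Any single missing fragment can be rebuilt by XOR-ing the other two.
    if sum(f is None for f in fragments) > 1:
        raise ValueError("a 2+1 scheme can only tolerate one lost fragment")
    first, second, parity = fragments
    if first is None:
        first = bytes(x ^ y for x, y in zip(second, parity))
    elif second is None:
        second = bytes(x ^ y for x, y in zip(first, parity))
    return (first + second)[:length]  # strip the padding byte, if any


original = b"digitised specimen record"
fragments = encode(original)
fragments[1] = None  # simulate losing one data centre
assert decode(fragments, len(original)) == original
```

Production object stores typically use more general codes (such as Reed-Solomon) that spread data across more fragments and tolerate more than one loss; XOR parity is used here only because it is the smallest correct example.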
As before, if a file is not accessed for a period of time, it is removed from the initial tier and a stub file is left in its place. The difference is that a migrated file can now be accessed in under two seconds, compared with up to 90 seconds from tape.
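A minimal sketch of inactivity-based tiering with stub files, assuming in-memory dictionaries stand in for the two storage tiers. The article does not state the inactivity period, so the 30-day threshold below is a hypothetical value, and the class and method names are illustrative only. Reading a stub transparently recalls the file, which is the behaviour that makes fast retrieval from the second tier matter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Stub:
    # Placeholder left on the primary tier; points at the migrated copy.
    archive_key: str


@dataclass
class TieredStore:
    inactivity_seconds: float  # hypothetical policy threshold
    primary: Dict[str, Union[bytes, Stub]] = field(default_factory=dict)
    archive: Dict[str, bytes] = field(default_factory=dict)
    last_access: Dict[str, float] = field(default_factory=dict)

    def write(self, path: str, data: bytes, now: float) -> None:
        self.primary[path] = data
        self.last_access[path] = now

    def migrate_inactive(self, now: float) -> List[str]:
        # Move files untouched for inactivity_seconds to the archive tier,
        # leaving a stub behind on the primary tier.
        moved = []
        for path, content in list(self.primary.items()):
            if isinstance(content, Stub):
                continue  # already migrated
            if now - self.last_access[path] >= self.inactivity_seconds:
                key = "archive/" + path
                self.archive[key] = content
                self.primary[path] = Stub(key)
                moved.append(path)
        return moved

    def read(self, path: str, now: float) -> bytes:
        # Reading a stub transparently recalls the file from the archive tier.
        content = self.primary[path]
        if isinstance(content, Stub):
            content = self.archive[content.archive_key]
            self.primary[path] = content  # recall back to the primary tier
        self.last_access[path] = now
        return content


DAY = 24 * 60 * 60
store = TieredStore(inactivity_seconds=30 * DAY)
store.write("specimens/0001.tif", b"image bytes", now=0)
print(store.migrate_inactive(now=31 * DAY))  # ['specimens/0001.tif']
print(store.read("specimens/0001.tif", now=32 * DAY))  # b'image bytes'
```

Passing `now` explicitly, rather than reading the system clock, keeps the policy easy to test; a real system would also record where each stub points in persistent metadata rather than in memory.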
RECODE is a programme for rethinking collections data. It was born out of the need to convert the museum's next 75 million digital records in a way that follows clear procedures and processes.
RECODE is about finding, through a stakeholder-first approach, a new collections management system that will carry the museum into the future. After all, stakeholders know what the system needs to do, and it is up to the programme team to hold the long, in-depth conversations needed to understand that. But it is also about how the museum processes a specimen: how it goes from acquisition, or acquiring a collection, to actually loaning part of it out.
RECODE is hugely important because it is creating a system that is flexible, configurable and manageable by the institution itself. The institution holds the knowledge of the sector, and the programme team wants to make whatever technology is chosen fit that knowledge.
Infrastructure is key, and it works best when no one realises it is there at all, because that is what makes it easy for end users to consume. These infrastructure advancements draw on input from across the whole museum and, in that sense, truly reflect the needs of the museum community.
Steen Dupont and Paul Ward from the Natural History Museum spoke at the MuseumNext Digital Summit in June 2022.