
Building the Digital Infrastructure behind the NHM Collection

In January 2022, the Natural History Museum reached an important milestone: 5 million digital records were now available via its data portal. Since 2015, 32 billion records have been downloaded from the portal in over half a million download events. This clearly shows the value of making digital records freely available to the public.

At the MuseumNext Digital Summit, Steen Dupont, Programme Manager of RECODE, and Paul Ward, London Technical Solutions Architect, both from the Natural History Museum, shared their journey to reaching 5 million digital records and the challenges it presented to the wider museum IT infrastructure.

Infrastructure challenges of digitisation

Infrastructure is hugely important when considering the digitisation of collections. In this case, it refers to networks, servers, storage, backup, disaster recovery, and the collections management system itself. Without infrastructure, media files are likely to sit on local hard drives, and data in individual spreadsheets and databases.

As an example of how this works, once a digital copy of a specimen is made, it is likely to be copied up to nine times across the museum’s IT systems and back-up options, ending on tape. This movement between storage levels is known as storage tiering. Items sent to tape are essentially being put into storage due to inactivity, and it can take up to 90 seconds to retrieve any one of them. That is fine for individual requests, but not ideal for large-scale data downloads.
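To see why a 90-second retrieval time matters at scale, here is a minimal back-of-the-envelope sketch. It assumes items are recalled from tape one after another and that each takes the worst-case 90 seconds. The batch size of 10,000 items and the function name are illustrative assumptions, not figures or code from the museum.

```python
def sequential_retrieval_hours(item_count: int, seconds_per_item: float) -> float:
    """Total time in hours to fetch items one at a time (no parallelism assumed)."""
    return item_count * seconds_per_item / 3600


# Worst-case tape retrieval time quoted in the article.
TAPE_SECONDS_PER_ITEM = 90

# Hypothetical batch size, chosen only for illustration.
BATCH_SIZE = 10_000

if __name__ == "__main__":
    hours = sequential_retrieval_hours(BATCH_SIZE, TAPE_SECONDS_PER_ITEM)
    print(f"{BATCH_SIZE} items at {TAPE_SECONDS_PER_ITEM} s each: {hours:.0f} hours")
```

Under these assumptions, a 10,000-item download works out to 250 hours, a little over ten days. Real systems may recall several items in parallel, so this is an upper-bound illustration rather than a measurement.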

With the huge amounts of data held by the museum, recovery after a malicious attack or disaster could take up to six months. It was against this background that the museum sought a new approach to its infrastructure and moved new infrastructure initiatives forward.

FLOWS

FLOWS is the first of the infrastructure initiatives created by the museum in response to the challenges it faced. FLOWS tracks files, workstations and server processes, and provides automatic copy clear-down and issue reporting. This allows the team to automatically remove ingest artefacts at the beginning of the copying process.

Valentine

To stop the bit rot of data stored on tape and improve retrieval times, a whole new architecture was required. Valentine is the second infrastructure initiative implemented by the museum to achieve this. It stores data in a cloud object storage system instead of on tape. When a file is saved to Valentine, it is saved to identical storage systems in two different data centres. Within 15 minutes, the file is erasure-coded with redundant data and split into three parts. Each part goes to a separate data centre.
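The article does not say which erasure code Valentine uses. As a minimal sketch of the general idea, assume the simplest possible scheme: the file is split into two data parts, and a third parity part is computed as their XOR. Under that assumption, any one of the three parts can be lost and the file can still be rebuilt from the other two. The function names `encode` and `decode` are hypothetical and are not part of any museum system.

```python
from typing import Optional, Sequence, Tuple


def xor_bytes(left: bytes, right: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(left, right))


def encode(data: bytes) -> Tuple[bytes, bytes, bytes]:
    """Split data into two equal-length halves plus one XOR parity part."""
    half = (len(data) + 1) // 2
    first = data[:half]
    # Pad the second half with zero bytes so both halves have the same length.
    second = data[half:].ljust(half, b"\0")
    return first, second, xor_bytes(first, second)


def decode(parts: Sequence[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one of the three parts is None."""
    if len(parts) != 3 or sum(part is None for part in parts) > 1:
        raise ValueError("need three parts with at most one missing")
    first, second, parity = parts
    if first is None:
        first = xor_bytes(second, parity)
    elif second is None:
        second = xor_bytes(first, parity)
    # If only the parity part is missing, both data halves are already present.
    return (first + second)[:length]


if __name__ == "__main__":
    original = b"specimen image bytes"
    parts = encode(original)
    # Simulate losing the part held in the second data centre.
    damaged = [parts[0], None, parts[2]]
    print(decode(damaged, len(original)) == original)
```

Production object stores typically use more general codes, such as Reed-Solomon, that tolerate more losses. The XOR scheme here is only the smallest example that matches the "three parts" described above.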

Again, if a file is not accessed for a period of time, it is removed from the initial tier, leaving a stub file behind. The access time for a migrated file, however, is now under two seconds.
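The stub-file behaviour described above can be sketched as a toy two-tier store. This is an illustration under stated assumptions, not a description of how Valentine is implemented: the class `TieredStore`, its method names and the `idle_limit` threshold are all hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TieredStore:
    """Toy model: a fast initial tier plus an archive tier reached through stubs."""

    idle_limit: float  # seconds without access before a file is migrated (assumed)
    fast: Dict[str, bytes] = field(default_factory=dict)
    archive: Dict[str, bytes] = field(default_factory=dict)
    last_access: Dict[str, float] = field(default_factory=dict)

    def save(self, name: str, data: bytes, now: float) -> None:
        """New files always land on the fast tier."""
        self.fast[name] = data
        self.last_access[name] = now

    def migrate_idle(self, now: float) -> List[str]:
        """Move files idle for at least idle_limit to the archive, leaving stubs."""
        migrated = []
        for name in list(self.fast):
            if now - self.last_access[name] >= self.idle_limit:
                self.archive[name] = self.fast.pop(name)
                migrated.append(name)
        return migrated

    def is_stub(self, name: str) -> bool:
        """A stub is a name whose content now lives only in the archive tier."""
        return name in self.archive and name not in self.fast

    def read(self, name: str, now: float) -> bytes:
        """Reading a stub transparently recalls the file to the fast tier."""
        if name not in self.fast:
            self.fast[name] = self.archive.pop(name)
        self.last_access[name] = now
        return self.fast[name]
```

A caller only ever uses `read`. Whether the bytes came from the fast tier or were recalled through a stub is invisible apart from the extra access time.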

RECODE

RECODE is a programme for rethinking collections data. It was born out of the need to convert the next 75 million digital records at the museum in a way that follows clear procedures and processes.

RECODE is about finding a new collections management system that will bring the museum into the future using a stakeholder-first approach. After all, it is the stakeholders who know what the system needs to do, and it is up to the programme team to have the long, in-depth conversations needed to understand that. But it’s also about how the museum processes a specimen, from acquiring a collection through to loaning part of it out.

RECODE is hugely important because it is creating a system that is flexible, configurable and manageable by the institution. The institution has the knowledge about the sector, and the programme team wants to shape whatever technology is chosen to fit that knowledge.

Infrastructure is key, and it works best when end-users can consume it easily without realising it is there at all. These advancements in infrastructure represent input from the whole museum and, in that sense, are a true reflection of the needs of the museum community.

Steen Dupont, Programme Manager of RECODE and Paul Ward, London Technical Solutions Architect from the Natural History Museum spoke at the MuseumNext Digital Summit in June 2022.

 
