A digital preservation workflow for academic research

As part of the Data Lifeboat meeting I attended in November 2024, I'm jotting down some rough, high-level thoughts on what a good digital preservation workflow might be.

I am writing this as a stream of consciousness from my experience as an academic researcher. There are certainly things I missed or that I will think of later.

The workflow is organised below into three stages: pre research, during research, and post research. Within each one I'll write down what should ideally happen at that stage.

Pre research

Start with a research “data” management plan. I'm using the term data very broadly here to mean the artefacts that result from a research project, which could be (but are not limited to) general notes, numerical data, interview transcripts, audio/video recordings, artwork, lab notebooks, etc.

When writing the plan, think about:

What artefacts do you anticipate from the research? Which will be shared and preserved? Remember things may be produced throughout, not just at the end.
How will artefacts be shared and preserved? Any anticipated barriers? How might they be overcome? How could this be done in a way that ensures, as much as possible, that they can be human and machine readable years later?
Where will they be preserved? Make sure the appropriate digital repositories are in place.
When do you expect each output to be produced? Will the “how” be ready at those times? Closely related: for how long (what timescales) do you hope they will be preserved? 10 years? 20 years? 100 years?!
Who will take on the responsibility of carrying out this plan?
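
As a rough sketch, the answers to these questions could be kept as a small machine-readable record alongside the written plan, which also makes later check-ins against the plan easier. This is only one possible shape: the field names, repository names, people, and dates below are all made up for illustration and are not from any particular tool or standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PlannedArtefact:
    """One row of the plan: what, how, where, when, for how long, and who."""
    what: str            # e.g. "interview transcripts"
    how: str             # format and sharing route
    where: str           # repository (placeholder names below)
    when: date           # expected production date
    keep_years: int      # hoped-for preservation timescale
    who: str             # person responsible
    deposited: bool = False


def overdue(plan, today):
    """Return artefacts expected by `today` that have not been deposited yet."""
    return [a for a in plan if a.when <= today and not a.deposited]


# Hypothetical example plan; every value here is an illustrative assumption.
plan = [
    PlannedArtefact("interview transcripts", "plain text, open licence",
                    "institutional repository", date(2025, 3, 1), 20,
                    "research assistant"),
    PlannedArtefact("lab notebook scans", "PDF/A",
                    "institutional repository", date(2025, 9, 1), 100,
                    "principal investigator"),
]

for artefact in overdue(plan, date(2025, 6, 1)):
    print(f"Overdue: {artefact.what} (responsible: {artefact.who})")
```

Even a spreadsheet with the same columns would do the job; the point is only that each planned artefact has an answer to every question above.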

From experience, I know that a big challenge is not just coming up with such a plan, but budgeting the time, resources, and labour to implement it. In academic research, I think this is an underappreciated point. At least from my scientific background, many scientists scramble to prepare and publish data at the last minute (usually because an academic journal requires it), and end up doing a poor job of digital preservation.

During research

During the course of a research project, remember to keep good documentation. In my view, it is especially important to write down things like spontaneous learnings (“what are we learning along the way?”) and to note deviations from the research plan.

Documentation could also be informal, like rehearsal notes for performing arts or daily lab notebooks for an experimental scientist. Blog posts are also good.

Regularly check in with the original data management plan to see if it is being followed or if changes are needed.

Post research

In my view, a post-mortem is a critical exercise in any research project. This is true, too, for reflecting on how well a project's digital preservation plan/data management plan worked. Some questions to ask:

Did we produce the digital artefacts we anticipated at the beginning?
What was the experience of sharing and preserving those artefacts? Any points of friction?
What would we do differently next time?
How will we preserve and share what we learned from this post-mortem to inform future efforts?

Another meta issue I see in academic research is the lack of appreciation for, and highlighting of, the reuse of digitally preserved material. At least from what I've seen, there's lots of talk in #openresearch circles about sharing and how to do it well, but far less about using what others have shared!

I think if we do a good job of telling stories about the use of shared stuff, then we can more effectively make a case for digitally preserving said stuff and reducing #intellectualpoverty.

Unless otherwise stated, all original content in this post is shared under the Creative Commons Attribution-ShareAlike 4.0 International license