Data preservation

From HLWIKI Canada



Last Update

  • 25 October 2016


See also Data management portal | FigShare | Open data | Research Portal for Academic Librarians | Semantic web | Text-mining

"...the movement to effectively manage, archive, preserve, retrieve and reuse research data is one that compliments traditional library missions ..." — CIRSS, Data Curation Education Program

Data preservation is concerned with keeping digital data, in all formats, accessible and usable over the long term. Authors writing in the field identify intellectual access to data, data manipulation (such as text-mining) and preservation as critical issues. According to Stuart (2010), "...if we are going to continue to be relevant in the age of Google and Google Scholar, we need to move beyond the document and facilitate access to the increasing amounts of data on the web. ..." Academic libraries are taking on more responsibility for coordinating research data and are naming it as part of their long-term institutional mandate. Much global research in health and medicine, including clinical trials data, is born-digital and is increasingly accessed and processed computationally.

In 2006, Harvard University created the Dataverse Network Project, which is a "...repository for research data that takes care of long term preservation and good archival practices, while researchers can share, keep control of and get recognition for their data. Dataverse also supports the sharing of research data with a persistent data citation, and enables reproducible research." A related data initiative is the REDCap project (developed at Vanderbilt University), a free, web-based and user-friendly electronic data capture (EDC) tool for research studies. More recently, in 2013, the Council on Library and Information Resources published Research data management: principles, practices and prospects, which outlines the emerging landscape of research data management responses and interventions in the United States. Wikidata, an offshoot of Wikipedia, is a newer data repository that supplies structured data to Wikipedia. For clinicians interested in tracking "missing data", see Missing Data UK.

Why preserve data?

In the data era, simply saving your data may feel like preserving it, but because digital technologies change so quickly, digital data is at as much risk of being lost as any other kind of information. Here are some of the reasons:

  • over time, file formats may become incompatible with future software, leaving the files unreadable
  • even if documents can be opened with new software, they may be altered in the process and no longer be coherent or reliable for research
  • storage media can degrade or be scratched or broken, especially portable media such as CDs or USB sticks
  • data files may not be understood later if there is no supporting documentation or metadata

When storing data, take steps to ensure that it remains usable. Document it so that future readers can understand it, and describe it using established descriptive standards and metadata. Where possible, move data to new storage media periodically (disks and drives degrade over time), and keep multiple copies on different storage media. As data ages, migrate it to current software or to formats that newer software can import. All of these activities should be addressed in a comprehensive data management plan, especially where preservation is concerned.
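
One common way to check that multiple copies of a dataset have stayed intact is to record a checksum for every file and compare each copy against that record from time to time. The following is a minimal sketch of that idea in Python, not a full preservation workflow. The function names (sha256_of, build_manifest, save_manifest, verify_copy) and the manifest layout are illustrative assumptions, not part of any tool or project mentioned on this page.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file's relative path to its checksum and size in bytes."""
    manifest = {}
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(data_dir).as_posix()
            manifest[rel] = {
                "sha256": sha256_of(path),
                "bytes": path.stat().st_size,
            }
    return manifest


def save_manifest(manifest: dict, target: Path) -> None:
    """Write the manifest as JSON so it can be stored beside each copy."""
    target.write_text(json.dumps(manifest, indent=2), encoding="utf-8")


def verify_copy(manifest: dict, copy_dir: Path) -> list:
    """Return the relative paths that are missing or altered in a copy."""
    problems = []
    for rel, entry in manifest.items():
        candidate = copy_dir / rel
        if not candidate.is_file() or sha256_of(candidate) != entry["sha256"]:
            problems.append(rel)
    return problems
```

In use, build_manifest would be run once on the master copy, the result stored with save_manifest, and verify_copy run periodically against each backup location; an empty list means every file in that copy still matches the recorded checksum. A checksum only detects loss or alteration. It does not address format obsolescence or missing documentation, which still need migration and metadata as described above.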

Notable projects

  • a free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.
  • minimize duplication of effort in provision of digital preservation training and education programmes
  • preservation of geoscience data and materials in the United States is currently the responsibility of a set of disparate facilities and programs. There are no national standards, procedures, and protocols for the collections and minimal coordination between responsible parties. Although some collection facilities are excellent, more commonly, data and materials reside in inadequately cataloged, overfilled, and disorganized storage areas that were not designed as data repositories. Many Federal and State geological repositories are at or near capacity and are unable to accept additional materials.

Data storage costs and data curation in libraries

  • Purdue’s pricing:
  • Princeton’s pricing:
  • The 4C project announced the beta version of the Curation Costs Exchange (CCEx) website. CCEx is an online community platform for exchanging curation cost information. The goal is to help organizations make smarter investments in digital curation by enabling knowledge transfer and cost comparisons between organizations of all types. The value of the project will depend on organizations' willingness to share cost data and on the benefits that sharing brings. CCEx is a crowd-sourced database and library of curation cost information. It uses the submitted cost data to automatically generate results for self-assessment, cost comparisons with peers and insights into the financial accounting and activities of other organizations. The 4C Project’s vision is to create a better understanding of digital curation costs through collaboration.

