- 25 October 2016
See also Data management portal | FigShare | Open data | Research Portal for Academic Librarians | Semantic web | Text-mining
- "...the movement to effectively manage, archive, preserve, retrieve and reuse research data is one that compliments traditional library missions ..." — CIRSS, Data Curation Education Program
Data preservation deals with the long-term preservation of digital data in all formats. Authors writing in the field identify intellectual access to data, data manipulation (text-mining) and preservation as critical issues. According to Stuart (2010), "...if we are going to continue to be relevant in the age of Google and Google Scholar, we need to move beyond the document and facilitate access to the increasing amounts of data on the web. ..." Academic libraries are taking more responsibility for coordinating data and are naming it as part of their long-term institutional mandate. Much global research in health and medicine, including clinical trials data, is born-digital and is increasingly accessed and analyzed computationally.
In 2006, Harvard University created the Dataverse Network Project, which is a "...repository for research data that takes care of long term preservation and good archival practices, while researchers can share, keep control of and get recognition for their data. Dataverse also supports the sharing of research data with a persistent data citation, and enables reproducible research." A related data initiative is the REDCap project, developed at Vanderbilt University, which provides free, web-based, user-friendly electronic data capture (EDC) tools for research studies. More recently, in 2013, the Council on Library and Information Resources published Research data management: principles, practices and prospects, which outlines the emerging landscape of research data management responses and interventions in the United States. Wikidata, a sister project of Wikipedia, is a newer structured data repository that supplies data to Wikipedia. For clinicians interested in tracking "missing data", see Missing Data UK.
Why preserve data?
In the data era, saving your data may feel like preserving it, but because digital technologies change so quickly, digital data is as much at risk of being lost as any other kind of information. Here are some of the reasons:
- over time, file formats may become incompatible with future software and therefore unreadable
- even if documents can be opened with new software, they may be altered and no longer coherent or reliable for research
- storage media can degrade, get scratched or break, especially portable media such as CDs or USB sticks
- data files may not be understood if there is no supporting documentation or metadata
When storing data, take steps to ensure it remains usable. Document it so that future readers can understand it, and describe it using established descriptive standards and metadata. When possible, move data to new storage media (disks and drives degrade over time), and keep multiple copies on different storage media. As data ages, move it to new software or convert it to formats that can be imported. All of these functions should be discussed in a comprehensive data plan, especially where preservation is concerned.
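The advice above on documenting data and keeping multiple copies can be illustrated with a short script. This is a minimal sketch rather than a recommended tool: it assumes a checksum (SHA-256) is an acceptable way to confirm that a copy on other media is still identical to the original, and the function names (`sha256_of`, `write_manifest`, `verify_copy`) and the `.manifest.json` sidecar layout are hypothetical, chosen only for illustration.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_file: Path, description: str, file_format: str) -> Path:
    """Write a JSON sidecar with a checksum and minimal documentation.

    The field names are an assumption for this sketch, not a metadata standard.
    """
    manifest = {
        "file": data_file.name,
        "sha256": sha256_of(data_file),
        "description": description,  # free-text documentation for future readers
        "format": file_format,       # e.g. "CSV, UTF-8"
    }
    manifest_path = data_file.with_name(data_file.name + ".manifest.json")
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest_path


def verify_copy(copy: Path, manifest_path: Path) -> bool:
    """Check that a copy stored elsewhere still matches the recorded checksum."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    return sha256_of(copy) == manifest["sha256"]
```

In use, `write_manifest` would be run once when the data is deposited, and `verify_copy` would be run periodically against each copy; a `False` result suggests that the copy has degraded or been altered and should be replaced from a good copy.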
- a free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.
- minimize duplication of effort in provision of digital preservation training and education programmes
- preservation of geoscience data and materials in the United States is currently the responsibility of a set of disparate facilities and programs. There are no national standards, procedures, and protocols for the collections and minimal coordination between responsible parties. Although some collection facilities are excellent, more commonly, data and materials reside in inadequately cataloged, overfilled, and disorganized storage areas that were not designed as data repositories. Many Federal and State geological repositories are at or near capacity and are unable to accept additional materials.
Data storage costs and data curation in libraries
- Purdue’s pricing: https://purr.purdue.edu/about/pricing
- Princeton’s pricing: http://dataspace.princeton.edu/jspui/about/DataSpacePnG.pdf
- The 4C project announced the beta version of the Curation Costs Exchange (CCEx) website. CCEx is an online community platform for the exchange of curation cost information. The goal is to help organizations make smarter investments in digital curation by enabling knowledge transfer and cost comparisons between organizations of all types. The value of the project will depend on organizations' willingness to share cost data and on the benefits that sharing brings about. CCEx is a crowd-sourced database and library of curation cost information. It uses cost data to automatically generate results for self-assessment, cost comparisons with peers and insights into the financial accounting and activity of other organizations. The 4C project's vision is to create a better understanding of digital curation costs through collaboration.
- Tenopir C, Birch B, Allard S. Academic libraries and research data services: current practices and plans for the future. An ACRL White Paper; 2012.
- Bailey CW. Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, 2012.
- Baker K, Yarmey L. Data stewardship: environmental data curation and a web-of-repositories. Int J Digital Curation. 2009;4(2).
- Ball A. Review of the state of the art of the digital curation of research data. ERIM Project. University of Bath; 2010.
- Beagrie N. Digital preservation: setting the course for a decade of change. 2007.
- Canadian Association of Research Libraries. Research data: unseen opportunities. An awareness toolkit commissioned by CARL; 2009.
- Chalmers I, Altman DG, McHaffie H, Owens N, Cooke RWI. Data sharing among data monitoring committees and responsibilities to patients and science. Trials. 2013;14:102.
- Charbonneau DH. Strategies for data management engagement. Med Ref Serv Q. 2013;32(3):365-374.
- Cox A, Verbaan E, Sen B. Upskilling liaison librarians for research data management. Ariadne. 2012;70.
- Cragin MH, Palmer CL, Heidorn PB. Extending the data curation curriculum to practicing LIS professionals. DigCCurr2009: Digital Curation: Practice, Promise & Prospects; 2009.
- Creamer AT, Martin ER, Kafel D. Research data management and the health sciences librarian. University of Massachusetts Medical School. Library Publications and Presentations, 2014. Paper #147.
- Giarlo MJ. Academic libraries as data quality hubs. J Libr Scholarly Commun. 2013;1(3):eP1059.
- Gore SA. e-Science and data management resources on the web. Med Ref Serv Q. 2011;30(2):167–77.
- Heidorn PB. The emerging role of libraries in data curation and e-science. J Libr Admin. 2011;51(7-8):662–672.
- Humphrey C. Preserving research data: a time for action. In: Canadian Conservation Institute. Preservation of electronic records: new knowledge and decisionmaking. Ottawa, ON: 2004.
- Interagency Working Group on Digital Data. Science of the National Science and Technology Council. Harnessing the Power of Digital Data for Science and Society. Washington, DC; 2009.
- Lewis SC, Rodrigo Z, Hermida A. Content analysis in an era of big data: a hybrid approach to computational and manual methods. J Broadcast Elec Media. 2013;57(1):34-52.
- LIBER Working Group. Ten recommendations for libraries to get started with research data management. Final Report on E-Science, 2012.
- Mallon M. Data curation. Public Services Q. 2012;8(4):326-337.
- Martin R. What do data services librarians do? J eSci Libr. 2012;1(3):Article 3.
- Miller HE. Big-data in cloud computing: a taxonomy of risks. Info Res. 2013;18(1):paper 571.
- Ohno-Machado L. A hybrid open-access model to bridge the publishing divide and reach out to a broader community. JAMIA. 2011;18(3):210–1.
- Rani M, Buckley BS. Systematic archiving and access to health research data: rationale, current status and way forward. Bull World Health Organ. 2012;90:932–939.
- Rusbridge C. Tomorrow, and tomorrow, and tomorrow: poor players on the digital curation stage. In: Digital convergence – libraries of the future. London: Springer; 2008.
- Scaramozzino JM, Ramirez M, McGaughey K. Managing the data deluge: understanding scientists' need for data curation services; 2010.
- Ross JS, Krumholz HM. Ushering in a new era of open science through data sharing. JAMA. 2013:1-2.
- Simons N. Implementing DOIs for research data. D-Lib Magazine. May/June 2012;18(5/6).
- Stahl-Timmins W. Information graphics in health technology assessment. PhD thesis, University of Exeter, UK. 2011.
- Stuart D. Programming skills could transform librarians' roles. Research Information; 2010.
- Tenopir C, Sandusky RJ, Allard S, Birch B. Academic librarians and research data services: preparation and attitudes. IFLA Journal. March 2013;39:70-78.
- Walters TO. Data curation program development in US universities: the Georgia Institute of Technology example. Int J Digital Curation. 2009;4(3):83–92.