Introduction
See also Open access in Canada, Open data and Research for librarians - portal
- "...if we are going to continue being relevant in the age of Google and Google Scholar, we need to move beyond the document and facilitate access to the increasing amounts of data that is being made available on the web. ..." (Stuart, 2010)
- "... data curation is the active and on-going management of data through its lifecycle of interest and usefulness to scholarly and educational activities across the sciences, social sciences, and the humanities ... it is an emerging field that brings new opportunities and challenges for libraries. The growing movement to effectively manage, archive, preserve, retrieve and reuse research data is one that complements traditional library missions to preserve and access information..."
Data management is the process of ensuring the accuracy, accessibility, security and storage of data and other digital files; its archival aspect is often referred to as data curation. In fulfilling their curatorial and preservation responsibilities, academic libraries can take more responsibility for coordinating data management and help meet the long-term institutional needs of faculty members and researchers. Two questions guide this work: will the data be available for analysis by other researchers, and can it be reused for data mining?
What do we mean by research data and data curation?
"...data curation is the active and on-going management of data through its lifecycle of interest and usefulness to scholarship, science, and education; curation activities enable data discovery and retrieval, maintain quality, add value, and provide for re-use over time..."
Research data is often defined as the information (e.g. data sets, microarray data, numerical data, clinical trial information, textual records, images, sound) generated or used as quantitative evidence in primary biomedical research. It is distinguished by the fact that the research community accepts it as a means of validating research findings, observations and hypotheses. According to CARL/ABRC, the majority of research data produced by academic institutions in Canada is not being properly or systematically archived in repositories. This suggests that a more concerted effort is needed to bring together experts at Canadian academic institutions to initiate data management projects.
A study conducted by the Social Sciences and Humanities Research Council (SSHRC) found that only 3 of 110 Canadian organizations systematically archive data, and that all three archived their data in the United States. Research data generated in Canada is not managed in a coherent manner, and much of it is under-utilized or inaccessible for knowledge creation. While some disciplines and research areas have institutional, national and international supports for data curation, this support is neither comprehensive nor well known.
Notable websites
Managing data is central to health care
- Health Data Initiative and Health Indicators Warehouse
  - U.S. federal government initiatives to make data more accessible for monitoring, assessment and policy development
  - access to high-quality data improves understanding of a community’s health status and determinants
  - provide a single, user-friendly source for national, state and community health indicators
- International Digital Curation Education and Action (IDEA) Working Group http://www.ideaworkgroup.org/
  - minimize duplication of effort in the provision of digital preservation training and education programmes
  - describe, promote and contextualize current training and education offerings
  - identify and exploit collaborative training and education opportunities
  - maximize inter-disciplinary training and education opportunities
  - develop a shared digital preservation training infrastructure to enable reuse of training and education materials
  - ensure synergy and complementarity between emerging curation and preservation education programmes and professional development training courses
- myExperiment
  - a social website for researchers to share research objects such as scientific workflows
- PRoteomics IDEntifications database (PRIDE)
  - a centralized, standards-compliant, public repository for proteomics data, developed to give the proteomics community a repository for protein and peptide identifications together with the evidence supporting them; it also records details of post-translational modifications, coordinated relative to the peptides in which they were found
- Wolfram Alpha
  - Wolfram Alpha, which calls itself the first computational knowledge engine, provides access to a world of factual data without conventional searching. On the web, there is increased emphasis on repositories of data maintained by national or international agencies, organizations and individuals. Wolfram Alpha has arranged the first Wolfram Data Summit to bring together people responsible for data repositories to develop innovative concepts for the future.
Canadian context
- DataCite is an international collaboration to improve access to research data by enabling organizations to register datasets and assign them digital object identifiers (DOIs). Here, research data is defined as any research output that has not been published before, such as raw data, slide presentations and lab notes. NRC-CISTI is responsible for assigning unique identifiers to Canadian data sets; however, CISTI is not yet ready to accept data sets. CISTI is in the process of assigning DOIs to data and plans to work with data centres in Canada that are interested in participating in DataCite.
References
- Baker, K., Yarmey, L. (2009). Data stewardship: environmental data curation and a web-of-repositories. International Journal of Digital Curation, 4(2).
- Ball, A. (2010). Review of the state of the art of the digital curation of research data. ERIM Project Document erim1rep091103ab11. Bath: University of Bath.
- Banks, M. (2009, July 21). Open access, grey literature, grey data.
- Beagrie, N. (2007). Digital preservation: setting the course for a decade of change.
- Canadian Association of Research Libraries (CARL). CARL Data Management Sub-Committee. (2009). Research Data: Unseen Opportunities. An awareness toolkit commissioned by the Canadian Association of Research Libraries (CARL).
- Cech, T. (2003). Sharing publication-related data and materials: responsibilities of authorship in the life sciences. Washington, D.C.: National Academies Press.
- Cragin, M.H., Palmer, C.L., Heidorn, P.B., and Smith, L.C. (2007). An educational program on data curation. Poster at American Library Association, Science and Technology Section.
- Cragin, M.H., Smith, L.C., Palmer, C.L., and Heidorn, P.B. (2009). Extending the data curation curriculum to practicing LIS professionals. In Tibbo, H.R., Hank, C., Lee, C.A., and Clemens, R. (eds.). Proceedings of DigCCurr2009: Digital Curation: Practice, Promise, and Prospects (pp. 92-93).
- De Roure, D., Goble, C., Aleksejevs, S. et al. (2009). The myExperiment Open Repository for Scientific Workflows. Open Repositories. May 2009.
- Delserone, L.M. (2008). At the watershed: preparing for research data management and stewardship at the University of Minnesota Libraries. Library Trends, 57(2), 202-210.
- Giustini, D. (2009, September 14). Let's liberate ‘grey data’ & grey literature. The Search Principle blog.
- Humphrey, C. (2004). Preserving research data: a time for action. In Canadian Conservation Institute. Preservation of electronic records: new knowledge and decisionmaking. Ottawa, ON: Author.
- Interagency Working Group on Digital Data. Committee on Science of the National Science and Technology Council (2009). Harnessing the Power of Digital Data for Science and Society. Washington, DC.
- Lord, P., MacDonald, A. (2003). Data curation for e-science in the UK: an audit to establish requirements for future curation and provision. Prepared for the JISC Committee for Support Research (JCSR).
- National Science Foundation. (2009). Community-based Data Interoperability Networks.
- Ohno-Machado, L. (2011). A hybrid open-access model to bridge the publishing divide and reach out to a broader community. J Am Med Inform Assoc, 18(3), 210-211.
- Piwowar, H.A., Day, R.S., Fridsma, D.B. (2007). Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE, 2(3), e308.
- Rusbridge, C. (2008). Tomorrow, and tomorrow, and tomorrow: poor players on the digital curation stage. In Earnshaw, R. and Vince, J. (eds.). Digital Convergence - Libraries of the Future. London: Springer.
- Scaramozzino, J.M., Ramirez, M., McGaughey, K. (2010). Managing the data deluge: understanding scientists' need for data curation services.
- Research Data Strategy Working Group. (2008). Stewardship of Research Data in Canada: A Gap Analysis.
- Stuart, D. (2010). Programming skills could transform librarians' roles. Research Information, December/January, 2010.
- Walters, T.O. (2009). Data Curation Program Development in U.S. Universities: The Georgia Institute of Technology Example. International Journal of Digital Curation, 4(3), 83-92.