Data curation

From HLWIKI Canada

Data Curation Continuum (Treloar, 2007)


Introduction

See also Data visualization, Open data and Research for librarians - portal

"...if we are going to continue being relevant in the age of Google and Google Scholar, we need to move beyond the document and facilitate access to the increasing amounts of data that is being made available on the web. ..." (Stuart, 2010)
"... data curation is the active and on-going management of data through its lifecycle of interest and usefulness to scholarly and educational activities across the sciences, social sciences, and the humanities ... it is an emerging field that brings new opportunities and challenges for libraries. The growing movement to effectively manage, archive, preserve, retrieve and reuse research data is one that complements traditional library missions to preserve and access information..."


Data management is the process of ensuring the accuracy, accessibility, security and storage of data and other digital files; its archival aspect is often referred to as data curation. In fulfilling their curatorial and preservation responsibilities, academic libraries can take more responsibility for coordinating data management and help meet the long-term needs of faculty members and researchers. Two questions guide this work: will the data be available for analysis by other researchers, and can it be reused for other purposes such as data mining?

What do we mean by research data and data curation?

"...data curation is the active and on-going management of data through its lifecycle of interest and usefulness to scholarship, science, and education; curation activities enable data discovery and retrieval, maintain quality, add value, and provide for re-use over time..."

Research data is often defined as the information (e.g. data sets, microarray data, numerical data, clinical trial information, textual records, images and sound) generated or used as quantitative evidence in primary biomedical research. It is distinguished by being accepted by the research community as a means of validating research findings, observations and hypotheses. According to CARL/ABRC, most research data produced by academic institutions in Canada is not being properly or systematically archived in repositories, which suggests that a more concerted effort is needed to bring together experts at Canadian academic institutions to initiate data management projects.

A study conducted by the Social Sciences and Humanities Research Council (SSHRC) found that only 3 of 110 Canadian organizations systematically archive data, and all three archive it in the US. Research data generated in Canada is not managed properly, and much of it is under-utilized or inaccessible. While some disciplines and research areas have institutional, national and international supports for data curation, this support is neither comprehensive nor well known.

Notable websites

Managing data is central to health care
  • DMPTool adheres to National Institutes of Health (NIH) data sharing requirements
  • DMPTool provides step-by-step guidance to help users create ready-to-use data management plans and meet funder data management requirements. Anyone can create an account and use this resource, but many institutions have partnered with the DMPTool to allow login through the home institution and, in some cases, to provide customized help and support
  • U.S. federal government initiatives to make data more accessible for monitoring, assessment and policy development
  • access to high quality data improves understanding of a community’s health status and determinants
  • provide a single, user-friendly, source for national, state, and community health indicators
  • minimize duplication of effort in provision of digital preservation training and education programmes
  • describe, promote and contextualize current training and education offerings
  • identify and exploit collaborative training and education opportunities
  • maximize inter-disciplinary training and education opportunities
  • develop a shared digital preservation training infrastructure to enable reuse of training and education materials
  • ensure synergy and complementarity between emerging curation and preservation education programmes with professional development training courses
  • a social web site for researchers sharing research objects such as scientific workflows
  • aimed at helping researchers share biomedical data and models; PhysiomeSpace has just completed its beta implementation and is open to users
  • centralized, standards-compliant, public repository for proteomics data; developed to provide the proteomics community with a repository for protein and peptide identifications and the evidence supporting them; it also records details of post-translational modifications, coordinated relative to the peptides in which they were found
  • Need to create a data plan for a grant proposal? Find out what to include & see examples.
  • Wolfram Alpha provides access to a world of factual data, without searching, calling itself the first computational knowledge engine. On the web, there is increased emphasis on repositories of data maintained by national or international agencies, organizations and individuals. Wolfram Alpha now hosts the Wolfram Data Summit to bring together those responsible for data repositories and to develop innovative concepts for the future.

Novel data literacy at Purdue

In partnership with librarians at the University of Minnesota, the University of Oregon and Cornell University, the Purdue University Libraries received $250,000 from IMLS to develop programs that enable the next generation of scientists to find, organize and share data. The program is intended for graduate students in the sciences who are working toward careers as research scientists. As of 2012, technology makes it easier to share research data beyond the lab, but in many cases data is not administered in ways that allow others to easily discover, understand or re-purpose it. Training in data management is also vital to scientists seeking research funding: the National Science Foundation issued a report in 2007 on the need to build public collections of research data and, since 2011, has required scientists to include data management plans in their grant applications.

The Data Information Literacy effort will be carried out over two years by five teams, each consisting of a data librarian, a subject librarian and a faculty researcher. Two teams are based at Purdue, with one team at each of the other institutions. The teams represent a range of subjects, from computer engineering to landscape architecture, so that commonalities and differences in data curation can be explored. Each team will assess the data needs of its discipline, in part by interviewing and observing researchers. The teams will then develop and implement targeted instruction and assess its impact on the data information literacy skills of graduate students.

More information on the data information literacy project is available at http://wiki.lib.purdue.edu/display/ste

Canadian context

  • DataCite is an international collaboration to improve access to research data by enabling organizations to register data sets and assign digital object identifiers (DOIs) to them. Research data is defined as any research output that has not been published before, such as raw data, slide presentations and lab notes. CISTI is responsible for assigning unique identifiers to Canadian data sets. It is not yet ready to accept data sets itself, but it plans to assign DOIs to data and to work with Canadian data centres interested in participating in DataCite.
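
As a rough illustration of what registering a data set for a DOI involves, the sketch below checks a metadata record for the kinds of properties the DataCite Metadata Schema treats as mandatory (identifier, creators, titles, publisher, publication year and resource type). The key names follow the schema only loosely. The record, the placeholder DOI and the helper name missing_fields are assumptions made for this example and are not part of DataCite's or CISTI's tooling.

```python
# Properties treated as mandatory in this sketch, loosely following the
# DataCite Metadata Schema; the exact key names are an assumption.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in `record`."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Hypothetical record for a data set; every value is a placeholder.
record = {
    "identifier": "10.5072/example-dataset",
    "creators": ["Example Research Group"],
    "titles": ["Example survey data"],
    "publisher": "Example University Library",
    "publication_year": 2012,
    "resource_type": "Dataset",
}

# The same record with the publisher left blank.
incomplete = dict(record, publisher="")

print(missing_fields(record))      # []
print(missing_fields(incomplete))  # ['publisher']
```

A check of this kind could be run before a data set is submitted for a DOI, so that incomplete descriptions are caught while the researcher is still available to supply the missing details.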

