Data curation

From HLWIKI Canada

Data Curation Continuum (Treloar, 2007)



Introduction

See also Open access in Canada, Open data and Research for librarians - portal

"...if we are going to continue being relevant in the age of Google and Google Scholar, we need to move beyond the document and facilitate access to the increasing amounts of data that is being made available on the web. ..." (Stuart, 2010)
"... data curation is the active and on-going management of data through its lifecycle of interest and usefulness to scholarly and educational activities across the sciences, social sciences, and the humanities ... it is an emerging field that brings new opportunities and challenges for libraries. The growing movement to effectively manage, archive, preserve, retrieve and reuse research data is one that complements traditional library missions to preserve and access information..."


Data management is the process of ensuring the accuracy, accessibility, security and storage of data and other digital files; its archival aspect is often referred to as data curation. In fulfilling their curatorial and preservation responsibilities, academic libraries can take on more of the coordination of data management and help meet the long-term institutional needs of faculty members and researchers. Two questions guide this work: will the data be available for analysis by other researchers, and can it be reused for other purposes such as data mining?

What do we mean by research data and data curation?

"...data curation is the active and on-going management of data through its lifecycle of interest and usefulness to scholarship, science, and education; curation activities enable data discovery and retrieval, maintain quality, add value, and provide for re-use over time..."

Research data is often defined as the information (e.g., data sets, microarray data, numerical data, clinical trial information, textual records, images, sound, etc.) generated or used as quantitative evidence in primary biomedical research. Research data is distinguished by the fact that it is accepted by the research community as a means to validate research findings, observations and hypotheses. According to CARL/ABRC, the majority of research data produced by academic institutions in Canada is not being properly or systematically archived in repositories. This suggests that a more concerted effort is needed to bring together experts at Canadian academic institutions to initiate data management projects.

A study conducted by the Social Sciences and Humanities Research Council (SSHRC) found that only 3 of 110 Canadian organizations systematically archive data, and that all three archive it in the United States. Research data generated in Canada is not managed in a coherent manner, and much of it is under-utilized or inaccessible for knowledge creation. While some disciplines and research areas have institutional, national and international supports for data curation, this support is neither comprehensive nor well known.

Notable websites

Managing data is central to health care
  • Health Data Initiative and Health Indicators Warehouse
    • U.S. federal government initiatives to make data more accessible for monitoring, assessment and policy development
    • access to high quality data improves understanding of a community’s health status and determinants
    • provide a single, user-friendly, source for national, state, and community health indicators
  • International Digital Curation Education and Action (IDEA) Working Group http://www.ideaworkgroup.org/
    • minimize duplication of effort in provision of digital preservation training and education programmes
    • describe, promote and contextualize current training and education offerings
    • identify and exploit collaborative training and education opportunities
    • maximize inter-disciplinary training and education opportunities
    • develop a shared digital preservation training infrastructure to enable reuse of training and education materials
    • ensure synergy and complementarity between emerging curation and preservation education programmes with professional development training courses
  • myExperiment
    • a social web site for researchers sharing research objects such as scientific workflows
  • PRoteomics IDEntifications database (PRIDE)
    • a centralized, standards-compliant, public repository for proteomics data; developed to give the proteomics community a repository for protein and peptide identifications together with the evidence supporting them, as well as details of post-translational modifications coordinated relative to the peptides in which they have been found
  • Wolfram Alpha
    • Wolfram Alpha, which calls itself the first computational knowledge engine, provides access to a world of factual data without searching. On the web, there is increased emphasis on repositories of data that are maintained by national or international agencies, organizations and individuals. Wolfram Alpha has arranged the first Wolfram Data Summit to bring together the people responsible for data repositories and to develop innovative concepts for the future.

Canadian context


  • DataCite is an international collaboration to improve access to research data by enabling organizations to register datasets and assign digital object identifiers (DOIs). Research data is defined as any research output that has not been published before, such as raw data, slide presentations, lab notes, etc. NRC-CISTI is responsible for assigning unique identifiers to Canadian data sets; however, CISTI is not yet ready to accept data sets itself. CISTI is in the process of assigning DOIs to data and plans to work with data centres in Canada that are interested in participating in DataCite.
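
To make the idea of registering a dataset with a DOI more concrete, the following is a minimal sketch in Python of the kind of descriptive record a data centre might assemble before registration. The field names, the example record and the DOI itself are hypothetical placeholders, and the pattern is only a loose check on the general shape of a DOI; none of this is taken from DataCite or CISTI documentation.

```python
import re

# Loose DOI shape check: "10.", a numeric registrant code, "/", then a suffix.
# This is an illustrative pattern, not an official validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Hypothetical field names, chosen for illustration only.
REQUIRED_FIELDS = ("doi", "creators", "title", "publisher", "publication_year")


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def is_plausible_doi(value):
    """Return True if the value has the general shape of a DOI."""
    return bool(DOI_PATTERN.match(value))


def doi_url(doi):
    """Build the resolver URL through which a DOI is normally cited."""
    return "https://doi.org/" + doi


# Made-up example record; the DOI and all names are placeholders.
record = {
    "doi": "10.1234/example-dataset-001",
    "creators": ["Example Research Group"],
    "title": "Example survey responses",
    "publisher": "Example University Library",
    "publication_year": 2011,
}

if __name__ == "__main__":
    print(missing_fields(record))
    print(is_plausible_doi(record["doi"]))
    print(doi_url(record["doi"]))
```

A record that passes both checks has the descriptive information needed to cite the dataset and a persistent link through which other researchers can find it, which is the access problem the initiative is meant to address.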

