Data management

From HLWIKI Canada

Last Update

  • 17 June 2013

Introduction

See also Bioinformatics | Data science portal | Data visualization | e-Science | Open data | Research Portal for Academic Librarians | Semantic web | Text-mining

"... data curation is the 'active and on-going management of data through its lifecycle of interest and usefulness to scholarly and educational activities across the sciences, social sciences, and humanities' ... it is an emerging field that brings new opportunities and challenges for libraries. The growing movement to effectively manage, archive, preserve, retrieve and reuse research data is one that complements traditional library missions ..." ~ CIRSS, Data Curation Education Program 2012

Data management (also data curation and data science) refers to maintaining the accessibility, storage and preservation of data. Authors writing in the field suggest that data curation involves the selection and appraisal of information, and may deal with issues such as the evolving provision of intellectual access, redundant storage, data transformation and, for some materials, a commitment to preservation. According to Stuart (2010), "...if we are going to continue to be relevant in the age of Google and Google Scholar, we need to move beyond the document and facilitate access to the increasing amounts of data on the web. ..." Academic libraries are taking on more responsibility for coordinating data and are naming it as part of their long-term institutional mandate. Much global research in health and medicine, including clinical trials data, is born-digital and is increasingly accessed and analyzed computationally. Will research data be available for analysis by other researchers, and how might it be used for other data mining purposes?

The Fourth Paradigm is a term connected to e-data: it is the vision of pioneering computer scientist Jim Gray for a new, fourth paradigm of discovery based on data-intensive science. The extensive monograph of the same name offers insights into how that vision can be fully realized. A guide primarily geared toward researchers and data librarians is also available.

In 2013, it was announced that Wikidata, an offshoot of Wikipedia and a centralized repository for data and facts, now feeds information to Wikipedia.

What do we mean by research data and data curation?

Data has enormous value if it is managed well and made accessible. Research data may be defined as the information (e.g. data sets, microarray, numerical data, clinical trial information, textual records, images, sound, etc.) generated or used as quantitative evidence in primary biomedical research. Research data is distinguished by the fact that it is accepted by the research community as a means to validate research findings, observations and hypotheses. According to CARL/ABRC, the majority of research data produced by academic institutions in Canada is not being properly or systematically archived in repositories. This suggests that a more concerted effort is needed to bring together experts at Canadian academic institutions to initiate data management projects. A study conducted by the Social Sciences and Humanities Research Council (SSHRC) found that only 3 of 110 Canadian organizations systematically archive data, and of those, all archived their data in the US. Research data generated in Canada is not managed properly, and much of it is under-utilized or inaccessible. While some disciplines and research areas have institutional, national and international supports for data curation, this support is neither comprehensive nor well known.

Canadian projects & websites

  • DataCite Canada, NRC: an international collaboration to improve access to research data by enabling organizations to register datasets and assign digital object identifiers (DOIs) to them. Research data is defined as any research output that has not been published before, such as raw data, slide presentations, lab notes, etc. CISTI is responsible for assigning unique identifiers to Canadian data sets. CISTI is not yet ready to accept data sets itself, but it plans to assign DOIs to data and to work with data centres in Canada interested in participating in DataCite.
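
As a minimal sketch of how a DOI of the kind mentioned above is put together (this example is not taken from DataCite Canada or CISTI documentation): a DOI consists of a prefix beginning with `10.` and a suffix, separated by a slash, and it can be turned into a web link through the public `https://doi.org/` resolver. The DOI `10.1234/example.dataset.v1` and the helper names `split_doi` and `doi_to_url` are hypothetical, chosen only for illustration.

```python
from typing import Tuple
from urllib.parse import quote

# Public DOI resolver; a DOI appended to this address redirects to the item.
DOI_RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into its registrant prefix and its item suffix."""
    prefix, sep, suffix = doi.strip().partition("/")
    # A DOI needs a prefix starting with "10." and a non-empty suffix.
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Build a resolver link for a DOI, percent-encoding unsafe characters."""
    prefix, suffix = split_doi(doi)
    return f"{DOI_RESOLVER}{prefix}/{quote(suffix, safe='/')}"


if __name__ == "__main__":
    # Hypothetical DOI, used only to illustrate the structure.
    print(doi_to_url("10.1234/example.dataset.v1"))
```

Citing a data set by its resolver link rather than by the address of the repository page means the citation stays valid even if the data set moves to a different host.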

Data management courses

Data literacy

  • "Data literacy must also include the ability to do something with raw information - to process it in some way. In an era where spreadsheets help us to make the grandest of decisions, we must have basic statistical literacy and fluency in the tools that allow us to make sense out of numerical data, not just words and ideas." ~ Johnson, "The Information Diet: A Case for Conscious Consumption"
  • Khan Academy. Statistics (4 stars; US): basics of reading and interpreting data; descriptive and inferential statistics covered in an introductory course

International projects & websites

4 stars denote librarian-selected, high-quality information. Starred sites are great places to begin your research.
Managing data is central to health care
  • CSAIL looks at the issue of big data as "fundamentally multi-disciplinary"; the MIT team includes faculty and researchers across related technology areas, including algorithms, architecture, data management, machine learning, privacy and security, user interfaces, and visualization; as well as domain experts in finance, medical, smart infrastructure, education and science
  • Databib is a tool for helping people identify and locate online repositories of research data
  • DataCite: helping you to find, access and use data
  • DataCite Canada's services are offered in cooperation with DataCite, an international consortium of national-scale libraries and research organizations committed to increasing access to research data on the Internet
  • DataCite Canada is DataCite's DOI allocation agent for Canada
  • DataCite promotes the value of data archiving, citation and discoverability within Canada
  • table lists NIH-supported data repositories that accept submissions of appropriate data from NIH-funded investigators (and others). Also included are resources that aggregate information about biomedical data and information sharing systems
  • DMPTool adheres to National Institutes of Health (NIH) data sharing requirements
  • DMPTool provides step-by-step guidance to help users create ready-to-use data management plans and meet funder data management requirements. While anyone can create an account and use this resource, many institutions have partnered with the DMPTool to allow login through their home institution, and, in some cases have provided customized help and support
  • Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences. Dryad enables scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies
  • a collaborative project devoted to educating science and medical librarians on e-Science, the portal was initiated at the University of Massachusetts Medical School through funding from the National Network of Libraries of Medicine
  • a vision of pioneering computer scientist Jim Gray for a new fourth paradigm of discovery based on data-intensive science; this extensive monograph offers insights into how it can be fully realized
  • U.S. federal government initiatives to make data more accessible for monitoring, assessment and policy development
  • access to high quality data improves understanding of a community’s health status and determinants
  • provide a single, user-friendly, source for national, state, and community health indicators
  • minimize duplication of effort in provision of digital preservation training and education programmes
  • describe, promote and contextualize current training and education offerings
  • identify and exploit collaborative training and education opportunities
  • maximize inter-disciplinary training and education opportunities
  • develop a shared digital preservation training infrastructure to enable reuse of training and education materials
  • ensure synergy and complementarity between emerging curation and preservation education programmes with professional development training courses
  • a research and teaching unit at Harvard University dedicated to exploring and expanding the frontiers of networked culture in the arts and humanities
  • a social web site for researchers sharing research objects such as scientific workflows
  • aims to solve the name ambiguity problem in scholarly communications by creating a registry of persistent unique identifiers for individual researchers and an open and transparent linking mechanism between ORCID, other ID schemes, and research objects such as publications, grants, and patents
  • aimed at helping researchers share biomedical data and models; PhysiomeSpace has just completed its beta implementation and is open to users
Data Curation Continuum (Treloar, 2007)
  • centralized, standards-compliant, public repository for proteomics data; developed to provide the proteomics community with a repository for protein and peptide identifications together with the evidence supporting them; also includes details of post-translational modifications, coordinated relative to the peptides in which they have been found
  • Need to create a data plan for a grant proposal? Find out what to include & see examples.
  • Wolfram Alpha provides access to a world of factual data, without searching, calling itself the first computational knowledge engine. On the web, there is increased emphasis on repositories of data maintained by national or international agencies, organizations and individuals. Wolfram Alpha now hosts the Wolfram Data Summit to bring together those responsible for data repositories and to develop innovative concepts for the future.
  • provide all users with improved access to World Bank data and to make that data easy to find and use

Data Information Literacy at Purdue

In partnership with librarians at the University of Minnesota, University of Oregon and Cornell University, the Purdue University Libraries received $250,000 from IMLS to develop programs that enable the next generation of scientists to find, organize and share data. The program is intended for graduate students in science working their way toward careers as research scientists. As of 2012, technology makes it easier to share research data beyond the lab, yet in many cases data is not administered in ways that enable it to be easily discovered, understood, or re-purposed by others. This training is vital to scientists as they look to secure research funding. The National Science Foundation issued a report in 2007 on the need to build public collections of research data; since 2011, it has required scientists to include data management plans in their grant applications.

The Data Information Literacy effort will be carried out over two years by five teams. Two teams, consisting of a data librarian, subject librarian and faculty researcher, are based at Purdue, with one team each at the other institutions. Teams are constructed to represent various subjects, from computer engineering to landscape architecture, so that commonalities and differences in data curation can be explored. Each team will conduct an assessment of data needs for its discipline, including interviewing and observing researchers. Teams will then develop and implement targeted instruction and assess the impact of that instruction on the data information literacy skills of graduate students.

More information on the data information literacy project is available at http://wiki.lib.purdue.edu/display/ste

See also Indiana University-Purdue University Indianapolis. Data Services Program

References
