- 17 June 2013
See also Bioinformatics | Data science portal | Data visualization | e-Science | Open data | Research Portal for Academic Librarians | Semantic web | Text-mining
- "... data curation is the 'active and on-going management of data through its lifecycle of interest and usefulness to scholarly and educational activities across the sciences, social sciences, and humanities' ... it is an emerging field that brings new opportunities and challenges for libraries. The growing movement to effectively manage, archive, preserve, retrieve and reuse research data is one that complements traditional library missions ..." ~ CIRSS, Data Curation Education Program 2012
Data management (also data curation and data science) refers to maintaining the accessibility, storage and preservation of data. Authors writing in the field suggest data curation involves the selection and appraisal of information and may deal with issues such as evolving provision of intellectual access, redundant storage, data transformation and, for some materials, a commitment to preservation. According to Stuart (2010) "...if we are going to continue to be relevant in the age of Google and Google Scholar, we need to move beyond the document and facilitate access to the increasing amounts of data on the web. ..." Academic libraries are taking more responsibility for coordinating data and are naming it as part of their long-term institutional mandate. Much global research in health and medicine, including clinical trials data, is born-digital and is increasingly accessed and analysed computationally. This raises two questions: will research data be available for analysis by other researchers, and how might it be used for data mining?
The Fourth Paradigm is a term connected to e-data. It names the vision of pioneering computer scientist Jim Gray for a new, fourth paradigm of discovery based on data-intensive science; the extensive monograph of that title (Hey, Tansley and Tolle, 2009) offers insights into how it can be fully realized. To take a look at a guide primarily geared toward researchers and data librarians, see here.
In 2013, it was announced that Wikidata, an offshoot of Wikipedia and a centralized repository for data and facts, now feeds information to Wikipedia.
What do we mean by research data and data curation?
Data has enormous value if it is managed well and made accessible. Research data may be defined as the information (e.g. data sets, microarray, numerical data, clinical trial information, textual records, images, sound, etc.) generated or used as quantitative evidence in primary biomedical research. Research data is distinguished by the fact that it is accepted by the research community as a means to validate research findings, observations and hypotheses. According to CARL/ABRC, the majority of research data produced by academic institutions in Canada is not being properly or systematically archived in repositories. This suggests that a more concerted effort is needed to bring together experts at Canadian academic institutions to initiate data management projects. A study conducted by the Social Sciences and Humanities Research Council (SSHRC) found that only 3 of 110 Canadian organizations systematically archive data and that, in all three cases, the data were archived in the US. Research data generated in Canada is not managed properly, and much of it is under-utilized or inaccessible. While some disciplines and research areas have institutional, national and international supports for data curation, this support is neither comprehensive nor well known.
Canadian projects & websites
- DataCite Canada, NRC is part of DataCite, an international collaboration to improve access to research data by enabling organizations to register datasets and assign digital object identifiers (DOIs). Research data is defined as any research output that has not been published before, such as raw data, slide presentations, lab notes, etc. CISTI is responsible for assigning unique identifiers for Canadian data sets. It is not yet ready to accept data sets itself, but it plans to assign DOIs to data and to work with data centres in Canada interested in participating in DataCite.
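For readers who have not worked with DOIs directly, the short Python sketch below illustrates what "assigning a DOI to a data set" produces in practice: an identifier made of a prefix that begins with `10.`, a slash, and a suffix chosen by the registrant, which can be turned into a link through the public `https://doi.org/` resolver. The function names `split_doi` and `doi_url` and the sample DOI `10.1234/example-dataset.v1` are hypothetical and for illustration only; this is not DataCite Canada's or CISTI's own tooling, and the check is deliberately looser than a full syntax validation.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix).

    Accepts a bare DOI or one written with a common lead-in such as
    'doi:' or a resolver URL. Raises ValueError if the text does not
    look like a DOI at all.
    """
    cleaned = doi.strip()
    # Strip the presentation forms people commonly paste in.
    for lead in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if cleaned.lower().startswith(lead):
            cleaned = cleaned[len(lead):]
            break
    # The prefix ends at the first slash; the suffix may contain more slashes.
    prefix, sep, suffix = cleaned.partition("/")
    if not (prefix.startswith("10.") and len(prefix) > 3 and sep and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_url(doi: str) -> str:
    """Build a resolver link for a DOI (no percent-encoding is applied)."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{suffix}"


if __name__ == "__main__":
    # Hypothetical DOI, used only to show the shape of the output.
    print(split_doi("doi:10.1234/example-dataset.v1"))
    print(doi_url("10.1234/example-dataset.v1"))
```

A citation for a data set can then carry the resolver link rather than the address of the repository that happens to host the file, which is the main reason DOIs are attractive for data citation.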
Data management courses
- "Data literacy must also include the ability to do something with raw information - to process it in some way. In an era where spreadsheets help us to make the grandest of decisions, we must have basic statistical literacy and fluency in the tools that allow us to make sense out of numerical data, not just words and ideas." ~ Johnson, "The Information Diet: A Case for Conscious Consumption"
- Khan Academy. Statistics. Covers the basics of reading and interpreting data, along with the descriptive and inferential statistics taught in an introductory course
International projects & websites
Four stars denote librarian-selected, high quality information. Starred sites are great places to begin your research.
Managing data is central to health care
- CSAIL looks at the issue of big data as "fundamentally multi-disciplinary"; the MIT team includes faculty and researchers across related technology areas, including algorithms, architecture, data management, machine learning, privacy and security, user interfaces, and visualization, as well as domain experts in finance, medicine, smart infrastructure, education and science
- Databib is a tool for helping people identify and locate online repositories of research data
- helping you to find, access and use data
- DataCite Canada's services are offered in cooperation with DataCite, an international consortium of national-scale libraries and research organizations committed to increasing access to research data on the Internet
- DataCite Canada is DataCite's DOI allocation agent for Canada
- DataCite promotes the value of data archiving, citation and discoverability within Canada
- table lists NIH-supported data repositories that accept submissions of appropriate data from NIH-funded investigators (and others). Also included are resources that aggregate information about biomedical data and information sharing systems
- DMPTool adheres to National Institutes of Health (NIH) data sharing requirements
- DMPTool provides step-by-step guidance to help users create ready-to-use data management plans and meet funder data management requirements. While anyone can create an account and use this resource, many institutions have partnered with the DMPTool to allow login through their home institution, and, in some cases have provided customized help and support
- Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences. Dryad enables scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies
- a collaborative project devoted to educating science and medical librarians on e-Science, the portal was initiated at the University of Massachusetts Medical School through funding from the National Network of Libraries of Medicine
- a vision of pioneering computer scientist Jim Gray for a new fourth paradigm of discovery based on data-intensive science; this extensive monograph offers insights into how it can be fully realized
- U.S. federal government initiatives to make data more accessible for monitoring, assessment and policy development
- access to high quality data improves understanding of a community’s health status and determinants
- provide a single, user-friendly, source for national, state, and community health indicators
- minimize duplication of effort in provision of digital preservation training and education programmes
- describe, promote and contextualize current training and education offerings
- identify and exploit collaborative training and education opportunities
- maximize inter-disciplinary training and education opportunities
- develop a shared digital preservation training infrastructure to enable reuse of training and education materials
- ensure synergy and complementarity between emerging curation and preservation education programmes with professional development training courses
- a research and teaching unit at Harvard University dedicated to exploring and expanding the frontiers of networked culture in the arts and humanities
- a social web site for researchers sharing research objects such as scientific workflows
- aims to solve name ambiguity problem in scholarly communications by creating a registry of persistent unique identifiers for individual researchers and an open and transparent linking mechanism between ORCID, other ID schemes, and research objects such as publications, grants, and patents
- aimed at helping researchers share biomedical data and models; PhysiomeSpace has just completed its beta implementation and is open to users
- centralized, standards-compliant, public repository for proteomics data; developed to provide the proteomics community with a repository for protein and peptide identifications together with the evidence supporting them; also holds details of post-translational modifications, coordinated relative to the peptides in which they have been found
- Need to create a data plan for a grant proposal? Find out what to include & see examples.
- Wolfram Alpha, which calls itself the first computational knowledge engine, provides access to a world of factual data without searching. On the web, there is increased emphasis on repositories of data maintained by national or international agencies, organizations and individuals. Wolfram Alpha now hosts the Wolfram Data Summit to bring together those responsible for data repositories and to develop innovative concepts for the future.
- provide all users with improved access to World Bank data and to make that data easy to find and use
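The ORCID entry in the list above describes a registry of persistent unique identifiers for researchers. As a small illustration of how such an identifier can be checked before it is stored alongside a data set, the Python sketch below applies the ISO 7064 MOD 11-2 check-digit calculation that ORCID iDs use for their sixteenth character. The function names are hypothetical, and the code only tests that an identifier is well formed; it does not confirm that the iD is registered to anyone.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the shape and check digit of an ORCID iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15]
    # Only plain ASCII digits are allowed in the first 15 positions.
    if not all(c in "0123456789" for c in base):
        return False
    return orcid_check_digit(base) == check


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
    print(is_valid_orcid("0000-0002-1825-0097"))
    # Changing the last character breaks the checksum.
    print(is_valid_orcid("0000-0002-1825-0098"))
```

A check like this catches most typing errors at the point of data entry, which matters when researcher identifiers are recorded in repository metadata by hand.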
Data Information Literacy at Purdue
In partnership with librarians at the University of Minnesota, the University of Oregon and Cornell University, the Purdue University Libraries received $250,000 from IMLS to develop programs that enable the next generation of scientists to find, organize and share data. The program is intended for graduate students in science working toward careers as research scientists. As of 2012, technology has made it easier to share research data beyond the lab, but in many cases data is not managed in ways that allow it to be easily discovered, understood or re-purposed by others. This training is also vital to scientists as they look to secure research funding: the National Science Foundation issued a report in 2007 on the need to build public collections of research data and, since 2011, has required scientists to include data management plans in their grant applications.
The Data Information Literacy effort will be carried out over two years by five teams. Two teams, each consisting of a data librarian, a subject librarian and a faculty researcher, are based at Purdue, with one team at each of the other institutions. Teams are constructed to represent various subjects, from computer engineering to landscape architecture, so that commonalities and differences in data curation can be explored. Each team will conduct an assessment of data needs for its discipline, including interviewing and observing researchers. Teams will then develop and implement targeted instruction and assess its impact on the data information literacy skills of graduate students.
More information on the data information literacy project is available at http://wiki.lib.purdue.edu/display/ste
See also Indiana University-Purdue University Indianapolis. Data Services Program
- ACRL Academic Libraries and Research Data Services: current practices and plans for the future. An ACRL White Paper, 2012. Carol Tenopir, Ben Birch, Suzie Allard.
- ALA Connect. The fourth paradigm: data-intensive research, digital scholarship and implications for libraries
- Bailey CW. Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, 2012.
- Baker K, Yarmey L. Data stewardship: environmental data curation and a web-of-repositories. Int J Digital Curation. 2009;4(2).
- Ball A. Review of the state of the art of the digital curation of research data. ERIM Project. University of Bath; 2010.
- Beagrie N. Digital preservation: setting the course for a decade of change. 2007.
- Canadian Association of Research Libraries. Research data: unseen opportunities. An awareness toolkit commissioned by CARL; 2009.
- Chalmers I, Altman DG, McHaffie H, Owens N, Cooke RWI. Data sharing among data monitoring committees and responsibilities to patients and science. Trials. 2013;14:102.
- Cox A, Verbaan E, Sen B. Upskilling liaison librarians for research data management. Ariadne. 2012;70.
- Cragin MH, Palmer CL, Heidorn PB, Smith LC. An educational program on data curation. American Library Association, Science and Technology Section; 2007.
- Cragin MH, Palmer CL, Heidorn PB. Extending the data curation curriculum to practicing LIS professionals. DigCCurr2009: Digital Curation: Practice, Promise & Prospects; 2009.
- De Roure D, Goble C, Aleksejevs S. The myExperiment Open Repository for scientific workflows. Open Repositories. 2009.
- Delserone LM. At the watershed: preparing for research data management and stewardship at the University of Minnesota Libraries. Library Trends. 2008;57(2):202–210.
- Giarlo MJ. Academic libraries as data quality hubs. J Libr Scholarly Commun. 2013;1(3):eP1059.
- Gore SA. e-Science and data management resources on the web. Med Ref Serv Q. 2011;30(2):167–77.
- Heidorn PB. The emerging role of libraries in data curation and e-science. J Libr Admin. 2011;51(7-8):662–672.
- Hey T, Tansley S, Tolle K. The fourth paradigm: data-intensive scientific discovery. Microsoft Research. Redmond, Washington, 2009.
- Humphrey C. Preserving research data: a time for action. In: Canadian Conservation Institute. Preservation of electronic records: new knowledge and decisionmaking. Ottawa, ON: 2004.
- Interagency Working Group on Digital Data. Science of the National Science and Technology Council. Harnessing the Power of Digital Data for Science and Society. Washington, DC; 2009.
- Lewis SC, Rodrigo Z, Hermida A. Content analysis in an era of big data: a hybrid approach to computational and manual methods. J Broadcast Elec Media. 2013;57(1):34-52.
- LIBER Working Group. Ten recommendations for libraries to get started with research data management. Final Report on E-Science, 2012.
- Mallon M. Data curation. Public Services Q. 2012;8(4):326-337.
- Martin R. What do data services librarians do? J eSci Libr. 2012;1(3):Article 3.
- Miller HE. Big-data in cloud computing: a taxonomy of risks. Info Res. 2013;18(1):paper 571.
- Ohno-Machado L. A hybrid open-access model to bridge the publishing divide and reach out to a broader community. JAMIA. 2011;18(3):210–1.
- Piwowar HA, Day RS, Fridsma DB. Sharing detailed research data is associated with increased citation rate. PLoS ONE. 2007;2(3):e3082.
- Rani M, Buckley BS. Systematic archiving and access to health research data: rationale, current status and way forward. Bull World Health Organ. 2012;90:932–939.
- Rothenberg J. Ensuring the longevity of digital documents. Sci Am. 1995;272(1):24-29.
- Rusbridge C. Tomorrow, and tomorrow, and tomorrow: poor players on the digital curation stage. In: Digital convergence – libraries of the future. London: Springer; 2008.
- Scaramozzino JM, Ramirez M, McGaughey K. Managing the data deluge: understanding scientists' need for data curation services; 2010.
- Research Data Strategy Working Group. Stewardship of research data in Canada: a gap analysis; 2008.
- Ross JS, Krumholz HM. Ushering in a New Era of Open Science Through Data Sharing. JAMA. 2013;():1-2.
- Simons N. Implementing DOIs for research data. D-Lib Magazine. May/June 2012;18(5/6).
- Stahl-Timmins W. Information graphics in health technology assessment. PhD thesis, University of Exeter, UK. 2011.
- Stuart D. Programming skills could transform librarians' roles. Research Information; 2010.
- Tenopir C, Sandusky RJ, Allard S, Birch B. Academic librarians and research data services: preparation and attitudes. IFLA Journal. March 2013;39:70-78.
- Walters TO. Data curation program development in US universities: the Georgia Institute of Technology example. Int J Digital Curation. 2009;4(3):83–92.