Data sets (websites)

From HLWIKI Canada


Last Update

  • This entry is out of date and will not be updated (July 2017).


See also Bioinformatics | Data management portal | Data visualization | e-Science | Open data | Research Portal for Academic Librarians | Semantic web | Text-mining

International projects & websites

Four stars denote librarian-selected, high-quality information. Starred sites are great places to begin your research.
Managing data is central to health care
  • CSAIL looks at the issue of big data as "fundamentally multi-disciplinary"; the MIT team includes faculty and researchers across related technology areas, including algorithms, architecture, data management, machine learning, privacy and security, user interfaces, and visualization; as well as domain experts in finance, medical, smart infrastructure, education and science
  • RDCs (Research data centres) provide short descriptions and details about data sets available at the RDC. The program provides analytical and methodological research tools to assist researchers.
  • Databib is a tool for helping people identify and locate online repositories of research data
  • helping you to find, access and use data
  • DataCite Canada's services are offered in cooperation with DataCite, an international consortium of national-scale libraries and research organizations committed to increasing access to research data on the Internet
  • DataCite Canada is DataCite's DOI allocation agent for Canada
  • DataCite promotes the value of data archiving, citation and discoverability within Canada
  • table lists NIH-supported data repositories that accept submissions of appropriate data from NIH-funded investigators (and others). Also included are resources that aggregate information about biomedical data and information sharing systems
  • DMPTool adheres to National Institutes of Health (NIH) data sharing requirements
  • DMPTool provides step-by-step guidance to help users create ready-to-use data management plans and meet funder data management requirements. While anyone can create an account and use this resource, many institutions have partnered with the DMPTool to allow login through their home institution, and, in some cases have provided customized help and support
  • Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences. Dryad enables scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies. Dryad also aims to make data archiving as simple as possible via a suite of services not necessarily provided by publishers or institutional websites.
  • a collaborative project devoted to educating science and medical librarians on e-Science; the portal was initiated at the University of Massachusetts Medical School through funding from the National Network of Libraries of Medicine
  • a vision of pioneering computer scientist Jim Gray for a new fourth paradigm of discovery based on data-intensive science; this extensive monograph offers insights into how it can be fully realized
  • U.S. federal government initiatives to make data more accessible for monitoring, assessment and policy development
  • access to high quality data improves understanding of a community’s health status and determinants
  • provide a single, user-friendly, source for national, state, and community health indicators
  • minimize duplication of effort in provision of digital preservation training and education programmes
  • describe, promote and contextualize current training and education offerings
  • identify and exploit collaborative training and education opportunities
  • maximize inter-disciplinary training and education opportunities
  • develop a shared digital preservation training infrastructure to enable reuse of training and education materials
  • ensure synergy and complementarity between emerging curation and preservation education programmes and professional development training courses
  • a research and teaching unit at Harvard University dedicated to exploring and expanding the frontiers of networked culture in the arts and humanities
  • a social web site for researchers sharing research objects such as scientific workflows
  • aims to solve the name ambiguity problem in scholarly communication by creating a registry of persistent unique identifiers for individual researchers and an open and transparent linking mechanism between ORCID, other ID schemes, and research objects such as publications, grants, and patents
  • aimed at helping researchers share biomedical data and models; PhysiomeSpace has just completed its beta implementation and is open to users
Data Curation Continuum (Treloar, 2007)
  • centralized, standards-compliant, public repository for proteomics data; developed to give the proteomics community a repository for protein and peptide identifications together with the evidence supporting them; it also records details of post-translational modifications, coordinated relative to the peptides in which they have been found
  • an excellent pathfinder at Tulane University for American public health data sets
  • selective links representing a sample of available information. Items are selected for their quality, authority of authorship, uniqueness, and appropriateness.
  • Need to create a data plan for a grant proposal? Find out what to include & see examples.
  • Wolfram Alpha, which calls itself the first computational knowledge engine, provides access to a world of factual data without searching. On the web, there is increased emphasis on repositories of data maintained by national or international agencies, organizations and individuals. Wolfram Alpha now hosts the Wolfram Data Summit to bring together those responsible for data repositories and to develop innovative concepts for the future.
  • provide all users with improved access to World Bank data and to make that data easy to find and use

Data Information Literacy at Purdue

In partnership with librarians at the University of Minnesota, University of Oregon and Cornell University, the Purdue University Libraries received $250,000 from IMLS to develop programs that teach the next generation of scientists to find, organize and share data. The program is intended for graduate students in science who are working toward careers as research scientists. As of 2012, technology makes it easier to share research data beyond the lab, yet in many cases data is not managed in ways that allow others to discover, understand or repurpose it easily. This training is vital to scientists as they look to secure research funding: the National Science Foundation issued a report in 2007 on the need to build public collections of research data and, since 2011, has required scientists to include data management plans in their grant applications.

The Data Information Literacy effort will be carried out over two years by five teams. Two teams, each consisting of a data librarian, subject librarian and faculty researcher, are based at Purdue, with one team at each of the other institutions. The teams are constructed to represent various subjects, from computer engineering to landscape architecture, so that commonalities and differences in data curation can be explored. Each team will assess the data needs of its discipline, including by interviewing and observing researchers. The teams will then develop and implement targeted instruction and assess its impact on the data information literacy skills of graduate students.

More information on the data information literacy project is available at

See also Indiana University-Purdue University Indianapolis. Data Services Program

Data storage costs and data curation in libraries

  • Purdue’s pricing:
  • Princeton’s pricing:
  • The 4C project announced the beta version of the Curation Costs Exchange (CCEx) website. CCEx is an online community platform for exchanging curation cost information: a crowd-sourced database and library of curation costs. The goal is to help organizations make smarter investments in digital curation by enabling knowledge transfer and cost comparisons between organizations of all types. CCEx uses cost data to automatically generate results for self-assessment, cost comparisons with peers, and insights into the financial accounting and activity of other organizations. The value of the project will depend on the willingness of organizations to share cost data and on the benefits that sharing brings. The 4C project's vision is to create a better understanding of digital curation costs through collaboration.

