Grey data ("hard to find" data)

From HLWIKI Canada
Grey data, like literature, can be hidden in the deep, social web


Last Update

  • 6 October 2016


See also Big data | Bioinformatics | Data management portal | Grey literature | Grey information and data | Open data | Web 2.0

Grey data ("hard to find" data) has not been satisfactorily defined, but for some insight see my 2009 blogpost: Let’s Liberate ‘Grey Data’ & Grey Literature. Also, see Banks (2009) below.

  • In the pre-2008 era, discussions of the grey characteristics of medical data took place within the context of grey systems theory (Xuerui, 2007) but this is not a direct corollary to grey literature. However, the notion of grey literature defined as hidden, difficult to aggregate, and non-commercial holds true for grey data.
  • Augusto et al (2010) write about grey data as that which is "...contained in sources of information that are not directly accessible...".
  • More recently, in the management literature, grey data is used to refer to information within an organization that is "unstructured" and "unused".
  • Gelfand and Tsang (2015) discuss shades of grey data in "Data: is it grey, maligned or malignant": "...access to data is determined by those who can afford it, discover and know about it, and can thus manipulate it. [However] the new reality is that data is central to the work of science, social sciences and basic human conditions of health and wellbeing..."
  • At its core, biomedicine is an information- and data-intensive discipline. Much of that data is hidden behind private networks or deep within websites, and only sometimes reachable through searchable databases on the web.
  • Descriptive statistics in the literature are one example of how medical knowledge fills databases (and textbooks); an example of best evidence that research can provide, and that must be made accessible.
  • At a clinical level, where does data reside? In patient charts, patient histories, clinical trial records, print files yet to be digitized, electronic health records, research and grant proposals, and research syntheses: the evidence base. And in data, loads and loads of data from clinical trials and clinical observations.
  • Data mining in medicine is a way to integrate the dark recesses of data in epidemiological and patient databases; data distillation is a means of describing systematic review processes, particularly meta-analyses.
  • Integration of knowledge systems is taking place; point-of-care and clinical decision-support tools (DynaMed and UpToDate) are key to assisting physicians in their clinical roles. But by synthesizing only a sliver of available data, do they lose out on much of the medical evidence? Ideally, shouldn’t data distillation integrate all available medical information and the grey data in the deepest recesses of the deep web?
  • The future of medicine should be about liberating not only the literature but also the data: epidemiological and pertinent clinical data collated from across all time zones, systems and databases. Warning systems could be developed to deal with emerging diseases such as H1N1. Patterns in patient records would be more easily seen and described, and specific sub-populations treated early. Think how this would save lives. (Think of the 1980s, and how lack of pattern recognition in patients led to the spread of AIDS.)
  • One question is how emerging genetic data will fit into this large medical view of health. How can we integrate human genome information with MEDLINE and our other beloved databases? There are some who believe that the U.S. could have averted 9/11 if various security agencies had shared data beforehand. Did a lack of pattern recognition by authorities lead to 9/11?
  • Recognizing patterns among disparate sources of medical information is critical to the future of human health. New forms of data distillation require better relational databases, massive networks that can process data quickly and efficiently, and data mining tools that recognize patterns across various populations and geographies.
  • In Augusto et al (2010), grey data was culled from the grey literature according to methods adapted by Batt et al (2004) and Pullin & Stewart (2006). Grey data was found in university libraries, libraries or archives of research institutes, personal libraries of soil scientists, or from the web. The sources were research reports (n = 20 references; 46% of grey data), unused archives of research institutes (n = 3 sources of data; 38%) and national monitoring networks (n = 3; 16%). Grey data originated from work conducted in the second half of the 20th century.

What is grey data?

According to Gelfand and Tsang (2015), types of grey data include:

  • research data (in their processing stages, starting from the raw data set acquisition), textual and non-textual documents and virtual data representations
  • contexts and their relationships, typical of the complexity of contemporary science, of its various actors and communities
  • infrastructures, instruments, tools and ICT methods.

Searching for grey data

  • Searches for grey literature can require substantial resources to undertake, but including grey literature is vital for research activities such as systematic reviews.
  • Web scraping, the extraction of patterned data from web pages on the internet, has been developed in some sectors for business but offers substantial benefits to those searching for grey literature.
  • Building and sharing protocols that extract search results and other data from web pages can drastically increase transparency and resource efficiency.
  • Various options exist in terms of web-scraping software.
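
As a rough illustration of what "extraction of patterned data from web pages" can mean in practice, here is a minimal sketch in Python using only the standard library's html.parser module. It is not a recommendation of any particular tool. The sample markup, the "result" class name and the names ResultLinkParser and extract_results are all hypothetical, invented for this example; a real site will use different markup, and its terms of use should be checked before scraping.

```python
from html.parser import HTMLParser

# Hypothetical search-results markup; real sites will differ.
SAMPLE_HTML = """
<ul class="results">
  <li class="result"><a href="/reports/soil-survey-1978.pdf">Soil survey report, 1978</a></li>
  <li class="result"><a href="/reports/groundwater-notes.pdf">Groundwater field notes</a></li>
  <li class="advert"><a href="/subscribe">Subscribe</a></li>
</ul>
"""


class ResultLinkParser(HTMLParser):
    """Collect title/url pairs from links inside <li class="result"> items."""

    def __init__(self):
        super().__init__()
        self.in_result = False    # True while inside a matching <li>
        self.current_href = None  # href of the link currently being read
        self.current_text = []    # text fragments of that link
        self.records = []         # extracted results, in page order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and "result" in (attrs.get("class") or "").split():
            self.in_result = True
        elif tag == "a" and self.in_result:
            self.current_href = attrs.get("href")
            self.current_text = []

    def handle_data(self, data):
        # Only keep text that belongs to a link we are tracking.
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            title = "".join(self.current_text).strip()
            self.records.append({"title": title, "url": self.current_href})
            self.current_href = None
        elif tag == "li":
            self.in_result = False


def extract_results(html):
    """Return a list of {'title', 'url'} dicts for each result link."""
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    return parser.records


records = extract_results(SAMPLE_HTML)
for record in records:
    print(record["title"], "->", record["url"])
```

Writing the extraction rule down as a small, shareable script is one way to make a grey literature search repeatable: another searcher can rerun the same rule on the same pages and check what was, and was not, captured.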

Twenty-five shades of grey (See Smart, 2015)

  1. White papers
  2. Working papers
  3. Theses
  4. Dissertations
  5. News reports
  6. Guidelines
  7. Press releases
  8. Newspaper articles
  9. Websites
  10. Blogs
  11. Twitter
  12. Comments on articles
  13. Presentations
  14. Interviews
  15. Meeting reports
  16. Position papers
  17. Announcements
  18. Research guides
  19. Research reports
  20. Lab reports
  21. Government documents
  22. Study reports
  23. Policy documents
  24. Policy statements
  25. Transcriptions

Value of grey data

  • According to Augusto et al, "...[grey] data cannot be used directly on their own. In our [study], it was the combined use of grey data, field sampling, grey knowledge and scientific knowledge that enabled [our] evaluation".

Grey data websites

  • A great deal of useful and relevant information on African groundwater is held in the form of reports, maps and datasets by institutions outside the African continent, such as European geological surveys. Unfortunately, much of this material is not published, or has only been published in limited quantities and so is now difficult to access. These materials, known as “Grey Data”, include unpublished books, reports, maps, notes and datasets which, whilst theoretically available, are in practice hard to obtain. Furthermore, much of this material is not in accessible formats - grey data is often found only as fragile paper copies, since much of the work was reported before the common use of computers.

