Grey data, like literature, can be hidden in the deep, social web
- 6 October 2016
See also Big data | Bioinformatics | Data management portal | Grey literature | Grey information and data | Open data | Web 2.0
Grey data ("hard to find" data) has not been satisfactorily defined, but for some insight see my 2009 blogpost: Let’s Liberate ‘Grey Data’ & Grey Literature. Also, see Banks (2009) below.
- In the pre-2008 era, discussions of the grey characteristics of medical data took place within the context of grey systems theory (Xuerui, 2007) but this is not a direct corollary to grey literature. However, the notion of grey literature defined as hidden, difficult to aggregate, and non-commercial holds true for grey data.
- Augusto et al (2010) write about grey data as that which is "...contained in sources of information that are not directly accessible...".
- More recently, in the management literature, grey data is used to refer to information within an organization that is "unstructured" and "unused".
- Gelfand and Tsang (2015) discuss shades of grey data in "Data: is it grey, maligned or malignant?": "...access to data is determined by those who can afford it, discover and know about it, and can thus manipulate it. [However] the new reality is that data is central to the work of science, social sciences and basic human conditions of health and wellbeing..."
- At its core, biomedicine is an information- and data-intensive discipline, with much of that data hidden behind private networks, deep within websites, and in databases that are only sometimes searchable over the web.
- Descriptive statistics in the literature are one example of how medical knowledge fills databases (and textbooks); an example of best evidence that research can provide, and that must be made accessible.
- At a clinical level, where does data reside? In patient charts; patient histories; clinical trial records; print files yet to be digitized; electronic health records; research and grant proposals; research syntheses. The evidence base. And data. Loads ‘n loads of data from clinical trials and clinical observations.
- Data mining in medicine is a way to integrate the dark recesses of data in epidemiological and patient databases; data distillation is a means of describing systematic review processes, particularly meta-analyses.
- Integration of knowledge systems is taking place; point-of-care and clinical decision-support tools (DynaMed and UpToDate) are key to assisting physicians in their clinical roles. But by synthesizing only a sliver of the available data, do they lose out on much of the medical evidence? Ideally, shouldn’t data distillation integrate all available medical information and the grey data in the deepest recesses of the deep web?
- The future of medicine should be about liberating the literature but also the data. Epidemiological and pertinent clinical data collated from across all time zones, systems and databases. Emerging disease warning systems could be developed to deal with H1N1. Patterns in patient records would be more easily seen and described, and specific sub-populations treated, early. Think how this would save lives. (Think of the 1980s, and how lack of pattern recognition in patients led to the spread of AIDS.)
- One question is how emerging genetic data will fit into this large medical view of health. How can we integrate human genome information with MEDLINE and our other beloved databases? There are some who believe that the U.S. could have averted 9/11 if various security agencies had shared data beforehand. Did a lack of pattern recognition by authorities lead to 9/11?
- Recognizing patterns among disparate sources of medical information is critical to the future of human health. New forms of data distillation require better relational databases, massive networks that can process data quickly and efficiently, and data mining tools that recognize patterns across various populations and geographies.
- In Augusto et al (2010), grey data was culled from the grey literature according to methods adapted by Batt et al (2004) and Pullin & Stewart (2006). Grey data was found in university libraries, libraries or archives of research institutes, personal libraries of soil scientists, or from the web. The sources were research reports (n = 20 references; 46% of grey data), unused archives of research institutes (n = 3 sources of data; 38%) and national monitoring networks (n = 3; 16%). Grey data originated from work conducted in the second half of the 20th century.
What is grey data?
According to Gelfand and Tsang (2015), types of grey data include:
- research data (in their processing stages, starting from the raw data set acquisition), textual and not-textual documents and virtual data representations
- contexts and their relationships, typical of the complexity of contemporary science, of its various actors and communities
- infrastructures, instruments, tools and ICT methods.
Searching for grey data
- Searches for grey literature can require substantial resources to undertake but their inclusion is vital for research activities such as systematic reviews.
- Web scraping, the extraction of patterned data from web pages on the internet, has been developed in some sectors for business but offers substantial benefits to those searching for grey literature.
- Building and sharing protocols that extract search results and other data from web pages can drastically increase transparency and resource efficiency.
- Various web-scraping software options exist.
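As a rough illustration of what "extraction of patterned data from web pages" can look like, here is a minimal sketch using only Python's standard library. It is not drawn from any of the tools or protocols cited here: the sample HTML, the `result` class name, and the names `ResultLinkParser` and `extract_results` are all hypothetical, and a real search-results page would need its own pattern.

```python
from html.parser import HTMLParser

# Hypothetical sample of a search-results page. A real page would be fetched
# over HTTP and would use its own markup and class names.
SAMPLE_HTML = """
<ul class="results">
  <li class="result"><a href="/reports/101.pdf">Groundwater survey report</a></li>
  <li class="result"><a href="/reports/102.pdf">Immunization coverage working paper</a></li>
  <li class="other"><a href="/about">About this site</a></li>
</ul>
"""


class ResultLinkParser(HTMLParser):
    """Collects (title, url) pairs for links inside <li class="result"> items."""

    def __init__(self):
        super().__init__()
        self.in_result = False    # True while inside a matching <li>
        self.current_href = None  # href of the link being read, if any
        self.current_text = []    # text fragments of the current link
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "result":
            self.in_result = True
        elif tag == "a" and self.in_result:
            self.current_href = attrs.get("href")
            self.current_text = []

    def handle_data(self, data):
        # Only keep text that belongs to a link we are tracking.
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            title = "".join(self.current_text).strip()
            self.results.append((title, self.current_href))
            self.current_href = None
        elif tag == "li":
            self.in_result = False


def extract_results(html):
    """Return a list of (title, url) tuples found in the given HTML string."""
    parser = ResultLinkParser()
    parser.feed(html)
    return parser.results


if __name__ == "__main__":
    for title, url in extract_results(SAMPLE_HTML):
        print(f"{title}\t{url}")
```

Run on the sample, the script prints the two report links and skips the "About this site" link because it sits outside the `result` pattern. Saving a small parser like this alongside the search strategy is one way a protocol could be shared and rerun by others, which is the transparency benefit described above.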
Twenty-five shades of grey (See Smart, 2015)
- White papers
- Working papers
- News reports
- Press releases
- Newspaper articles
- Comments on articles
- Meeting reports
- Position papers
- Research guides
- Research reports
- Lab reports
- Government documents
- Study reports
- Policy documents
- Policy statements
Value of grey data
- According to Augusto et al, "...[grey] data cannot be used directly on their own. In our [study], it was the combined use of grey data, field sampling, grey knowledge and scientific knowledge that enabled [our] evaluation".
Grey data websites
- A great deal of useful and relevant information on African groundwater is held in the form of reports, maps and datasets by institutions outside the African continent, such as European geological surveys. Unfortunately, much of this material is not published, or has only been published in limited quantities and so is now difficult to access. These materials, known as “Grey Data”, include unpublished books, reports, maps, notes and datasets which, whilst theoretically available, are in practice hard to obtain. Furthermore, much of this material is not in accessible formats - grey data is often found only as fragile paper copies, since much of the work was reported before the common use of computers.
References
- Adams J, Hillier-Brown FC, Moore HJ, Lake AA, Araujo-Soares V, White M, Summerbell C. Searching and synthesising 'grey literature' and 'grey information' in public health: critical reflections on three case studies. Syst Rev. 2016 Sep 29;5(1):164.
- Augusto L, Bakker M, Ranger J, et al. Is 'grey literature' a reliable source of data to characterize soils at the scale of a region? A case study in a maritime pine forest in southwestern France. Eur J Soil Sci. 2010;61(6):807-822.
- Banks M. Blog posts and tweets: the next frontier for grey literature. Web 2.0 content as “Grey Data”. Grey Literature in Library and Information Studies. De Gruyter, 2009.
- Batt K, Fox-Rushby JA, Castillo-Riquelme M. The costs, effects and cost-effectiveness of strategies to increase coverage of routine immunizations in low- and middle-income countries: systematic review of the grey literature. Bull World Health Organ. 2004 Sep;82(9):689-96.
- Cobbing JE, Davies J. Improving access to southern Africa’s groundwater “grey data”. Hydrogeology J. 2011;19(6):1117-1120.
- EU grey literature: long-term preservation, access, and discovery.
- Falconer L, Hoel H. Occupational safety and health: a method to test the collection of 'grey data' by line managers. Occup Med (Lond). 1997 Feb;47(2):81-9.
- Gelfand JM, Tsang DC. Data: is it grey, maligned or malignant? Grey Journal. 2015;11(1). UC Irvine. http://escholarship.org/uc/item/80w006rz
- Goggi S, Monachini M, Frontini F, Bartolini R, Pardelli G, Manzella G, et al. Marine planning and service platform (MAPS): an advanced research engine for grey literature in marine science. Grey Journal (TGJ). 2015;11(3): 171-178.
- Mahood Q, Van Eerd D, Irvin E. Searching for grey literature for systematic reviews: challenges and benefits. Res Synth Methods. 2014.
- Motta G, Puccinelli R, Reggiani L, Saccone M. Extracting value from grey literature: processes and technologies for aggregating and analyzing the hidden "big data" treasure of organizations. Grey Journal (TGJ). 2016 Mar 1;12(1).
- Pullin AS, Stewart GB. Guidelines for systematic review in conservation and environmental management. Conserv Biol. 2006;20(6):1647-1656.
- Saleh et al. Grey literature searching for health sciences systematic reviews: a prospective study of time spent and resources utilized. Evidence Based Library and Information Practice. 2014;9(3).
- Smart P. Twenty‐five shades of grey. Learned Publishing. 2015;28(3):163-165.
- Tattersall A, Grant MJ. Big Data - What is it and why it matters. Health Info Libr J. 2016 Jun;33(2):89-91.
- Xuerui T, Julong D, Hongxing P, Sifeng L. Grey system and grey data management in medicine. In: IEEE International Conference on Grey Systems and Intelligent Services, 2007 Nov 18.