"...text mining is the indexing of content. Words that are part of a fixed vocabulary are found within a text and extracted to create an index that shows where in the text each word was found. The index can be used in the traditional way to locate the parts of the text that contain those words. The index can also be used as a database and analysed to discover patterns: for example, how often certain words occur. In simple terms, text mining is the process that turns text into data that can be analysed." — Clark, 2013
Text-mining (also called content-mining, and closely related to data mining) refers to the process of discovering and extracting meaningful content from unstructured, miscellaneous text. Text-mining is often mentioned in the context of several information-age trends such as big data, bioinformatics, data curation, e-Science and the semantic web. Currently, a number of social media monitoring tools perform various types of text-mining. In 2013, the US Government announced that it extracts data from the e-mails and telephone calls of American citizens, referring to this process (which includes text-mining) as its metadata program.
Typically, text-mining comprises three major activities: 1) information retrieval (IR) to gather relevant unstructured text from heterogeneous databases, documents and websites, 2) information extraction (IE) to identify and extract entities, facts and the relationships among those entities, and 3) data-mining to find associations among the information extracted from the various texts. The goal of text-mining is to discover knowledge hidden in text by identifying concepts, extracting facts and relationships, discovering implicit links and generating hypotheses. One of the main reasons text-mining is important is that it helps deal with the information overload created by blogs, wikis, clinical data, surveys, heterogeneous databases and the web. Text-mining is especially useful in fields that hold large collections of documents. Scientific applications that have been developed because of text-mining include drug discovery, predictive toxicology, competitive intelligence and patent searching.
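The three activities described above can be sketched as a toy pipeline. This is a minimal illustration only, not a description of any particular text-mining product: the corpus, the ENTITIES dictionary and the function names (retrieve, extract_entities, mine_associations) are all hypothetical, and the matching is naive substring lookup where a real system would use curated vocabularies or trained models.

```python
from collections import Counter
from itertools import combinations

# Toy corpus standing in for heterogeneous sources (hypothetical data).
documents = {
    "doc1": "Aspirin inhibits COX-1 and reduces inflammation.",
    "doc2": "Ibuprofen inhibits COX-1 in vitro.",
    "doc3": "Library workshops can teach data literacy.",
    "doc4": "Aspirin and ibuprofen both target COX-1.",
}

# Hypothetical entity dictionary (an assumption for this sketch).
ENTITIES = {"aspirin", "ibuprofen", "cox-1", "inflammation"}


def retrieve(docs, query_terms):
    """Step 1 (IR): keep documents that mention any query term."""
    terms = {t.lower() for t in query_terms}
    return {
        doc_id: text
        for doc_id, text in docs.items()
        if any(t in text.lower() for t in terms)
    }


def extract_entities(text):
    """Step 2 (IE): naive dictionary lookup of known entity names."""
    lowered = text.lower()
    return sorted(e for e in ENTITIES if e in lowered)


def mine_associations(entity_lists):
    """Step 3 (data mining): count entity pairs that co-occur in a document."""
    pairs = Counter()
    for entities in entity_lists:
        pairs.update(combinations(entities, 2))
    return pairs


relevant = retrieve(documents, ["aspirin", "ibuprofen"])
extracted = {doc_id: extract_entities(text) for doc_id, text in relevant.items()}
associations = mine_associations(extracted.values())

# Print the most frequent co-occurring pairs first.
for pair, count in associations.most_common():
    print(pair, count)
```

Pairs that co-occur in several documents are candidate associations for a human expert to examine; the counts alone do not establish a relationship.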
Other reasons why text-mining may be important are:
Biomedical science is inundated with data, datasets and information of various kinds
Much of the information is in an unstructured format (text)
There are as many text types, genres and domains as there are documents
Some of the information is in a semi-structured format (XML + text)
Some of the information is in a structured format (databases)
Biomedical science researchers need to make sense of data
Biomedical researchers and health librarians need to manage this information and knowledge effectively
Text-mining can be used to improve indexing, which is essential for findability; because it is machine-aided, it can also create indexes more efficiently than manual indexing
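To make machine-aided indexing concrete, the sketch below builds the kind of index Clark describes: words from a fixed vocabulary are located in a text, their positions are recorded, and the index is then analysed for word frequencies. The sample sentence, the vocabulary and the build_index function are assumptions made for illustration, and the tokenizer is deliberately simple.

```python
import re
from collections import defaultdict


def build_index(text, vocabulary):
    """Map each vocabulary word to the token positions where it occurs."""
    vocab = {w.lower() for w in vocabulary}
    index = defaultdict(list)
    # Tokens are runs of letters, digits and apostrophes; matching is case-insensitive.
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9']+", text)):
        token = match.group().lower()
        if token in vocab:
            index[token].append(position)
    return dict(index)


# Hypothetical sample text and controlled vocabulary.
sample = "Aspirin reduces fever. Aspirin may also reduce the risk of stroke."
vocabulary = ["aspirin", "fever", "stroke"]

index = build_index(sample, vocabulary)

# Treat the index as data: how often does each vocabulary word occur?
frequencies = {word: len(positions) for word, positions in index.items()}

print(index)
print(frequencies)
```

The same index serves both purposes mentioned in the quotation: the positions locate where a word appears, and the counts turn the text into data that can be analysed.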
The rise of data, with its concomitant uses, curation and management, is a growing trend in academic libraries. However, rather than wait for your library organization to hire a data librarian or to create a data repository, why not try to introduce some data science skills (or exercises) into your library workshops?
First, how might you start to incorporate data science concepts into your information literacy programs?
Brainstorm and (re)write definitions, models and standards for our programs to include data
Develop discipline-based frameworks for information and data literacy
How should academic libraries provide data literacy education?
Should workshops be designed as standalone or integrated into courses?
Should they be part of research methods, theory courses or integrated across curricula?
Who should teach and support data literacy?
Data librarians, academic domain experts, LIS academics
Other subject experts
Benefits of text-mining
Text-mining can aid in systematically reviewing a large body of literature
Text-mining can help researchers keep up with their fields, reducing the risk that they miss something relevant
Text-mining aids in the discovery of patterns and trends in data, associations among entities, predictive rules, etc.
Text-mining has the ability to enrich unstructured text with semantic tags and annotations (e.g., see FOAF, Friend of a Friend)
Text-mining assists authors with tools to develop semantic annotations
Text-mining is a form of document and information management
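As a small illustration of enriching unstructured text with semantic tags, the sketch below wraps recognised terms in inline markup that records their type. The SEMANTIC_TYPES mapping, the annotate function and the span/class markup are hypothetical choices made for this example; they are not FOAF or any other established annotation vocabulary.

```python
import re

# Hypothetical mapping from lowercase terms to semantic types.
SEMANTIC_TYPES = {"aspirin": "Drug", "stroke": "Disease"}


def annotate(text, types):
    """Wrap each known term in a tag carrying its semantic type."""
    # Build one case-insensitive pattern that matches any known term as a whole word.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in types) + r")\b",
        re.IGNORECASE,
    )

    def tag(match):
        term = match.group(0)
        # Look up the type by the lowercased term; keep the original casing in the output.
        return f'<span class="{types[term.lower()]}">{term}</span>'

    return pattern.sub(tag, text)


annotated = annotate("Aspirin may reduce the risk of stroke.", SEMANTIC_TYPES)
print(annotated)
```

Once terms carry explicit types, downstream tools can search or link on the type (for example, every drug mentioned) instead of on the raw string.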
Paynter RA, Bañez LL, Berliner E, Erinoff E, Lege-Matsuura J, Potter S, Uhl S. EPC Methods: An Exploration of the Use of Text-Mining Software in Systematic Reviews. Research White Paper. (Prepared by the Scientific Resource Center and the Vanderbilt and ECRI Evidence-based Practice Centers under Contract Nos. 290-2012-00004-C [SRC], 290-2012-00009-I [Vanderbilt], and 290-2012-00011-I [ECRI].) AHRQ Publication 16-EHC023-EF. Rockville, MD: Agency for Healthcare Research and Quality; April 2016.