"...Content analysis is a research method used for making replicable and valid inferences from data to their context, with the purpose of providing knowledge, a representation of facts, new insights, and a practical guide to action." (Krippendorff, 1980)
Content analysis (also called textual analysis, and sometimes even grounded theory) is a research method used to examine, and objectively quantify, the presence of certain words, concepts, themes, phrases, characters, and sentences within a text or set of texts. Texts may be articles, books, book chapters, interviews, discussions, historical documents, speeches, conversations, e-mail, or really any occurrence of communicative language. Content analysis is commonly used to examine data generated by social media technologies such as blogs, wikis and Twitter, because it enables researchers to sift through large volumes of data systematically and with relative ease. Krippendorff says that content analysis research is motivated by the search for techniques to infer from data what would be too costly or too obtrusive to obtain using other techniques.
To conduct a content analysis, texts are coded, or broken down, into manageable categories on several levels (words, word senses, phrases, sentences, and themes) and then examined using an established analytical method. Results are used to make inferences about the text(s), writer(s), audience and culture. Content analysis can indicate pertinent features such as comprehensiveness of coverage, bias, prejudice and oversight on the part of the author and of all other persons responsible for the content. There are two approaches to coding. In emergent coding, categories are developed following a preliminary examination of the data. In a priori coding, categories are established prior to the analysis, based upon a theory. Professional colleagues agree on the categories, and the coding is applied to the data. Revisions are made as necessary, and categories are refined to maximize mutual exclusivity and completeness.
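As a rough illustration of a priori coding at the word level, the following Python sketch counts how often the indicator words of each category occur in a text. The codebook, its categories, the sample texts, and the helper names `tokenize` and `code_text` are all hypothetical and chosen only for the example; a real study would derive its categories from theory and validate them with colleagues.

```python
import re

# Hypothetical a priori codebook: category -> indicator words.
# These categories and terms are illustrative assumptions only.
CODEBOOK = {
    "economy": {"tax", "taxes", "jobs", "wages"},
    "health": {"hospital", "hospitals", "doctor", "doctors", "vaccine"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def code_text(text, codebook=CODEBOOK):
    """Return, for each category, the number of tokens that match it."""
    counts = {category: 0 for category in codebook}
    for token in tokenize(text):
        for category, terms in codebook.items():
            if token in terms:
                counts[category] += 1
    return counts


# Hypothetical sample texts standing in for a real corpus.
texts = [
    "Lower taxes will bring jobs back.",
    "The hospital needs more doctors, and wages must rise.",
]

results = [code_text(t) for t in texts]
for text, counts in zip(texts, results):
    print(counts, "<-", text)
```

Simple word matching like this ignores word sense and context, so it covers only the most basic coding level; coding phrases or themes generally still requires human judgment.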
In 1931, Lindesmith used content analysis to refute an existing hypothesis. The method was frequently referred to as grounded theory until the 1960s. Its purpose was to examine the frequency of keywords in texts to determine the most important structures of the writing in question. Today, content analysis (CA) is frequently used in all kinds of research to determine the most important aspects contained within texts. Establishing reliability is comparatively straightforward in content analysis, and of the existing methods, CA is considered to score highest with respect to ease of replication.
In 2006, Robinson noted that content analysis is an alternative technique in library and information science (LIS) research but is too often ignored. She outlined the basic concepts of content analysis and explored possible reasons why it has had limited application in the LIS field.
Six questions addressed by content analysis
According to Krippendorff (1980 and 2004), six questions must be addressed in every content analysis:
Which data are analyzed by the content analysis?
How are the data defined?
What is the population from which the data are drawn?
What is the context relative to which the data are analyzed?
What are the parameters or boundaries of the analysis?
What is the target of the inferences?
Ten (10) steps of content analysis
Read entire transcript; make notes in margins when interesting or relevant information is found
Go through notes in the margins and list different types of information found
Categorize each item and describe what it is about
Identify whether categories can be linked; list them as major categories (or themes) and/or minor categories (or themes)
Compare and contrast major and minor categories
If there is more than one transcript, repeat the first five steps for each one
Aggregate categories and themes; examine each in detail and consider whether it fits and how relevant it is
Categorize data into major categories/themes; review to ensure information is categorized accurately
Review the categories and consider whether any can be merged or need to be subcategorized
Return to the transcripts and ensure that all information that needs to be categorized has been categorized
The process of content analysis is lengthy and may require the researcher to go over and over the data to ensure they have done a thorough job.
What to look for in CA?
The researcher should give a clear description of the context, the selection and characteristics of participants, the data collection and the process of analysis
The content analysis may comprise a conceptual analysis or relational analysis
Content analysis has most often been thought of in terms of conceptual analysis (also known as thematic analysis), in which a concept is chosen for examination and the analysis involves quantifying its presence
Relational analysis, like conceptual analysis, begins by identifying concepts present in a given text or set of texts; however, relational analysis goes further by exploring the relationships between the concepts identified
A good content analysis is one where the researcher analyzes and simplifies data into categories that reflect the subject of study in a reliable manner
The categories should cover the data completely; it may be necessary to demonstrate the links between the data and results
Including appendices and tables may be useful to present the links and the results visually
Downe-Wamboldt describes content analysis as a research technique that provides a systematic and objective means to describe and quantify phenomena; content analysis is more than a counting game, as it is concerned with meanings, intentions, consequences and context.
As with other research methodologies, CA requires consideration of bias when making assertions about the data, their meaning and their generalizability. It is important to reiterate the need for the training of coders and the assessment of reliability and validity.
Human coders are used in content analysis, and Neuendorf suggests that two coders should be used. The reliability of human coding is often assessed with a statistical measure of intercoder reliability, or "the amount of agreement or correspondence among two or more coders" (Neuendorf, 2002).
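To make the idea of intercoder reliability concrete, the sketch below computes two statistics for a pair of coders: simple percent agreement and Cohen's kappa, which corrects agreement for what would be expected by chance. The text above does not name a specific statistic, so the choice of Cohen's kappa is an assumption made for illustration, and the coder labels are invented sample data.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of units to which both coders assigned the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must code the same non-empty set of units")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    observed = percent_agreement(coder_a, coder_b)
    n = len(coder_a)
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    # Chance agreement: probability that both coders pick the same
    # category if each assigns categories at their own observed rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical category for every unit.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented example: two coders label the same ten text units.
coder_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "pos"]
coder_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "pos", "pos"]

agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
print(f"percent agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

In this invented example the coders agree on 8 of 10 units, but kappa is noticeably lower than the raw agreement (about 0.58) because some of that agreement would be expected by chance alone. Studies with more than two coders would need a different statistic.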