Statistics for academic librarians
Statistics is a branch of mathematics that transforms data into useful information for decision-makers. It is also a field of study that deals with the summary and interpretation of data, large or small. Statisticians are educated in the science of statistical analysis, and may hold advanced degrees in statistics, mathematics or a specialty area. Wikipedia states that, "...Statistical literacy is the ability to understand statistics. Statistical literacy is necessary for citizens to understand material presented in publications such as newspapers, television, and the Internet. Numeracy is a prerequisite to being statistically literate."
In health and medicine, a biostatistician uses statistics to analyze data from clinical trials (or other studies), to assist researchers in identifying patterns of beneficial treatments and to determine the best methods of treatment for populations of patients. Biostatistics is a useful way to understand how statistics is applied to medicine. Begin with a patient population, problem or process. Examine, study or measure some aspect of that population or problem. This investigation may entail an analysis or interpretation of existing clinical data, or taking steps to collect new data using a variety of survey instruments or other tools.
Statistics are gathered by librarians and archivists in a number of ways. In libraries, statistics are typically generated as individual users take out books or pass through metered gates. Other statistics are generated by interactions with library staff or through the use of print and electronic library collections and information services. Aggregated statistics can also reveal trends within an organization or a group of organizations. The word statistics may be used in a singular or a plural sense. In the singular, statistics refers to the mathematical discipline and its methods of tabulation. In the plural, statistics are quantities (such as a mean) calculated from a set of data. Keeping statistics on their work is part of the job for most librarians. So why should librarians be aware of statistical concepts writ large? The objective of collecting statistics in libraries is to "assess the quality and effectiveness of services [and resources] provided by the library" (Poll, 2001). Librarians are increasingly required to account for their teaching, collections and in-house usage, using measures such as the number of books acquired, journal subscriptions, reference questions, users entering the library and the size of the acquisitions budget.
What do librarians do with these statistics? Many user groups and populations within a library community are diverse. Statements about user groups may be framed as "all persons that use the medical library X" or "every patron signing out books from library Y". Further, a population can be composed of observations of a process over time, such as measurable counts of visits, which serve as a way of examining usage of library services. Data collected about this kind of "population" is called a time series in statistics. For practical purposes, a subset of a specific user population in a library community is called a sample, as opposed to data on the entire group (which is called a census). Once a sampling technique is used, the sample is sometimes treated as representative of the whole, especially where the data is collected in observational or experimental settings, although this holds only when the sample is drawn appropriately (for example, at random). The data is then subjected to statistical analysis, which is categorized as either descriptive or inferential.
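The distinction between a census and a sample can be sketched in a few lines of Python. This is only an illustration: the patron visit counts are randomly generated, hypothetical values, and the names `census_visits` and `sample_visits` are invented for this example.

```python
import random
import statistics

# Hypothetical census: visits per year for every one of 1,000 registered patrons.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
census_visits = [rng.randint(0, 60) for _ in range(1000)]

# Simple random sample of 100 patrons, drawn without replacement.
sample_visits = rng.sample(census_visits, k=100)

census_mean = statistics.mean(census_visits)
sample_mean = statistics.mean(sample_visits)

print(f"census mean: {census_mean:.1f} visits (n={len(census_visits)})")
print(f"sample mean: {sample_mean:.1f} visits (n={len(sample_visits)})")
```

Because the sample is drawn at random, its mean will usually fall close to the census mean, which is what makes it reasonable to treat such a sample as representative.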
Descriptive and inferential statistics
The field of statistics is often divided into two broad categories: descriptive statistics and inferential statistics.
According to Wikipedia "...descriptive statistics is the discipline of quantitatively describing the main features of a collection of data..." Descriptive statistics describe a set of data or numeric elements which comprise information about a population or event. In descriptive statistics, no attempt is made to infer from the data or to predict what it might mean. In other words, you simply present the data or information you have gathered and explain what it reveals. Descriptive statistics describe a population (often a sample group) and any measurements that have been gathered about that population. Together with graphical analysis, descriptive statistics form the basis of almost every quantitative analysis of data. Descriptive statistics are distinguished from inferential statistics because they describe what has been gathered; they do not infer. An example of the use of descriptive statistics occurs in drug studies. In papers discussing human subjects, there are tables that describe the sample size in subgroups (e.g. treatment or exposure groups), and demographic or clinical characteristics such as average age, the proportion of subjects of each gender and the proportion of subjects with related comorbidities.
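As a minimal sketch of descriptive statistics in a library setting, the following Python example summarizes one week of gate counts using only the standard library. The counts are hypothetical values chosen for illustration.

```python
import statistics

# Hypothetical daily gate counts for one week (illustrative values only).
gate_counts = [412, 388, 450, 397, 365, 120, 95]

# Describe the data that was gathered; nothing is inferred or predicted.
summary = {
    "n": len(gate_counts),
    "mean": statistics.mean(gate_counts),
    "median": statistics.median(gate_counts),
    "stdev": statistics.stdev(gate_counts),  # sample standard deviation
    "min": min(gate_counts),
    "max": max(gate_counts),
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

The mean here is pulled down by the two quiet days, while the median is not, which is one reason summaries often report both.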
In research involving comparisons of two groups (in medicine, the comparison might be one group getting a new drug, the other a placebo), the emphasis is on the significance level for the hypothesis that the two groups examined or tested differ to a greater degree than would be expected by "chance". If the group that gets the new drug benefits, those improvements in the disease or illness are measured against the control group that gets the placebo. The significance of this difference is reported as a p-value or as a standard score of a test statistic. In contrast, an effect size is a descriptive statistic that conveys the estimated magnitude and direction of the difference between groups, without regard to whether the difference is statistically significant. Reporting significance levels without effect sizes is criticized because, in large samples, even very small effects can be highly significant statistically. For further background on descriptive statistics, see the Khan Academy channel on descriptive statistics.
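The difference between a p-value and an effect size can be illustrated with a short Python sketch. It assumes two small hypothetical groups of improvement scores, uses Cohen's d as the effect size, and estimates the p-value with a permutation test (one of several possible tests, chosen here because it needs only the standard library). The function names are invented for this example.

```python
import random
import statistics

def cohens_d(a, b):
    """Effect size: difference in means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_var ** 0.5

def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided p-value: share of random relabelings whose absolute mean
    difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign scores to the two groups at random
        diff = abs(statistics.fmean(pooled[:len(a)])
                   - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_permutations

# Hypothetical improvement scores (illustrative values only).
treatment = [7, 9, 8, 10, 9, 11, 8, 10]
placebo = [5, 6, 7, 5, 6, 8, 6, 7]

d = cohens_d(treatment, placebo)
p = permutation_p_value(treatment, placebo)
print(f"effect size (Cohen's d): {d:.2f}")
print(f"permutation p-value: {p:.4f}")
```

The p-value addresses whether a difference this large would be expected by chance; the effect size addresses how large the difference is. Reporting both avoids the problem of a statistically significant but trivially small effect.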
Inferential statistics (also called inductive statistics) uses information about a sample of subjects in order to 1) draw conclusions about the population at large and/or 2) make predictions about what might happen in the future. In inferential statistics, conclusions are made about a population that go beyond what is directly revealed by the data itself. For instance, inferential statistics are used to infer from sample data what a population might be thinking about something. Inferential statistics are also used to judge the probability that an observed difference between groups reflects a real effect rather than something that might have happened by chance in the study. In short, inferential statistics are used to make inferences from the data collected and to describe what is probably happening. Most inferential statistics derive from a family of statistical models known as the General Linear Model (GLM), which includes the t-test, Analysis of Variance (ANOVA), Analysis of Covariance (ANCOVA), regression analysis, and many multivariate methods such as factor analysis, multidimensional scaling and cluster analysis. It's a good idea for researchers to become familiar with the GLM. Discussion of it here is simplified and considers simple straight-line models only.
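A simple straight-line model of the kind mentioned above can be fitted by ordinary least squares. The sketch below is a minimal illustration in Python; the data (months versus reference questions) are hypothetical, and `fit_line` is a helper written for this example rather than a library function.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations in x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: month number and reference questions answered that month.
months = [1, 2, 3, 4, 5]
questions = [12, 15, 19, 22, 27]

slope, intercept = fit_line(months, questions)

# Use the fitted line to predict month 6 (an inference beyond the observed data).
predicted_month_6 = intercept + slope * 6
print(f"slope: {slope:.2f}, intercept: {intercept:.2f}")
print(f"predicted questions in month 6: {predicted_month_6:.1f}")
```

The prediction for month 6 is an inferential step: it assumes the straight-line trend seen in the sample continues, which the data alone cannot guarantee.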
Key tools & websites