Statistics for academic librarians

From HLWIKI Canada




Introduction

See also LibGuides from Springshare | Social network analysis | Statistics in health | Research for librarians - portal

Statistics is a field of study concerned with summarizing and interpreting data, and with making decisions based on data. Within the context of libraries, the data are often generated by individuals and their use of services. The word "statistics" can be used in a singular or a plural sense. In the singular, it refers to the mathematical science itself. In the plural, it refers to quantities calculated from a set of data, such as a mean; each such quantity is called a statistic.

To apply statistics, it is necessary to begin with a population, problem or process and to study or measure some aspect of it. This entails planning how the data will be gathered (especially the design of survey instruments and experiments) as well as the collection, analysis and interpretation of the data themselves.

Why do academic librarians need to be aware of statistical techniques? The objective of collecting library statistics is to "assess the quality and effectiveness of services [and resources] provided by the library" (Poll, p.307). Academic librarians, for example, are required to account for the success of their teaching, collections and in-house use: numbers of books acquired, journal subscriptions, reference questions, users entering the library, size of the acquisitions budget, and so on. But what do we do with these statistics? A population can be defined in many ways, such as "all persons that use medical library X" or "every patron signing out books from library Y". A population can also be composed of observations of a process at various times, such as monthly counts of users entering the library, with each observation serving as one member of the overall group. Data collected about this kind of "population" is called a time series.
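As a minimal sketch of what a library might do with a time series, the Python below smooths a year of monthly gate counts with a three-month trailing moving average and lists the month-over-month changes. The counts are invented for illustration, and the function names `moving_average` and `month_over_month_change` are hypothetical helpers, not part of any library system or statistics package.

```python
# Hypothetical monthly gate counts (users entering the library), Jan-Dec.
# These numbers are invented for illustration only.
gate_counts = [5200, 6100, 6400, 5900, 4300, 2100,
               1800, 2600, 6900, 7200, 6800, 3900]


def moving_average(series, window):
    """Return the trailing moving average of `series` over `window` points.

    The first window-1 months have no complete window, so no value is
    produced for them; the result has len(series) - window + 1 entries.
    """
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


def month_over_month_change(series):
    """Return the difference between each month and the month before it."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


if __name__ == "__main__":
    print(moving_average(gate_counts, 3))
    print(month_over_month_change(gate_counts))
```

Smoothing makes a seasonal pattern (for example, a summer dip) easier to see than the raw counts, while the month-over-month differences highlight sudden changes worth investigating.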

For practical reasons, researchers usually study a subset of the population, called a sample, rather than the entire group (which would be a census). If the sample is chosen appropriately (for example, at random), it can be treated as representative of the whole population. Data are then collected from the sample in an observational or experimental setting and put through statistical analysis, which may be descriptive or inferential.
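One common way to draw a sample is simple random sampling without replacement. The sketch below assumes a hypothetical list of patron IDs standing in for "every patron signing out books from library Y"; the helper `simple_random_sample` is an illustrative name and not an established API. It relies on Python's standard `random.Random.sample`.

```python
import random

# Hypothetical population: IDs of every patron who borrowed a book from
# "library Y" this term. In practice these would come from the library system.
population = [f"patron-{i:04d}" for i in range(1, 2001)]


def simple_random_sample(items, n, seed=None):
    """Draw n distinct items without replacement; every subset of size n
    is equally likely to be chosen.

    Passing a fixed seed makes the draw reproducible, which helps when the
    sampling procedure has to be documented or repeated.
    """
    rng = random.Random(seed)
    return rng.sample(items, n)


sample = simple_random_sample(population, 100, seed=42)
census = population  # a census would measure every member instead
print(len(sample), len(census))
```

Random selection is what justifies treating the sample as representative; a convenience sample (for example, whoever happens to be at the service desk) carries no such guarantee.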

A statistician is someone trained in the theory and the successful application of statistical analysis.

Descriptive statistics

Descriptive statistics summarize the characteristics of a set of data. An example of their use occurs in drug studies. Papers reporting on human subjects typically include a table that gives the sample size in each subgroup (e.g. treatment or exposure groups) along with demographic or clinical characteristics such as average age, the proportion of subjects of each gender and the proportion of subjects with related comorbidities. In research involving comparisons, emphasis is often placed on the significance level: whether the two groups differ by more than would be expected by chance alone. This significance is reported as a p-value or as a standard score of a test statistic. In contrast, an effect size is a descriptive statistic that conveys the estimated magnitude and direction of the difference between groups, without regard to whether that difference is statistically significant. Reporting significance levels without effect sizes has been criticized because, with a large enough sample, even a very small effect can be highly statistically significant.
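The contrast between effect size and significance can be sketched in Python. Cohen's d, used here as one common effect-size measure, is the difference in means divided by the pooled standard deviation. The second function is a simplified illustration, assuming a two-sample z-test with a known common standard deviation and equal group sizes, that shows how the same small effect (d = 0.1) yields ever smaller p-values as the sample grows. The branch data and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation.

    The sign gives the direction (positive when group_a has the larger mean);
    the absolute value gives the magnitude in standard-deviation units.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 +
                  (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def two_sided_p_for_effect(d, n_per_group):
    """Two-sided p-value of a two-sample z-test when the observed difference
    in means equals d standard deviations and each group has n_per_group
    subjects (assumes a known, common standard deviation).

    The standard error of the difference is sigma * sqrt(2 / n), so
    z = d / sqrt(2 / n) = d * sqrt(n / 2).
    """
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: minutes per reference consultation in two branches.
branch_a = [12, 15, 11, 14, 13, 16, 12, 15]
branch_b = [10, 12, 9, 13, 11, 12, 10, 11]
print(f"Cohen's d = {cohens_d(branch_a, branch_b):.2f}")

# The same small effect looks more and more "significant" as n grows.
for n in (50, 500, 5000):
    print(n, f"{two_sided_p_for_effect(0.1, n):.2g}")
```

With d fixed at 0.1, the p-value falls from well above 0.05 at 50 subjects per group to far below 0.001 at 5,000, even though the size of the difference never changes. This is why reporting the effect size alongside the p-value is recommended.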

Inferential statistics

With inferential statistics, conclusions are drawn that go beyond the immediate data. For instance, inferential statistics are used to infer from sample data what a population might think about something. They are also used to judge the probability that an observed difference between groups is a dependable one rather than one that might have happened by chance in a particular study. In short, inferential statistics are used to draw conclusions from the data collected about what is probably happening in the wider population. Many of the major inferential statistics come from a family of statistical models known as the General Linear Model (GLM). It includes the t-test, Analysis of Variance (ANOVA), Analysis of Covariance (ANCOVA) and regression analysis, and many treatments also group multivariate methods such as factor analysis, multidimensional scaling and cluster analysis with it. Because it is so widely used, researchers should become familiar with the GLM. Its simplest case is the straight-line model, in which an outcome y is predicted from a single variable x by a line y = b0 + b1x, plus random error.

Tools

References
