Biostatistics is a field that applies a subset of standard statistical techniques to clinical research in medicine, public health and epidemiology. It is also oriented towards formulating questions from observable clinical problems. To find solutions to these problems, quantitative data are gathered and examined using a combination of mathematics and careful reasoning. This work includes the measurement and analysis of data (descriptive and inferential statistics), the use of statistical graphs, clinical decision-making in the face of uncertainty (variability), and making inferences from samples (for example, a cohort) to populations generally. According to Murad & Shi (2010), biostatistics "...is the application of statistical methods to medical and biological phenomena". Similarly, Jekel et al. (2007) say that "...biostatistics is a tool that is used to analyze, understand, and explain the variance in medical and epidemiological data".
The field of biostatistics encompasses the methodology and theory of statistics as applied to the life and biomedical sciences. Biostatisticians are specialists in the evaluation of data as scientific evidence. They understand the construct of data and provide the mathematical framework to generalize the clinical findings. Their expertise includes the design and conduct of experiments, the mode and manner in which data are collected, the analysis of data, and the interpretation of results.
Biostatistics, epidemiology, and evidence-based medicine are closely related disciplines. Epidemiology is the study of disease within populations, and it provides robust quantitative evidence (data) for the practice of evidence-based health care. Biostatistics is the set of tools used to analyze and understand these data.
Population, sampling, analysis
In biomedical studies, the research question describes or defines the specific population being studied, which is called the target population. The target population should be well defined so that representative data pertaining to the research question can be collected. A definitive answer to a research question would require observing the entire target population, which is usually impossible or impractical. Biomedical researchers therefore examine a subset of the population, called a sample. Because a sample is only a subset, it provides incomplete information: it can support an answer to the research question, but not a definitive one, and its findings can be generalized to the target population only with some uncertainty. For this reason, statistics is often referred to as "the science of describing populations in the presence of uncertainty."
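The relationship between a target population and a sample can be sketched in a few lines of Python. This is a minimal illustration only: the simulated blood pressure values, the population size, the sample size of 200 and the fixed random seed are all assumptions made for the example, and the confidence interval uses a normal approximation.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical target population: 100,000 simulated systolic
# blood pressure values (mmHg). In a real study this is unobservable.
population = [random.gauss(120, 15) for _ in range(100_000)]
population_mean = mean(population)

# Simple random sample drawn without replacement.
sample = random.sample(population, 200)
sample_mean = mean(sample)
standard_error = stdev(sample) / len(sample) ** 0.5

# Approximate 95% confidence interval for the population mean
# (normal approximation, reasonable for a sample of this size).
z = NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"population mean (normally unknown): {population_mean:.1f}")
print(f"sample mean: {sample_mean:.1f} (95% CI {ci_low:.1f} to {ci_high:.1f})")
```

The sample mean estimates the population mean, and the width of the interval expresses the uncertainty that comes from observing only a subset of the population.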
Key associations & video
An example of the use of descriptive statistics occurs in drug studies. Papers reporting on human subjects typically include a table that states the sample size in each subgroup (e.g. treatment or exposure groups) along with demographic or clinical characteristics such as average age, the proportion of subjects of each gender and the proportion of subjects with related comorbidities. In research involving comparisons, emphasis is placed on the significance level for the hypothesis that the two groups differ to a degree that is greater than would be expected by chance. This significance is represented as a p-value or as a standard score of a test statistic. In contrast, an effect size is a descriptive statistic that conveys the estimated magnitude and direction of the difference between groups, regardless of whether the difference is statistically significant. Reporting significance levels without effect sizes is criticized because, in large samples, even very small effects that may have little clinical importance can be highly significant statistically.
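The difference between a significance level and an effect size can be illustrated with a short Python sketch. The summary statistics below are hypothetical, the helper functions `cohens_d` and `two_sided_p` are names introduced for this example, and the p-value comes from a simple z-test that assumes two equal-sized groups with a known common standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def cohens_d(mean_a, mean_b, sd):
    """Standardized mean difference, assuming a common standard deviation."""
    return (mean_a - mean_b) / sd


def two_sided_p(mean_a, mean_b, sd, n_per_group):
    """Two-sided p-value from a z-test for two equal-sized groups."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = (mean_a - mean_b) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical summary statistics: a 0.5-unit difference, SD of 10.
mean_treated, mean_control, sd = 120.5, 120.0, 10.0

effect = cohens_d(mean_treated, mean_control, sd)
p_small_study = two_sided_p(mean_treated, mean_control, sd, n_per_group=50)
p_large_study = two_sided_p(mean_treated, mean_control, sd, n_per_group=100_000)

print(f"effect size (Cohen's d): {effect:.2f}")
print(f"p-value with 50 per group: {p_small_study:.3f}")
print(f"p-value with 100,000 per group: {p_large_study:.3g}")
```

The effect size is the same in both scenarios, while the p-value moves from clearly non-significant to extremely small as the sample grows. This is why the two quantities are best reported together.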
With inferential statistics, conclusions are drawn that go beyond the immediate data. For instance, inferential statistics are used to infer from sample data what a population might think about something, or to judge the probability that an observed difference between groups reflects a real effect rather than one that might have arisen by chance in the study. In short, inferential statistics use the data collected to describe what is probably happening in the wider population. Most inferential statistics derive from a family of statistical models known as the General Linear Model (GLM), which includes the t-test, Analysis of Variance (ANOVA), Analysis of Covariance (ANCOVA), regression analysis, and multivariate methods like factor analysis, multidimensional scaling, cluster analysis, etc. It is a good idea for researchers to become familiar with the GLM. Its discussion here is simplified and considers simple straight-line models only.
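One way to see why a two-group comparison belongs to the same family as regression is to fit a straight line to data in which the predictor is a 0/1 group indicator. The sketch below is illustrative only: the outcome values are invented, and `fit_line` is a hypothetical helper that implements ordinary least squares for a single predictor.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical outcome values for two groups (illustrative only).
control = [5.1, 4.8, 5.6, 5.0, 4.9]
treated = [5.9, 6.2, 5.7, 6.4, 6.0]

# Code group membership as a 0/1 indicator and stack the outcomes.
x = [0] * len(control) + [1] * len(treated)
y = control + treated

intercept, slope = fit_line(x, y)
mean_difference = mean(treated) - mean(control)

print(f"intercept (control mean): {intercept:.2f}")
print(f"slope (treated minus control): {slope:.2f}")
print(f"difference in group means: {mean_difference:.2f}")
```

With this coding, the intercept equals the control group mean and the slope equals the difference between the group means, which is the quantity a two-sample t-test evaluates.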
Key tools & websites