Bayesian statistics is one of the less conventional subareas of statistics, built on a particular interpretation of the concept of probability. This post introduces Bayesian statistics and explains how it differs from frequentist statistics, through a gentle and predominantly non-technical narrative that will awaken your curiosity about this fascinating topic.
Introduction
Statistics provides an invaluable set of methods and tools for analyzing data and making decisions based on it. These methods have been applied across many fields for decades, even centuries, since statistics emerged as a discipline in the 18th century.
Traditionally, statistics has been studied and applied under a frequentist approach, based on the idea that the probability of an event is the frequency with which that event occurs over a large number of experiments or trials.
However, there is a less-known yet equally powerful approach to statistics: the Bayesian approach. Let’s uncover what this approach is about.
Bayesian Statistics Demystified
Bayesian statistics allows the incorporation of prior information, often of a subjective nature, into statistical analysis. In certain cases this leads to conclusions or decisions that are better adjusted to reality.
This characteristic distinguishes Bayesian statistics from frequentist statistics on an idea central to both: the interpretation of probability. In frequentist statistics, probability is understood as the long-run frequency of an event, which requires a large number of experiments and observations. In Bayesian statistics, probability is understood as a degree of belief or certainty, and this degree of belief can be updated whenever new evidence or information about the phenomenon under study becomes available. Bayesian methods can therefore incorporate prior knowledge or assumptions, whereas frequentist statistics relies exclusively on the data collected in the experiment being studied.
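To make the idea of updating a degree of belief concrete, here is a minimal Python sketch of the classic coin-flip example using a Beta prior. The prior parameters and the observed flips are purely illustrative assumptions, not anything prescribed by a particular method.

```python
# Minimal sketch: updating a degree of belief about a coin's bias.
# The Beta prior and the observed flips below are assumed for illustration.

prior_heads, prior_tails = 2, 2        # Beta(2, 2) prior: a mild belief the coin is fair
observed_heads, observed_tails = 8, 2  # hypothetical evidence: 8 heads in 10 flips

# With a Beta prior and coin-flip data, the posterior is again a Beta distribution
# whose parameters simply add the observed counts (a conjugate update).
post_heads = prior_heads + observed_heads
post_tails = prior_tails + observed_tails

prior_mean = prior_heads / (prior_heads + prior_tails)
post_mean = post_heads / (post_heads + post_tails)
print(f"belief in heads: prior mean {prior_mean:.2f} -> posterior mean {post_mean:.2f}")
# prints: belief in heads: prior mean 0.50 -> posterior mean 0.71
```

The prior belief (the coin is probably fair) shifts toward the evidence (mostly heads), which is exactly the kind of update frequentist probability, defined only through long-run frequencies, does not express.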
The following example illustrates the fundamental differences between frequentist and Bayesian statistics:
Suppose a doctor wants to calculate the probability P(H|E) that a patient suffers from a rare disease after the patient obtains a positive result on a diagnostic test that is still in a trial phase. Here, P(H|E) is the posterior probability, where H is the event that the patient has the disease and E is the evidence, namely the positive result of the diagnostic test.
- From a frequentist perspective, the doctor would rely on the test's false positive rate (patients who test positive despite not having the disease) and the prevalence of the disease in a larger population to calculate the probability P(H|E). No patient-specific history or information would be used in this calculation of the probability of having the disease after testing positive.
- A Bayesian perspective, on the other hand, would allow the doctor to include prior information about the patient in the calculation, such as current and previous symptoms and the patient's medical history. If the symptoms presented are related to the disease, the doctor could raise the initial probability that the patient suffers from it, and then update that probability based on the result of the diagnostic test.
In summary, a Bayesian approach provides a more personalized view of probability, thereby reflecting the real patient situation more faithfully.
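The short Python sketch below works through this example with Bayes' theorem. The prevalence, sensitivity, and false positive rate are purely illustrative numbers, not real clinical values.

```python
# Sketch of Bayes' theorem applied to the diagnostic example.
# All numbers below are illustrative assumptions, not real clinical values.

prior = 0.001          # P(H): assumed prevalence of the rare disease
sensitivity = 0.99     # P(E|H): probability of a positive test if the patient has the disease
false_positive = 0.05  # P(E|not H): probability of a positive test if the patient is healthy

# Total probability of observing a positive test, P(E)
evidence = sensitivity * prior + false_positive * (1 - prior)

# Posterior probability P(H|E) via Bayes' theorem
posterior = sensitivity * prior / evidence
print(f"P(H|E) = {posterior:.4f}")  # roughly 0.019 with these numbers

# A Bayesian doctor could encode patient-specific information (e.g. symptoms)
# as a larger prior and rerun exactly the same update:
informed_prior = 0.05  # hypothetical prior for a patient showing related symptoms
evidence = sensitivity * informed_prior + false_positive * (1 - informed_prior)
print(f"P(H|E) = {sensitivity * informed_prior / evidence:.4f}")  # roughly 0.51
```

With these assumed numbers, the same positive test result means something very different depending on the prior, which is the point of the example: the prior carries the patient-specific information.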
Formally, the field of Bayesian statistics is founded on several concepts, methods, and techniques. Four basic pillars, essential notions for anyone interested in becoming familiar with this branch of statistics, are:
- Bayes' theorem: the central formula around which formal methods for updating probabilities in light of new evidence are built. In the notation of the earlier example, it states that P(H|E) = P(E|H) · P(H) / P(E).
- Prior and posterior probability: the prior probability P(H) is the initial belief about the probability of an event H before incorporating a piece of evidence E, whereas the probability P(H|E) of that event after observing the evidence is known as the posterior probability.
- Bayesian inference: the set of methods and processes whereby Bayes' theorem is leveraged to update belief-driven probabilities.
- Markov Chain Monte Carlo (MCMC) sampling: a family of methods for approximating posterior probability distributions by randomly drawing samples.
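As a rough illustration of the MCMC idea, the Python snippet below uses a simple Metropolis algorithm (one member of the MCMC family) to draw samples from the posterior of a coin's bias. The prior, the data, and the proposal width are illustrative assumptions; real analyses typically rely on dedicated libraries such as PyMC or Stan.

```python
import numpy as np

# Rough Metropolis sampler (a simple member of the MCMC family) for the
# posterior of a coin's bias theta. Prior, data and proposal width are
# illustrative assumptions, not a recipe for real analyses.

rng = np.random.default_rng(0)
heads, flips = 8, 10  # hypothetical observed data

def log_posterior(theta):
    if not 0 < theta < 1:
        return -np.inf  # outside the support of theta
    # Uniform(0, 1) prior and binomial likelihood, up to an additive constant
    return heads * np.log(theta) + (flips - heads) * np.log(1 - theta)

samples = []
theta = 0.5  # starting point of the chain
for _ in range(20_000):
    proposal = theta + rng.normal(scale=0.1)  # random-walk proposal
    # Accept the proposal with probability min(1, posterior ratio)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    samples.append(theta)

posterior_draws = np.array(samples[2_000:])  # discard early (burn-in) draws
print(f"posterior mean ~ {posterior_draws.mean():.2f}")  # close to the exact 9/12 = 0.75
```

The appeal of MCMC is that the same recipe keeps working when the posterior has no closed form, which is where most practical Bayesian models end up.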
Advantages and Disadvantages of Bayesian Statistics
The following table summarizes some of the pros and cons of Bayesian statistical approaches and methods compared to traditional frequentist methods.

| Aspect | Bayesian statistics | Frequentist statistics |
| --- | --- | --- |
| Prior knowledge | Can incorporate prior information or expert beliefs | Relies exclusively on the observed data |
| Interpretation | Probability as a degree of belief, updated as new evidence arrives | Probability as a long-run frequency over repeated experiments |
| Subjectivity | Choice of prior can be subjective and must be justified | No prior needed, often perceived as more objective |
| Computation | Often computationally demanding (e.g. MCMC sampling) | Usually simpler and faster to compute |
Applications of Bayesian Statistics
We conclude by listing some application domains where Bayesian statistics has been successfully put into practice.
- Machine Learning and Artificial Intelligence, particularly in probabilistic models and reinforcement learning algorithms that heavily rely on Bayesian statistics techniques
- Financial modeling for risk assessment and forecasting processes
- Healthcare and medical diagnosis, for disease prediction and assessing patient risks
- Environmental sciences, for modeling climate patterns and assessing biodiversity and ecosystem risk
- Marketing and consumer behavior analysis in retail, along with product demand forecasting
Conclusion
This article provided a gentle and non-technical overview of Bayesian statistics, highlighting its key differences from classical frequentist approaches and outlining some of its application domains. For those interested in going deeper, we encourage you to keep exploring the intricacies of this powerful and versatile set of statistical methods, starting from the key concepts listed above.