Bayesian Methods · Original Research

A Bayesian Hierarchical Model for Estimating Hospital-Level Mortality Rates with Informative Priors

Takeshi Nakamura¹*, Kwame Osei-Bonsu², Linnea Eriksson³, Priya Gupta⁴

¹ Department of Biostatistics, University of Tokyo, Japan; ² School of Public Health, University of Ghana, Accra; ³ Division of Clinical Epidemiology, Karolinska Institutet, Stockholm; ⁴ Centre for Health Informatics, AIIMS New Delhi, India

doi: 10.41093/bam.2026.12.04.201 · Code: github.com/tnakamura/hospital-bayes

Abstract

Background: Comparing hospital mortality rates requires adjusting for case-mix differences. Frequentist standardized mortality ratios (SMRs) are unstable for low-volume hospitals and do not naturally incorporate external evidence. Bayesian hierarchical models address both limitations through partial pooling and informative prior specification.
Objectives: To develop a Bayesian hierarchical logistic regression model for 30-day post-surgical mortality, using informative priors derived from a national registry (N = 2.4 million admissions), and to compare its calibration and discrimination against frequentist fixed-effects and random-effects alternatives.
Methods: We fit a three-level hierarchical model (patients nested in surgeons nested in hospitals) to 186,422 surgical admissions across 47 hospitals. Priors for regression coefficients were derived from the Japanese national DPC registry. Inference was performed via Hamiltonian Monte Carlo (HMC) using Stan, with 4 chains of 4,000 iterations each. Convergence was assessed via R-hat, effective sample size, and trace plot inspection.
Results: The Bayesian hierarchical model achieved superior calibration (Brier score 0.031 vs. 0.034 and 0.038 for the frequentist random-effects and fixed-effects models, respectively) and discrimination (C-statistic 0.847 vs. 0.839 and 0.832). Hospital-level shrinkage was most pronounced for low-volume hospitals (median shrinkage 42% for hospitals with fewer than 200 annual cases vs. 8% for hospitals exceeding 2,000). All parameters achieved R-hat below 1.01 with effective sample sizes exceeding 3,000.
Conclusions: Bayesian hierarchical models with registry-derived informative priors produce more stable and better-calibrated hospital mortality estimates than frequentist alternatives, particularly for low-volume hospitals where data are sparse. The informative prior acts as a principled regularizer, borrowing strength from the national population.
Keywords: Bayesian hierarchical model, hospital mortality, informative prior, MCMC, Stan, partial pooling, shrinkage estimation, case-mix adjustment

Prior and Posterior Distributions

Figure 1. Prior and posterior density curves for the hospital-level random effect standard deviation (sigma_h). The prior (Half-Cauchy) is diffuse, reflecting uncertainty before seeing data. The posterior is concentrated around 0.34, indicating moderate between-hospital variation after accounting for case mix. The data have substantially updated the prior.
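
To give a feel for what a between-hospital standard deviation near 0.34 implies for partial pooling, the following minimal sketch uses a normal-normal approximation on the log-odds scale. It is not the authors' model or their definition of shrinkage: the function name `shrinkage_factor`, the 3% baseline mortality, and the two case counts are assumptions chosen purely for illustration, so the output is not expected to reproduce the shrinkage percentages reported in the abstract.

```python
def shrinkage_factor(n_cases: int, baseline_risk: float, sigma_h: float) -> float:
    """Approximate weight placed on the overall mean (0 = no pooling, 1 = full pooling).

    Assumes a normal-normal approximation on the log-odds scale, which is an
    illustrative simplification and not the model fitted in the paper.
    """
    # Approximate sampling variance of a hospital's log-odds estimate.
    sampling_var = 1.0 / (n_cases * baseline_risk * (1.0 - baseline_risk))
    # Weight on the population mean grows as the hospital's own data get noisier.
    return sampling_var / (sampling_var + sigma_h ** 2)


SIGMA_H = 0.34        # posterior concentration of sigma_h described in Figure 1
BASELINE_RISK = 0.03  # hypothetical baseline 30-day mortality (assumption)

for n_cases in (150, 2500):  # hypothetical low- and high-volume hospitals
    weight = shrinkage_factor(n_cases, BASELINE_RISK, SIGMA_H)
    print(f"n = {n_cases:5d}: weight on overall mean = {weight:.2f}")
```

The qualitative pattern is the point of the sketch: with the between-hospital variance held fixed, the low-volume hospital's estimate is pulled toward the overall mean far more strongly than the high-volume hospital's.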

Model Structure (DAG)

Figure 2. Directed acyclic graph (DAG) of the three-level Bayesian hierarchical model. Hyperparameters mu and sigma_h (gold) govern the distribution of hospital-level random effects alpha_h (blue). Patient outcomes y_ij (green) are conditionally independent given their hospital effect and patient-level covariates X_ij (red). Arrows indicate conditional dependencies.
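
The generative process encoded by the DAG can be sketched as a forward simulation: draw hospital effects from the hyperparameters, then draw patient outcomes given the hospital effect and covariates. This is a simplified illustration under stated assumptions, not the authors' code. It omits the surgeon level (which the caption does not describe), and the function name `simulate_outcomes`, the value of mu, the coefficient vector, and the standard-normal covariates are all hypothetical.

```python
import math
import random


def simulate_outcomes(n_hospitals, patients_per_hospital, mu, sigma_h, beta, seed=0):
    """Forward-simulate binary outcomes from a two-level logistic hierarchy."""
    rng = random.Random(seed)
    # Hospital-level random effects: alpha_h ~ Normal(mu, sigma_h).
    alphas = [rng.gauss(mu, sigma_h) for _ in range(n_hospitals)]
    records = []
    for h, alpha_h in enumerate(alphas):
        for _ in range(patients_per_hospital):
            # Hypothetical standardized patient covariates X_ij.
            x = [rng.gauss(0.0, 1.0) for _ in beta]
            # Linear predictor on the log-odds scale.
            eta = alpha_h + sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            # y_ij ~ Bernoulli(p), conditionally independent given alpha_h and X_ij.
            y = 1 if rng.random() < p else 0
            records.append((h, x, y))
    return alphas, records


# mu and beta are illustrative values, not estimates from the paper.
alphas, records = simulate_outcomes(
    n_hospitals=47, patients_per_hospital=200,
    mu=-3.5, sigma_h=0.34, beta=[0.5, -0.3],
)
crude_rate = sum(y for _, _, y in records) / len(records)
print(f"simulated crude mortality: {crude_rate:.3f}")
```

Simulating from the assumed structure in this way is also a common check before fitting: if a model cannot recover known parameters from simulated data, its estimates on real data deserve scrutiny.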

Key Results

Table 1. Model comparison: Bayesian hierarchical vs. frequentist alternatives
Metric	Bayesian Hierarchical	Frequentist Random Effects	Frequentist Fixed Effects
C-statistic	0.847	0.839	0.832
Brier score	0.031	0.034	0.038
ELPD (LOO-CV)	-18,442	--	--
Hospital estimates unstable (CV > 0.5)	2 / 47	8 / 47	14 / 47
Max R-hat	1.003	--	--
Min ESS (bulk)	3,412	--	--
ELPD = expected log pointwise predictive density; LOO-CV = leave-one-out cross-validation; ESS = effective sample size; CV (in the hospital-estimates row) = coefficient of variation. The Bayesian hierarchical model performs best on every metric reported for all three models.
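
For readers unfamiliar with the first two metrics in Table 1, the sketch below computes a Brier score and a C-statistic from predicted probabilities. The function names and the five-patient toy data are hypothetical and are unrelated to the study data; the pairwise C-statistic loop is written for clarity and would be too slow for a cohort of the size analysed here, where a rank-based formula is preferable.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def c_statistic(y_true, p_pred):
    """Probability that a random death receives a higher predicted risk than a random survivor."""
    deaths = [p for y, p in zip(y_true, p_pred) if y == 1]
    survivors = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not deaths or not survivors:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for p_death in deaths:
        for p_surv in survivors:
            if p_death > p_surv:
                concordant += 1.0
            elif p_death == p_surv:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(deaths) * len(survivors))


# Toy data (hypothetical): observed outcomes and predicted risks for five patients.
y = [0, 0, 1, 0, 1]
p = [0.1, 0.2, 0.7, 0.4, 0.3]
print(f"Brier score: {brier_score(y, p):.3f}")
print(f"C-statistic: {c_statistic(y, p):.3f}")
```

Lower Brier scores and higher C-statistics are better, which is the direction of the differences reported in the table.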

Structure of the Paper

Model Specification -- The three-level hierarchical model, prior elicitation, and DAG structure
Prior Elicitation -- Derivation of informative priors from the Japanese DPC national registry
MCMC Diagnostics -- Trace plots, R-hat convergence, effective sample size assessment
Results -- Posterior estimates, shrinkage analysis, and model comparison
Discussion -- Interpretation, limitations, and recommendations for hospital benchmarking
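
The R-hat check named under MCMC Diagnostics can be illustrated with a minimal sketch of the classic split R-hat, which compares between-chain and within-chain variance for a single parameter. This is an assumption-laden illustration: it is not the rank-normalized variant that current Stan interfaces report, the function name `split_rhat` is hypothetical, and the chains are simulated rather than taken from the paper.

```python
import random
import statistics


def split_rhat(chains):
    """Classic split R-hat for one parameter; `chains` is a list of lists of draws."""
    half = min(len(c) for c in chains) // 2
    if half < 2:
        raise ValueError("each chain needs at least four draws")
    # Split every chain in two so that within-chain drift also inflates R-hat.
    splits = []
    for chain in chains:
        splits.append(chain[:half])
        splits.append(chain[half:2 * half])
    n = half
    split_means = [statistics.fmean(s) for s in splits]
    within = statistics.fmean([statistics.variance(s) for s in splits])
    between = n * statistics.variance(split_means)
    # Pooled estimate of the marginal posterior variance.
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


rng = random.Random(0)
# Four simulated chains that all target the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four simulated chains stuck in two different regions.
stuck = [[rng.gauss(m, 1.0) for _ in range(1000)] for m in (0.0, 0.0, 3.0, 3.0)]
print(f"well-mixed chains: R-hat = {split_rhat(mixed):.3f}")
print(f"stuck chains:      R-hat = {split_rhat(stuck):.3f}")
```

Values close to 1 indicate that the chains agree with one another; chains that disagree yield values well above the 1.01 threshold quoted in the abstract.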