Bayesian Regression for Spatial–Temporal Rainfall Prediction and Analysis - Spring 2026

Author

Mahmudul Islam Prakash (Advisor: Dr. Cohen)

Published

March 23, 2026

Introduction

Rain plays a critical role in farming, water supplies, and natural hazard mitigation, so predicting it accurately matters greatly. Doing so remains difficult because weather systems behave in complex, nonlinear ways. Older methods such as multiple linear regression relate rainfall to factors such as temperature, air pressure, wind speed, and moisture levels (Ahmed and Mohamed 2021), but they struggle once patterns become intricate or noisy. These techniques are transparent and straightforward, yet their usefulness drops whenever real-world relationships depart sharply from linear assumptions. They also falter when records are patchy or when subtle shifts are hidden within broader environmental cycles.

These limits have led researchers toward machine learning and Bayesian strategies for forecasting rain and related hydrological problems. One analysis that compared several models paired with dimensionality reduction techniques found that accuracy depends heavily on both the chosen features and the algorithm: Bayesian linear regression showed lower errors and better fits than neural networks, decision forests, or ordinary linear regression (Kumari, Raza, and Kumari 2023). This suggests that incorporating prior knowledge and handling uncertainty are important when building regression-based rainfall models.

Instead of relying solely on parametric regression, some researchers turn to non-parametric Bayesian tools such as Gaussian Process Regression (GPR) to study the complex nature of rainfall. GPR treats rainfall as a random process and returns both estimated values and a measure of prediction confidence. Because it accounts for variability in this way, the method fits well within hydrology and environmental analysis (Vidya, Hari, et al. 2021). In a related earth science problem, reconstructing deformation records for rainfall-induced landslides, GPR retained intricate temporal structure and produced smaller errors than standard filters (Li et al. 2021).

One key issue in studying rainfall patterns is inaccuracy in observed measurements, especially those collected by radar networks. To address this, researchers used a hierarchical Bayesian method to correct biased radar readings, which reduced errors relative to ground-based sensors during Hurricane Irma in Florida (Ma and Chandrasekar 2020). Such results highlight how these statistical models clarify the uncertainty in data while combining information across sources.

Beyond deterministic outputs, probabilistic rain forecasts now matter more for decisions where risk levels shift, such as flood management. One Bayesian approach combines historical rainfall patterns with model-generated forecasts and uses the historical record to adjust forecast probabilities. This sharpens predictions during the flood season and yields more reliable probabilities when communities need them most (Zeng et al. 2024).

Bayesian approaches are also used widely across hydrology and climate studies. One review of Bayesian rainfall forecasting shows that these methods handle uncertain data well and bring earlier findings into current work, with applications that include modeling heavy rainfall and rainfall runoff (Chen, Sun, and Yang 2022). Similarly, the choice of probability distribution matters when building rainfall-based indicators such as the Standardized Precipitation Index (SPI): researchers fit candidate distributions, compare them with model selection criteria, and then select the most suitable timescale for tracking agricultural drought (Khandelwal, Goyal, and Shekhawat 2023).

Beyond traditional models, Bayesian Networks offer a way to map how different weather factors influence one another through probability. When paired with tools such as Support Vector Machines or linear regression, these networks show stronger forecasting results under uncertain conditions (Babu et al. 2025). Recent work treats uncertainty as part of the model design rather than ignoring it, which reflects a broader shift in climate science: handling unknowns matters as much as the forecast itself.

Building on this earlier work, this study centers on Bayesian regression methods for estimating rainfall. It uses Bayesian linear regression as a transparent, reliable starting point (Kumari, Raza, and Kumari 2023) and also considers Gaussian Process Regression for its flexibility without rigid structural assumptions (Vidya, Hari, et al. 2021; Li et al. 2021). Rather than focusing solely on prediction quality, the study prioritizes models that express uncertainty clearly, since understanding variability supports better choices in water management and climate planning.

Regression and Machine Learning Approaches for Rainfall Prediction

Early methods for forecasting rain relied heavily on basic regression strategies. For instance, multiple linear regression has been used to link weather factors (temperature, pressure, dew point, wind speed) to rainfall, revealing modest but clear relationships among these variables (Ahmed and Mohamed 2021). Such models are limited by their rigid assumptions, however: linearity prevents them from capturing the complex ways atmospheric variables interact in real conditions.

Recent research has paired machine learning with feature selection and dimensionality reduction. Kumari et al. (2023) tested several regression approaches (Bayesian linear regression, neural networks, decision forests, and ordinary linear regression) while applying strategies such as principal component analysis, correlation screening, and stepwise selection (Kumari, Raza, and Kumari 2023). When predictors were filtered through correlation-based selection, Bayesian linear regression stood out, with lower RMSE and MAE and higher R². Because it incorporates prior information and accounts for variability in the data, the model offers accurate forecasts along with more stable results across conditions. Their work suggests that Bayesian frameworks benefit from informed feature selection, which makes them robust without sacrificing accuracy.
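The three evaluation metrics named above can be computed directly. The following is a minimal base R sketch; the observed and predicted values are invented for illustration and do not come from Kumari et al. (2023).

```r
# Hypothetical observed and predicted rainfall values (mm), for illustration only
observed  <- c(10, 12, 15, 20, 18)
predicted <- c(11, 11, 16, 18, 19)

# Root mean squared error
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Mean absolute error
mae <- function(obs, pred) mean(abs(obs - pred))

# Coefficient of determination: 1 - residual sum of squares / total sum of squares
r_squared <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

metrics <- c(RMSE = rmse(observed, predicted),
             MAE  = mae(observed, predicted),
             R2   = r_squared(observed, predicted))
print(round(metrics, 3))
```

Lower RMSE and MAE and a higher R² indicate a closer fit, which is the sense in which the Bayesian linear model is described as performing best.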

Non-Parametric Bayesian Modeling with Gaussian Processes

Weather patterns show high variability across locations and time, making consistent prediction through standard mathematical formulas challenging. Because of these fluctuations, researchers often turn to flexible modeling techniques that adapt without strict assumptions. Taking this direction, Vidya et al. (2021) explored Gaussian Process Regression for estimating rain levels (Vidya, Hari, et al. 2021). Their method treats precipitation records as outcomes drawn from a probabilistic function with smooth dependencies. Instead of relying on rigid structures, the system identifies trends directly from historical measurements. One standout feature lies in its ability to express confidence ranges alongside predictions. Most conventional models deliver single-valued outputs, offering no measure of reliability. By contrast, this approach quantifies doubt, reflecting how uncertain estimates can be under variable conditions.
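To make the idea of predictions with confidence ranges concrete, the following base R sketch computes a Gaussian process posterior mean and standard deviation. It is an illustration under stated assumptions, not the implementation of Vidya et al. (2021): the squared-exponential kernel, the fixed hyperparameters, the helper names `rbf_kernel` and `gp_predict`, and the toy data are all chosen here for demonstration.

```r
# Squared-exponential (RBF) kernel between two numeric vectors (assumed kernel choice)
rbf_kernel <- function(x1, x2, length_scale = 1, signal_sd = 1) {
  sq_dist <- outer(x1, x2, function(a, b) (a - b)^2)
  signal_sd^2 * exp(-sq_dist / (2 * length_scale^2))
}

# Zero-mean GP posterior mean and standard deviation at new inputs,
# with hyperparameters held fixed rather than estimated
gp_predict <- function(x_train, y_train, x_new,
                       length_scale = 1, signal_sd = 1, noise_sd = 0.1) {
  K    <- rbf_kernel(x_train, x_train, length_scale, signal_sd) +
          diag(noise_sd^2, length(x_train))
  K_s  <- rbf_kernel(x_train, x_new, length_scale, signal_sd)
  K_ss <- rbf_kernel(x_new, x_new, length_scale, signal_sd)
  post_mean <- as.vector(t(K_s) %*% solve(K, y_train))
  post_cov  <- K_ss - t(K_s) %*% solve(K, K_s)
  list(mean = post_mean, sd = sqrt(pmax(diag(post_cov), 0)))
}

# Toy data: a smooth signal observed with noise (not real rainfall)
set.seed(1)
x_train <- seq(0, 10, length.out = 15)
y_train <- sin(x_train) + rnorm(15, sd = 0.1)
x_new   <- c(2.5, 5, 20)  # the last point lies far from the training data

pred <- gp_predict(x_train, y_train, x_new)
print(data.frame(x = x_new, mean = pred$mean, sd = pred$sd))
```

The predictive standard deviation is small near the training inputs and grows back toward the prior value far from them, which is the behavior that distinguishes this approach from single-valued predictions.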

The value of GPR for rainfall-related phenomena is also clear in recent geophysical work. Li et al. (2021) used GPR to reconstruct InSAR deformation time series for rainfall-induced landslides and reported better accuracy where ground movement was highly nonlinear (Li et al. 2021). Because past precipitation patterns informed the model, the timing of surface changes was captured more precisely and reconstruction errors fell. This evidence strengthens the case for using GPR in complex, rainfall-sensitive earth systems, particularly where deformation is irregular.

Hierarchical Bayesian Models for Rainfall Estimation and Bias Correction

Beyond forecasting, accurate rainfall estimation is challenged by biases and uncertainties in observational data sources, especially weather radar products. Using Hurricane Irma in Florida as a case study, Ma and Chandrasekar (2020) introduced a hierarchical Bayesian method to correct biases in NEXRAD dual-polarization rainfall estimates (Ma and Chandrasekar 2020). The model uses Gaussian processes to represent the relationship between true rainfall and radar observations while accounting for spatially correlated errors.

At validation sites, the Bayesian-corrected rainfall estimates improved on raw radar output by more than 30%, with substantial reductions in both RMSE and the adjusted average error (Ma and Chandrasekar 2020). Prediction quality near ground reference stations rose further when the models included spatial Gaussian processes. The study shows that hierarchical Bayesian methods are useful for data fusion, uncertainty quantification, and improving the reliability of rainfall estimates during extreme events.

Bayesian Probabilistic Forecasting and Ensemble Post-Processing

Flood risk management and decision making increasingly rely on probabilistic rain forecasts, especially under uncertain conditions. Although ensemble models offer several possible outcomes, their raw probabilities tend to be biased or poorly calibrated. In tests during the flood season, a method based on Bayes’ theorem recalibrated those initial probabilities using historical rainfall patterns. This adjustment improved forecast skill, as shown by better Brier Scores and higher True Skill Statistics (Zeng et al. 2024).
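As a generic illustration of the two ideas in this paragraph, the sketch below applies Bayes’ theorem to update a rain probability and then scores a set of probability forecasts with the Brier Score. It is not the model of Zeng et al. (2024); every number and variable name is a hypothetical value chosen for the example.

```r
# Bayes' theorem: probability of rain given that the raw forecast signals rain.
# All three inputs are hypothetical values, not taken from any cited study.
prior_rain       <- 0.30  # historical (climatological) frequency of rain
p_signal_if_rain <- 0.80  # chance the forecast signals rain when rain occurs
p_signal_if_dry  <- 0.20  # chance the forecast signals rain when it stays dry

posterior_rain <- (p_signal_if_rain * prior_rain) /
  (p_signal_if_rain * prior_rain + p_signal_if_dry * (1 - prior_rain))

# Brier score: mean squared difference between forecast probability and outcome
brier_score <- function(prob, outcome) mean((prob - outcome)^2)

outcome  <- c(1, 0, 0, 1, 0)            # 1 = rain observed, 0 = dry
forecast <- c(0.9, 0.2, 0.1, 0.7, 0.3)  # hypothetical forecast probabilities

print(posterior_rain)
print(brier_score(forecast, outcome))
```

A Brier Score closer to zero indicates better-calibrated, sharper probability forecasts, so a reduction in this score is evidence that the recalibration helped.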

Despite its simplicity, the method improved accuracy in short-term forecasts while reducing false alarms for intense rain. The results suggest that Bayesian techniques allow forecasters to blend historical climate patterns with current model output, producing estimates that better support real-world decisions.

Bayesian Methods in Drought Analysis and Weather Prediction

Bayesian and probabilistic approaches also appear across many water and weather studies. Khandelwal et al. (2023) explored how different timescales affect the Standardized Precipitation Index (SPI) when tracking agricultural drought (Khandelwal, Goyal, and Shekhawat 2023). They combined distribution fitting with AIC and BIC model selection criteria and regression techniques. The three-month SPI aligned most closely with rainfall patterns and showed the best model performance. This result underlines how the choice of probability model shapes indices built from rainfall data.

A broad review of Bayesian techniques in rainfall research covers uses such as classification models, network-based models, runoff simulation, and heavy rainfall evaluation (Chen, Sun, and Yang 2022). These studies consistently show that such approaches help combine existing knowledge with new measurements while carrying uncertainty forward into forecasts.

In addition, some researchers use Bayesian Networks to capture how different weather factors influence one another through probability. Babu et al. (2025) combined these networks with Support Vector Machines and linear regression, and the combined approach handled noisy or incomplete datasets better than the individual models (Babu et al. 2025). Such flexible setups tend to hold up well when data are sparse, which makes explicit treatment of uncertainty a strength rather than a weakness in weather forecasting.

Proposed Approach

Research has shifted noticeably from earlier work using basic linear methods (Ahmed and Mohamed 2021) toward adaptive techniques that handle uncertainty better, such as Bayesian and machine learning models (Kumari, Raza, and Kumari 2023; Vidya, Hari, et al. 2021; Ma and Chandrasekar 2020). Simpler forms such as Bayesian linear regression remain useful for their clarity, but they now sit alongside more advanced options. One such option, Gaussian Process Regression, fits complex patterns without fixed functional assumptions and also provides estimates of confidence (Vidya, Hari, et al. 2021; Li et al. 2021). Beyond individual accuracy, these newer methods show how important it is to account for uncertainty in predictions, especially when estimating rainfall (Ma and Chandrasekar 2020; Zeng et al. 2024).

This work builds on those findings by applying Bayesian regression to model rainfall, prioritizing prediction accuracy together with measures of uncertainty, in line with current practice in hydrology and climate research.

Methodology

This study investigates spatial variability in annual precipitation across weather stations in Florida using a Bayesian regression framework. The response variable is annual precipitation (PRCP), measured in millimeters. Predictor variables include latitude, longitude, and elevation, which represent geographic and topographic factors that influence rainfall variability. Prior to modeling, the dataset is filtered to retain stations within Florida and to remove observations with missing or invalid geographic coordinates or precipitation values. Exploratory data analysis is performed to examine the distribution of precipitation values and visualize spatial rainfall patterns.
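The filtering step described above could look like the following base R sketch. The toy table stands in for the GSOY file, and two choices are assumptions made here rather than details from the study: Florida stations are recognised by a NAME ending in ", FL US", and negative precipitation is treated as invalid.

```r
# Toy stand-in for the GSOY table; all values are invented for illustration
gsoy_raw <- data.frame(
  STATION   = c("S1", "S2", "S3", "S4", "S5"),
  NAME      = c("MIAMI, FL US", "TAMPA, FL US", "ATLANTA, GA US",
                "ORLANDO, FL US", "KEY WEST, FL US"),
  LATITUDE  = c(25.8, 27.9, 33.7, NA, 24.6),
  LONGITUDE = c(-80.2, -82.5, -84.4, -81.3, -81.8),
  ELEVATION = c(2, 6, 320, 30, 1),
  PRCP      = c(1500, 1300, 1250, 1400, -5),
  stringsAsFactors = FALSE
)

# Assumption: Florida stations can be recognised by ", FL US" at the end of NAME
is_florida <- grepl(", FL US$", gsoy_raw$NAME)

# Valid rows have finite coordinates and finite, non-negative precipitation
is_valid <- is.finite(gsoy_raw$LATITUDE) & is.finite(gsoy_raw$LONGITUDE) &
  is.finite(gsoy_raw$PRCP) & gsoy_raw$PRCP >= 0

gsoy_clean <- gsoy_raw[is_florida & is_valid, ]
print(gsoy_clean$STATION)
```

Here the Georgia station, the station with a missing latitude, and the station with a negative precipitation value are dropped, leaving only the rows that satisfy both conditions.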

Bayesian Linear Regression

The primary analytical framework used in this study is Bayesian linear regression. Let \(PRCP_i\) denote the annual precipitation at station \(i\). The likelihood function assumes precipitation values follow a normal distribution:

\[ PRCP_i \sim \mathcal{N}(\mu_i, \sigma^2) \]

where the conditional mean is defined as

\[ \mu_i = \beta_0 + \beta_1 LATITUDE_i + \beta_2 LONGITUDE_i + \beta_3 ELEVATION_i \]

Here, \(\beta_0\) represents the intercept, \(\beta_1, \beta_2, \beta_3\) represent regression coefficients associated with the geographic predictors, and \(\sigma^2\) represents the residual variance.

In the Bayesian framework, model parameters are treated as random variables rather than fixed unknown values. Weakly informative prior distributions are assigned to the regression coefficients, where \(\tau^2\) denotes the prior variance:

\[ \beta_j \sim \mathcal{N}(0,\tau^2) \]

and the residual standard deviation is assigned the prior

\[ \sigma \sim \text{Half-Cauchy}(0,5) \]

The posterior distribution of the parameters is obtained using Bayes’ theorem:

\[ p(\beta,\sigma \mid PRCP) \propto p(PRCP \mid \beta,\sigma)\,p(\beta)\,p(\sigma) \]
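For reference, substituting the normal likelihood, the normal coefficient priors, and the Half-Cauchy prior with scale 5 into this expression gives the log posterior that a sampler would evaluate, up to an additive constant. The expansion assumes \(n\) stations and that the prior on \(\beta_j\) applies to all four coefficients including the intercept, which the text above does not state explicitly:

\[ \log p(\beta,\sigma \mid PRCP) = -n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(PRCP_i - \mu_i\right)^2 - \frac{1}{2\tau^2}\sum_{j=0}^{3}\beta_j^2 - \log\left(1 + \frac{\sigma^2}{25}\right) + \text{const}, \qquad \sigma > 0 \]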

Posterior inference is performed using Markov Chain Monte Carlo (MCMC) sampling. Bayesian regression has been shown to perform effectively in rainfall prediction tasks while also providing interpretable probabilistic estimates (Kumari, Raza, and Kumari 2023). Additionally, Bayesian approaches are particularly useful in meteorological applications because they explicitly account for uncertainty in parameter estimation and prediction (Ma and Chandrasekar 2020).
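The text does not say which sampler or software is used, so the following is only a sketch of how MCMC works for this model: a random-walk Metropolis sampler in base R, run on simulated data with standardised predictors. The prior scale `tau = 10`, the proposal step size, the iteration counts, and the simulated values are all assumptions made for illustration; an applied analysis would more likely rely on an established MCMC package.

```r
set.seed(42)

# Simulated stand-in for station data (not the GSOY values); predictors are
# standardised so that one proposal step size suits every coefficient
n <- 100
X <- cbind(1, scale(matrix(rnorm(n * 3), n, 3)))
colnames(X) <- c("Intercept", "LATITUDE", "LONGITUDE", "ELEVATION")
true_beta <- c(2, -1, 0.5, 0)
y <- as.vector(X %*% true_beta + rnorm(n, sd = 1))

tau <- 10  # assumed prior standard deviation for the coefficients

# Log posterior in terms of (beta, log sigma); the final "+ log_sigma" term is
# the Jacobian of the log transform applied to sigma
log_post <- function(theta) {
  beta <- theta[1:4]
  log_sigma <- theta[5]
  sigma <- exp(log_sigma)
  sum(dnorm(y, as.vector(X %*% beta), sigma, log = TRUE)) +
    sum(dnorm(beta, 0, tau, log = TRUE)) +
    dcauchy(sigma, 0, 5, log = TRUE) + log_sigma
}

# Random-walk Metropolis: propose a small move, accept with probability
# min(1, posterior ratio), otherwise keep the current value
n_iter <- 10000
draws <- matrix(NA_real_, n_iter, 5,
                dimnames = list(NULL, c(colnames(X), "sigma")))
theta <- rep(0, 5)
current_lp <- log_post(theta)
for (i in seq_len(n_iter)) {
  proposal <- theta + rnorm(5, sd = 0.05)
  proposal_lp <- log_post(proposal)
  if (log(runif(1)) < proposal_lp - current_lp) {
    theta <- proposal
    current_lp <- proposal_lp
  }
  draws[i, ] <- c(theta[1:4], exp(theta[5]))
}

posterior <- draws[-(1:2000), ]  # discard burn-in iterations
print(round(colMeans(posterior), 2))
```

The retained rows of `posterior` approximate draws from the joint posterior of the coefficients and the residual standard deviation, and their column means should sit close to the values used to simulate the data.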

Uncertainty Quantification and Predictive Inference

A key advantage of Bayesian regression is the ability to quantify uncertainty in model parameters and predictions. Instead of producing single-point estimates, the Bayesian model yields posterior distributions for each coefficient. Credible intervals provide probabilistic bounds on parameter estimates, allowing interpretation of the strength and direction of relationships between precipitation and geographic predictors.
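Given posterior draws, credible intervals reduce to quantiles of those draws. The sketch below uses invented stand-in draws for two coefficients, so the numbers carry no meaning for the Florida data; in practice the `draws` matrix would hold the MCMC samples from the fitted model.

```r
set.seed(7)

# Stand-in posterior draws for two coefficients (invented, for illustration)
draws <- cbind(
  LATITUDE  = rnorm(4000, mean = -30, sd = 10),
  ELEVATION = rnorm(4000, mean = 0.5, sd = 2)
)

# Posterior mean and 95% equal-tailed credible interval for each coefficient
summary_table <- t(apply(draws, 2, function(d) {
  c(mean  = mean(d),
    lower = quantile(d, 0.025, names = FALSE),
    upper = quantile(d, 0.975, names = FALSE))
}))

# Posterior probability that each coefficient is negative
prob_negative <- colMeans(draws < 0)

print(round(summary_table, 2))
print(round(prob_negative, 3))
```

An interval that excludes zero, as for the stand-in LATITUDE draws, indicates a clear direction of association, while an interval that spans zero, as for the stand-in ELEVATION draws, leaves the direction uncertain.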

Posterior predictive checks are conducted to evaluate whether simulated rainfall values generated from the fitted model resemble the observed data. Similar probabilistic modeling approaches have been used in rainfall studies that incorporate spatial variability and uncertainty using Bayesian methods and Gaussian Process frameworks (Vidya, Hari, et al. 2021).
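A posterior predictive check can be sketched as follows in base R. To keep the example short and self-contained, it uses simulated data with a single predictor and draws from the exact posterior under a flat prior, which is a simplification and not the prior specification used in this study; the choice of the maximum as the test statistic is also only an example.

```r
set.seed(11)

# Simulated stand-in for station data (invented values)
n <- 80
elevation <- runif(n, 0, 60)
prcp <- 1400 - 2 * elevation + rnorm(n, sd = 150)
X <- cbind(1, elevation)

# Exact posterior draws under a flat prior, p(beta, sigma^2) proportional to
# 1 / sigma^2, used here only as a convenient stand-in for MCMC output
fit <- lm(prcp ~ elevation)
beta_hat <- coef(fit)
V <- solve(crossprod(X))
df <- n - ncol(X)
s2 <- sum(residuals(fit)^2) / df

n_draws <- 1000
sigma2 <- df * s2 / rchisq(n_draws, df)
L <- t(chol(V))
beta <- t(sapply(sigma2, function(s) beta_hat + sqrt(s) * L %*% rnorm(2)))

# Simulate one replicated dataset per posterior draw
y_rep <- sapply(seq_len(n_draws), function(k) {
  rnorm(n, mean = as.vector(X %*% beta[k, ]), sd = sqrt(sigma2[k]))
})

# Compare a test statistic (here the maximum) between replicates and data
p_value <- mean(apply(y_rep, 2, max) >= max(prcp))
print(p_value)
```

A posterior predictive p-value very close to 0 or 1 would signal that the model fails to reproduce that feature of the observed data; plotting the replicated distributions against the observed histogram is a common visual complement.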

Baseline Comparison

Although the Bayesian regression model serves as the primary analytical framework, a classical multiple linear regression model with the same predictor structure is also estimated as a baseline comparison:

\[ PRCP = \beta_0 + \beta_1 LATITUDE + \beta_2 LONGITUDE + \beta_3 ELEVATION + \epsilon \]

This baseline allows comparison of predictive accuracy and interpretability between classical and Bayesian approaches. While classical regression produces point estimates of coefficients, it does not naturally incorporate uncertainty in the same probabilistic manner as the Bayesian framework.
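The classical baseline can be fitted with `lm()` from base R. The sketch below uses a simulated station table with invented values, so the estimates say nothing about the actual Florida data, and in-sample RMSE is shown only as one possible accuracy measure for the comparison.

```r
set.seed(3)

# Simulated stand-in for the station table (invented values, not GSOY data)
n <- 120
stations <- data.frame(
  LATITUDE  = runif(n, 24.5, 31),
  LONGITUDE = runif(n, -87.6, -80),
  ELEVATION = runif(n, 0, 100)
)
stations$PRCP <- 2500 - 40 * stations$LATITUDE + rnorm(n, sd = 120)

# Classical multiple linear regression with the same predictor structure
ols_fit <- lm(PRCP ~ LATITUDE + LONGITUDE + ELEVATION, data = stations)

# Point estimates with 95% confidence intervals
print(cbind(estimate = coef(ols_fit), confint(ols_fit)))

# In-sample RMSE, one possible accuracy measure for the comparison
ols_rmse <- sqrt(mean(residuals(ols_fit)^2))
print(ols_rmse)
```

The confidence intervals printed here describe the long-run behavior of the estimation procedure, whereas the Bayesian credible intervals are direct probability statements about the coefficients given the data and priors.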

Data Description and Visualization

The dataset used in this project is the Florida station data from the NOAA Global Summary of the Year (GSOY) for 2024. It contains station identifiers, geographic coordinates (latitude and longitude), elevation, observation dates, and annual precipitation totals in millimeters (PRCP). Because rainfall patterns vary across regions, these measurements feed the Bayesian regression model and the comparison model, and they also form the basis for the spatial analysis.

Each row of the dataset represents the precipitation record of a single weather station in Florida. The precipitation variable serves as the main response for statistical modeling, while the station metadata provides the spatial context needed for geospatial visualization. Before the analysis, missing entries were checked and basic cleaning steps were applied. Exploratory summaries and visualizations were then produced to understand the overall distribution and spatial variability of rainfall throughout the study area.

Dataset Structure and Key Variables

The NOAA Global Summary of the Year (GSOY) dataset provides annual climate summaries for weather stations. Each record represents one station for a given year. The full GSOY dataset contains dozens of climatological variables, but the key fields used in this project include:

STATION — Unique station identifier code.
NAME — Name of the station (e.g., city or airport).
LATITUDE — Geographic latitude of the station (decimal degrees).
LONGITUDE — Geographic longitude of the station (decimal degrees).
ELEVATION — Elevation of the station above sea level (meters).
YEAR — Year of the observation (e.g., 2024).
PRCP (Precipitation) — Total annual precipitation in millimeters (mm).
Other available climatological variables exist (e.g., temperature averages, wind speed, degree days), but are not used in the current analysis.

This study primarily emphasizes station location and precipitation values, as they are crucial for spatial-temporal rainfall assessment and for calibrating regression models (Bayesian and comparison models). The station coordinates allow mapping and spatial visualization of rainfall across Florida.

Column Usage in Project

Field	Description	Usage in Project
STATION	Unique station identifier	Identifies individual weather stations
NAME	Station name	Used as metadata reference (optional)
LATITUDE	Geographic latitude (decimal degrees)	Used for spatial mapping and as a predictor
LONGITUDE	Geographic longitude (decimal degrees)	Used for spatial mapping and as a predictor
ELEVATION	Elevation above sea level (meters)	Used as a predictor
YEAR	Extracted year from DATE (2024)	Used for temporal reference
PRCP	Annual precipitation (millimeters)	Main response variable for analysis

Visualization of Rainfall Distribution

library(ggplot2)
library(sf)
library(dplyr)
library(tigris)

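# Load data (the path is specific to the author's machine)
gsoy <- read.csv("/Users/miprakash/UWF/4 - Capsone Project/Project/IDC6940_Bayesian_Regression/code/4163416.csv")

# DATE holds the observation year; store it as an integer
gsoy$YEAR <- as.integer(substr(gsoy$DATE, 1, 4))

# Drop rows with missing coordinates or precipitation;
# st_as_sf() does not accept missing coordinate values
gsoy <- gsoy |>
  filter(!is.na(LATITUDE), !is.na(LONGITUDE), !is.na(PRCP))

# Convert station coordinates into spatial data (WGS 84, EPSG:4326)
stations_sf <- st_as_sf(gsoy, coords = c("LONGITUDE", "LATITUDE"), crs = 4326)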

# -------------------------
# 1) Distribution of annual precipitation across stations (2024)
# -------------------------
ggplot(gsoy, aes(x = PRCP)) +
  geom_histogram(aes(y = after_stat(density)),
                 bins = 25,
                 color = "white",
                 fill = "steelblue",
                 alpha = 0.9) +
  geom_density(linewidth = 1, color = "black") +
  labs(
    title = "Distribution of Annual Precipitation (2024)",
    x = "Precipitation (mm)",
    y = "Density"
  ) +
  theme_minimal()

# ------------------------------------
# 2) Spatial visualization (Florida map)
# ------------------------------------
options(tigris_use_cache = TRUE)

florida <- states(cb = TRUE) |>
  filter(STUSPS == "FL") |>
  st_transform(4326)

ggplot() +
  geom_sf(data = florida, fill = "white", color = "black") +
  geom_sf(data = stations_sf, aes(color = PRCP), size = 1.5, alpha = 0.85) +
  # "roma" is not a viridis palette; "viridis" is the default that the scale falls back to
  scale_color_viridis_c(option = "viridis", name = "Rainfall (mm)") +
  coord_sf(
    xlim = st_bbox(florida)[c("xmin","xmax")],
    ylim = st_bbox(florida)[c("ymin","ymax")]
  ) +
  labs(
    title = "Spatial Distribution of Annual Rainfall (Florida, 2024)",
    x = "Longitude",
    y = "Latitude"
  ) +
  theme_minimal()

Initial Observations

The histogram shows that most stations report precipitation values within a moderate range, while fewer stations show extremely low or extremely high rainfall. Even so, annual precipitation varies noticeably across the state. The map shows that rainfall in Florida is unevenly distributed, with some areas receiving more precipitation than others. These preliminary patterns suggest that geographic location helps explain rainfall variability, which motivates the statistical modeling, including Bayesian regression, in the subsequent analysis.

References

Ahmed, Hiyam Abobaker Yousif, and Sondos WA Mohamed. 2021. “Rainfall Prediction Using Multiple Linear Regressions Model.” In 2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE), 1–5. IEEE.

Babu, Tina, J Mano Shankari, Gudepu Rakshitha, Nandhan Kumar, Hemanth Kumar, and Aashim Dhawan. 2025. “Bayesian Network for Weather Prediction.” In 2025 International Conference on Emerging Technologies in Computing and Communication (ETCC), 1–6. IEEE.

Chen, Yiming, Jialiang Sun, and Yundi Yang. 2022. “Related Application Methods and Practices of Bayesian Prediction in the Field of Rainfall.” In 2022 International Conference on Data Analytics, Computing and Artificial Intelligence (ICDACAI), 223–27. IEEE.

Khandelwal, Ritu, Hemlata Goyal, and Rajveer Singh Shekhawat. 2023. “Agricultural Drought Index Selection Using Probability Distribution: Statistical and Linear Regression Approach.” In 2023 3rd International Conference on Innovative Sustainable Computational Technologies (CISCT), 1–6. IEEE.

Kumari, Sapna, Muhammad Owais Raza, and Arsha Kumari. 2023. “Performance Evaluation of Machine Learning Algorithms for Rainfall Prediction Using Dimensionality Reduction Techniques.” In 2023 4th International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), 1–6. IEEE.

Li, Zhiyong, Yunqi Wang, Jinghan Mu, Weiming Liao, and Kui Zhang. 2021. “InSAR Deformation Time-Series Reconstruction for Rainfall-Induced Landslides Based on Gaussian Process Regression.” In Proceedings of the 2021 13th International Conference on Machine Learning and Computing, 117–26.

Ma, Yingzhao, and V Chandrasekar. 2020. “A Hierarchical Bayesian Approach for Bias Correction of NEXRAD Dual-Polarization Rainfall Estimates: Case Study on Hurricane Irma in Florida.” IEEE Geoscience and Remote Sensing Letters 18 (4): 568–72.

Vidya, GS, VS Hari, et al. 2021. “Rainfall Forecasting Using Non-Parametric Bayesian Approach.” In 2021 Fourth International Conference on Microelectronics, Signals & Systems (ICMSS), 1–4. IEEE.

Zeng, Peng, Wei Zhang, Yangjun Zhou, and Guohui Wei. 2024. “Bayesian Probability Intelligent Forecasting Model Based on Rainfall During Flood Season.” In 2024 5th International Conference on Information Science, Parallel and Distributed Systems (ISPDS), 323–28. IEEE.