I am developing a spatiotemporal tree-based ensemble framework (utilizing LightGBM, XGBoost, and CatBoost) to forecast dengue outbreaks based on climate variables (temperature, precipitation, humidity) and lagged historical case counts.

While tree-based algorithms are theoretically invariant to monotonic feature scaling, I am implementing scaling primarily because:

  1. I am calculating SHAP (SHapley Additive exPlanations) values for post-hoc model interpretability and global feature importance.

  2. I am applying forward aggregation across temporal slices to prevent data leakage, meaning the range and variance of each feature shift from one training/validation window to the next.
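To make point 2 concrete, here is a minimal sketch of how I fit the scaler inside each temporal slice. It assumes that "forward aggregation" behaves like an expanding-window split, which I approximate with scikit-learn's TimeSeriesSplit; the rainfall series is synthetic and the variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import StandardScaler

# Synthetic weekly rainfall series (illustrative data, not my real dataset)
rng = np.random.default_rng(0)
rainfall = rng.normal(loc=150, scale=30, size=(120, 1))

fold_stats = []
for train_idx, val_idx in TimeSeriesSplit(n_splits=4).split(rainfall):
    # Fit on the training window only, so validation weeks never
    # influence the mean/std used for scaling (no leakage).
    scaler = StandardScaler().fit(rainfall[train_idx])
    train_scaled = scaler.transform(rainfall[train_idx])
    val_scaled = scaler.transform(rainfall[val_idx])
    fold_stats.append((len(train_idx), scaler.mean_[0], scaler.scale_[0]))

# The fitted statistics drift as the training window grows
for n_train, mean, std in fold_stats:
    print(f"train weeks={n_train:3d}  mean={mean:7.2f}  std={std:6.2f}")
```

The same loop works with MinMaxScaler; the point is only that the scaler's parameters are re-estimated per window, which is why the feature range moves between slices.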

I am debating between StandardScaler (Z-score normalization) and MinMaxScaler (0-1 normalization). Given the spatiotemporal and epidemiological nature of the data, StandardScaler appears to behave more robustly, but I want to ensure my architectural justification is sound.

Here is a minimal example of how the choice affects extreme climate outliers (e.g., a massive monsoon rainfall anomaly):

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Simulating a climate feature with a severe anomaly (monsoon spike)
np.random.seed(42)
weekly_rainfall = np.random.normal(loc=150, scale=30, size=100)
weekly_rainfall = np.append(weekly_rainfall, [650])  # Extreme outlier event

df = pd.DataFrame({"Rainfall": weekly_rainfall})

# Applying both scalers (fit_transform returns an (n, 1) array, so flatten it)
df["MinMax"] = MinMaxScaler().fit_transform(df[["Rainfall"]]).ravel()
df["Standard"] = StandardScaler().fit_transform(df[["Rainfall"]]).ravel()

# Variance of the 100 non-outlier weeks only (the last row is the 650 mm spike)
print("Variance of normal weeks under MinMax:", df["MinMax"].iloc[:-1].var())
print("Variance of normal weeks under Standard:", df["Standard"].iloc[:-1].var())

My Observations & Core Dilemma:

  • The Outlier Compression Issue: Epidemic forecasting relies heavily on anomaly detection (e.g., a sudden spike in humidity/rainfall triggers a vector breeding surge). With MinMaxScaler, the single extreme outlier (650 mm) defines the top of the range, so the entire "normal" historical distribution is squeezed into a narrow band near the low end of [0, 1] and its absolute variance becomes very small.

  • SHAP Interpretability Impact: Because MinMaxScaler shrinks the absolute spacing of the non-outlier data points, it appears to distort the baseline comparison against the SHAP expected value, making normal variations look uniform to the explainer.

  • StandardScaler Robustness: StandardScaler appears to preserve the variance structure of the normal data because it centers on the mean and scales by the standard deviation (although the outlier also inflates that fitted standard deviation), allowing the anomaly to sit out at $+5\sigma$ or beyond without destroying the internal resolution of lower-valued inputs.
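One check I can run against the observations above: both scalers are affine maps of the form x -> a*x + b with a > 0, so in principle they should differ only in units. The sketch below uses the same kind of synthetic rainfall series as my example and compares a scale-free quantity (the share of total variance carried by the normal weeks) under each scaler, then re-standardizes the MinMax output to see whether it reproduces the z-scores. The names ratio_mm, ratio_zs, and z_from_mm are just illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic series: 100 normal weeks plus one 650 mm anomaly
rng = np.random.default_rng(42)
rain = np.append(rng.normal(150, 30, 100), 650.0).reshape(-1, 1)

mm = MinMaxScaler().fit_transform(rain).ravel()
zs = StandardScaler().fit_transform(rain).ravel()

# Scale-free "resolution": variance of normal weeks relative to total variance
ratio_mm = mm[:-1].var() / mm.var()
ratio_zs = zs[:-1].var() / zs.var()
print("normal/total variance, MinMax  :", ratio_mm)
print("normal/total variance, Standard:", ratio_zs)

# Re-standardizing the MinMax column (population std, as StandardScaler uses)
z_from_mm = (mm - mm.mean()) / mm.std()
print("MinMax re-standardized equals z-scores:", np.allclose(z_from_mm, zs))
```

If the two ratios match and the re-standardized column equals the z-scores, then the "compression" I see under MinMaxScaler is a change of units rather than a loss of relative spacing, which bears directly on question 1 below.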

Questions:

  1. Is my mathematical rationale sound that StandardScaler is objectively better than MinMaxScaler for tree-based ensemble interpretability (SHAP) when dealing with heavy-tailed epidemiological and climate anomalies?

  2. Does the variance compression caused by MinMaxScaler negatively affect gradient-based split finding in LightGBM/XGBoost when handling spatiotemporal forward-aggregated data slices?
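For question 2, this is the kind of experiment I have in mind. It is only a sketch: scikit-learn's GradientBoostingRegressor stands in for LightGBM/XGBoost (it uses exact split search rather than histogram binning, so it is not a faithful replica), and the humidity feature and target are synthetic assumptions made up for the test.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic features: rainfall with one extreme week, plus humidity
rng = np.random.default_rng(42)
rain = np.append(rng.normal(150, 30, 100), 650.0)
humidity = rng.uniform(60, 95, size=rain.size)
X_raw = np.column_stack([rain, humidity])
# Hypothetical target: cases rise with rainfall and humidity, plus noise
y = 0.1 * rain + 0.5 * humidity + rng.normal(0, 2, size=rain.size)

variants = {
    "raw": X_raw,
    "minmax": MinMaxScaler().fit_transform(X_raw),
    "standard": StandardScaler().fit_transform(X_raw),
}

# Train an identically configured model on each version of the features
preds = {}
for name, X in variants.items():
    model = GradientBoostingRegressor(n_estimators=30, max_depth=3, random_state=0)
    preds[name] = model.fit(X, y).predict(X)

# Compare in-sample predictions against the unscaled baseline
for name in ("minmax", "standard"):
    diff = np.max(np.abs(preds[name] - preds["raw"]))
    print(f"max |pred_{name} - pred_raw| = {diff:.2e}")
```

If the differences are at floating-point level, the splits found on scaled and unscaled features partition the data the same way, at least for this stand-in model within a single window. Whether that still holds once the scaler is re-fit per temporal slice is the part I am unsure about.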