Fractional differentiation and its use in machine learning

https://link.springer.com/article/10.1007/s12572-021-00299-5
Empirical technical research article with experimental validation · Researched March 25, 2026

Summary

This paper addresses a fundamental challenge in financial machine learning: the "stationarity versus memory dilemma." Most supervised ML algorithms require stationary time series (where mean, variance, and covariance are constant over time), but standard approaches to achieving stationarity—particularly integer-order differencing—destroy the temporal memory and autocorrelation that enable predictive power. The authors propose and validate fractional (non-integer order) differentiation as a solution, applying it to stock price prediction using artificial neural networks.

The mathematical framework extends classical differencing through binomial expansion, allowing the differencing parameter d to take non-integer values (e.g., d=0.3 instead of just d=0 or d=1). This creates smooth weight decay across past observations rather than sharp truncation, preserving more historical context while still achieving stationarity. The authors apply this technique to four major international stock indexes (WIG 20, S&P 500, DAX, and Nikkei 225) spanning June 2010 to June 2020, finding optimal d values between 0.12 and 0.43 that achieve statistical stationarity (confirmed via Augmented Dickey-Fuller testing) while maintaining correlations above 0.99 with the original series—demonstrating exceptional memory preservation.
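The binomial-expansion weights can be sketched in a few lines of NumPy. This is a minimal illustration of the general technique, not the authors' code; the fixed truncation window is an assumed parameter:

```python
import numpy as np

def frac_diff_weights(d, size):
    """Weights of (1 - B)^d for lags 0..size-1 via the binomial
    recursion w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k."""
    w = [1.0]
    for k in range(1, size):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)

def frac_diff(series, d, window):
    """Fixed-window fractional differencing: each output point is the
    weighted sum of the trailing `window` observations (w_0 on the
    current value, w_1 on the previous, and so on)."""
    x = np.asarray(series, dtype=float)
    w = frac_diff_weights(d, window)
    out = np.full(len(x), np.nan)  # first window-1 points undefined
    for t in range(window - 1, len(x)):
        out[t] = w @ x[t - window + 1:t + 1][::-1]
    return out
```

For d=1 the weights collapse to [1, -1, 0, ...], recovering ordinary first differencing; for a fractional d such as 0.3 they decay smoothly toward zero, which is exactly the memory-preserving behavior the paper exploits.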

The practical validation uses feedforward neural networks to predict next-day closing prices using previous day's high, low, open, and close prices. When data is preprocessed using fractional differentiation rather than standard first-order differencing, the resulting neural networks show consistent improvements in both Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) metrics across all four indexes tested. This empirical evidence demonstrates that fractional differentiation offers a middle ground: stationary features suitable for ML algorithms while retaining predictive information that would be lost through conventional differencing.
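The supervised setup described above (previous day's open, high, low, and close as features; next day's close as target) amounts to a one-step lag alignment. A minimal sketch, with array names assumed; in the paper's pipeline each column would first be fractionally differenced:

```python
import numpy as np

def ohlc_to_supervised(open_, high, low, close):
    """Build (X, y) where row t of X holds day t's OHLC values and
    y[t] is day t+1's close. The last day has no next-day target,
    so it is dropped from X."""
    X = np.column_stack([open_, high, low, close])[:-1]
    y = np.asarray(close, dtype=float)[1:]
    return X, y
```

Any regressor (the paper uses a feedforward neural network) can then be fit on (X, y), with the fractionally differenced variant compared against first-order differencing on RMSE and MAE.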

The work builds directly on Marcos López de Prado's "Advances in Financial Machine Learning" framework, which popularized this technique in financial circles, and traces its mathematical origins to Hosking's 1981 paper on fractional differencing in Biometrika. By providing concrete empirical validation on real financial data with practical neural network implementations, the paper bridges theoretical statistics and applied machine learning, offering practitioners a more nuanced preprocessing approach for financial time series modeling.

About

Authors: Rafał Walasek and Janusz Gajda

Publication: International Journal of Advances in Engineering Sciences and Applied Mathematics

Published: 2021-08-25

Sentiment / Tone

Methodical and evidence-driven with a practical optimization focus. The authors adopt a neutral, technical tone characteristic of applied research papers, presenting the technique as an elegant mathematical solution to a well-known problem. They avoid overselling claims, clearly stating that the goal is feature preprocessing (not overall model performance evaluation), and systematically validate their approach through statistical tests (ADF, KPSS, Pearson correlation) and empirical comparison. The work conveys cautious confidence: the fractional differentiation approach is presented as a refinement solving a recognized dilemma rather than a revolutionary breakthrough, but the consistent improvements across multiple datasets and error metrics suggest genuine practical value. The authors demonstrate academic rigor by testing on international markets and acknowledging the limitations of their specific neural network architecture.

Research Notes

**Author Background & Credibility:** Rafał Walasek and Janusz Gajda are researchers at the Faculty of Economic Sciences, University of Warsaw, with expertise in statistics and econometrics. This paper formalizes their 2020 working paper (WP 2020-32), suggesting peer review and iterative refinement. Janusz Gajda has an established research presence in time series econometrics and financial analysis. While not affiliated with a top-tier global finance institution, their institutional credibility in statistical methodology and econometric application is solid.

**Broader Context & Reception:** This work arrived at an opportune moment in the financial ML community. López de Prado's 2018 "Advances in Financial Machine Learning" had generated significant interest, but rigorous empirical validation was limited. Walasek and Gajda provided exactly what was needed: statistical validation on major international stock indexes. The paper has accrued citations and spawned multiple independent implementations (notably the fracdiff GitHub library), indicating genuine impact. The technique is now established in quantitative finance, referenced in trading systems and algorithmic strategy development.

**Strengths of the Work:** Methodological rigor stands out: multiple stationarity tests (ADF and KPSS), validation across four independent international markets, proper train-test splits, and measurement of realistic ML outcomes. The mathematical exposition is lucid, with explicit weight formulas. Measuring correlation with the original series (above 0.99) is particularly insightful: it quantifies memory preservation, not merely the achievement of stationarity. The choice of four geographically and economically diverse stock indexes strengthens the generalizability claims.

**Limitations & Caveats:** (1) The paper evaluates feature preprocessing quality only, not end-to-end trading strategy performance; improved RMSE does not guarantee profitability. (2) Validation is limited to feedforward neural networks; benefits for other algorithms (SVMs, gradient boosting, tree ensembles) remain unexplored. (3) Testing covers 2010-2020, largely a bull market; behavior during crises or extreme volatility regimes is untested. (4) There is no principled guidance for selecting d, so practitioners must perform a hyperparameter search.

**How It Fits the Broader Conversation:** The paper sits at the intersection of classical econometrics (stationarity, ARFIMA), modern ML (feature engineering), and quantitative finance (stock prediction). It represents the maturation of financial ML, moving beyond "apply deep learning to prices" toward principled feature engineering grounded in statistical and financial theory. The rapid adoption and proliferation of implementations demonstrate that it solved a genuine industry pain point.

**Notable Follow-up Work:** The open-source community embraced the technique rapidly, and the fracdiff library achieved significant adoption. Recent work (2024-2025) proposes Adaptive Fractional Differencing methods that select d automatically via Hurst-exponent analysis and cross-validation, and extends the approach to hybrid two-stage procedures, addressing the paper's hyperparameter selection limitation. The technique now appears standard in quantitative-finance ML systems.

**Potential Biases & Gaps:** The focus on large stock indexes may bias results toward liquid, efficient markets where patterns are subtle; cryptocurrencies, commodities, or illiquid equities might exhibit different dynamics. Testing is limited to closing-price prediction; intraday volatility and portfolio optimization applications remain unexplored. Only a simple neural architecture was tested, so whether the benefits persist with modern deep architectures (transformers, attention mechanisms) is unclear. Finally, the 2010-2020 period enjoyed favorable market conditions; crisis periods warrant separate validation.
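The hyperparameter search for d noted in the limitations can be sketched as a grid scan that returns the smallest d whose differenced series passes a stationarity test. This is an illustrative sketch, not the authors' procedure: `is_stationary` is a user-supplied predicate (in practice, e.g., a wrapper around an ADF test at the 5% level), and the grid step and window are assumed parameters:

```python
import numpy as np

def _frac_diff(x, d, window):
    """Fixed-window fractional differencing via binomial weights."""
    w = [1.0]
    for k in range(1, window):
        w.append(-w[-1] * (d - k + 1) / k)
    w = np.array(w)
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    for t in range(window - 1, len(x)):
        out[t] = w @ x[t - window + 1:t + 1][::-1]
    return out

def smallest_stationary_d(series, is_stationary, step=0.01, window=20):
    """Scan d upward from 0 and return the first value whose
    differenced series satisfies the supplied stationarity predicate;
    None if nothing on the grid up to d=1 passes."""
    for d in np.arange(0.0, 1.0 + step, step):
        fd = _frac_diff(series, d, window)
        if is_stationary(fd[~np.isnan(fd)]):
            return round(float(d), 10)
    return None
```

Choosing the smallest passing d mirrors the paper's goal: just enough differencing for stationarity, and no more, so that memory (correlation with the original series) is maximized.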

Topics

Fractional differentiation · Time series stationarity · Financial machine learning preprocessing · Artificial neural networks for stock prediction · ARFIMA models and long-memory processes · Financial feature engineering