👀 PairScan

Methodology

Hurst exponent, ADF test, walk-forward backtest with no lookahead: the math behind every PairScan signal.

Last updated:

If you came from r/algotrading or Quantocracy, this page is for you. Below is the exact math behind every signal PairScan emits. No black boxes, no "proprietary AI", just classical statistics from the 1950s and 1970s applied to crypto and tokenized-equity ratios. The reference implementation of every filter is open-source under MIT (see pairscan-rmr: pip install pairscan-rmr).

Why pair trading on ratios

A directional trade requires you to predict price. A ratio trade only requires you to identify when the ratio between two correlated assets has stretched beyond its historical band, which is a much weaker claim. The correlation does the work; you just sell the asset that's run too far ahead and buy the one that's lagged.

Concrete example: imagine ETH and BTC both rallied 50% over a year, but ETH led the move and finished trading at 0.075 BTC vs. its 18-month average of 0.060. The directional trade is "BTC up to $90 k by Q4". The ratio trade is "ETH/BTC at 0.075 is two standard deviations above its trailing band; reduce ETH and add BTC". You don't need to know which way Bitcoin is going; you only need the ratio to revert toward 0.060. Whether both legs rally or both fall, the trade pays off as long as the ratio drifts back; only a sustained dominance shift breaks it.
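To put rough numbers on that example, here is a minimal sketch of one rebalancing cycle. The 0.075 and 0.060 levels come from the example above and the 0.1 % fee is the swap fee used elsewhere on this page; the function name cycle_gain is ours for illustration, and slippage is ignored.

```python
def cycle_gain(ratio_high, ratio_low, fee=0.001):
    """ETH held after one full cycle, starting from 1 ETH.

    Swap ETH -> BTC when ETH/BTC is at ratio_high, then swap BTC -> ETH
    once the ratio has reverted to ratio_low. A fee is charged per swap.
    """
    btc = 1.0 * ratio_high * (1 - fee)  # sell 1 ETH for BTC at the high ratio
    eth = btc / ratio_low * (1 - fee)   # buy ETH back at the lower ratio
    return eth


eth_after = cycle_gain(0.075, 0.060)
print(f"{eth_after:.4f} ETH per 1 ETH started")  # about 1.2475
```

The dollar price of either leg never enters the calculation; only the two ratio levels and the fee do.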

For this to be profitable in real execution, three things have to be true at once:

  1. The ratio mean-reverts (it doesn't trend).
  2. The boundaries are wide enough to cover round-trip slippage + fees.
  3. The pair has enough liquidity for your size.

Filters 1 and 2 are statistical. Filter 3 is operational. We enforce all three before a pair makes the screen.

The four filters

Each pair goes through four independent tests on a rolling 540-day window of log(price_A / price_B). A pair must clear all four to surface. The thresholds below are deliberately loose; see the loose-thresholds section for why.

Hurst exponent (R/S analysis)

The Hurst exponent measures long-term memory of a series. For a self-similar process, the rescaled range R/S at lag n grows as n^H. We estimate H by computing R/S at multiple lags, plotting them on log-log axes, and reading the slope:

  • H < 0.5: anti-persistent / mean-reverting (we want this)
  • H ≈ 0.5: random walk
  • H > 0.5: persistent / trending

Computed via R/S analysis on the 540-day log-ratio. Reference implementation:

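import numpy as np

def hurst_rs(series, max_lag=100):
    """Hurst exponent via R/S analysis."""
    series = np.asarray(series, dtype=float)  # reshape below needs an ndarray
    # Roughly log-spaced lags from 10 up to max_lag (duplicates dropped).
    lags = np.unique(np.geomspace(10, max_lag, 12).astype(int))
    rs_values, valid_lags = [], []
    for lag in lags:
        n_chunks = len(series) // lag
        if n_chunks < 2:
            continue
        # Split into non-overlapping chunks of length `lag`.
        chunks = series[:n_chunks * lag].reshape(n_chunks, lag)
        means = chunks.mean(axis=1, keepdims=True)
        cumdev = (chunks - means).cumsum(axis=1)
        ranges = cumdev.max(axis=1) - cumdev.min(axis=1)
        stds = chunks.std(axis=1, ddof=1)
        valid = stds > 0
        if not valid.any():
            continue  # every chunk is constant, so R/S is undefined here
        rs_values.append(np.mean(ranges[valid] / stds[valid]))
        valid_lags.append(lag)
    # The slope of log(R/S) against log(lag) is the Hurst estimate.
    slope, _ = np.polyfit(np.log(valid_lags), np.log(rs_values), 1)
    return slope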

We require H < 0.5 for inclusion. This is generous (many quant desks require H < 0.45 or H < 0.4), but loose thresholds are deliberate: we'd rather catch more candidate pairs and let the walk-forward backtest filter further than reject early on a noisy estimator. The R/S statistic has a documented small-lag bias (Lo, 1991), so the regression's lag grid starts at 10 rather than at the smallest possible lags.

The exact same code ships in pairscan-rmr at src/pairscan_rmr/filters.py.

Augmented Dickey-Fuller test

ADF tests whether a series is stationary (rejection of unit root) vs. non-stationary. Applied to log(A/B) over 540 days, using the lag length selected by minimising AIC. We use statsmodels directly:

from statsmodels.tsa.stattools import adfuller

def adf_pvalue(series, autolag="AIC"):
    """Returns just the p-value; lower = more stationary."""
    result = adfuller(series, autolag=autolag)
    return float(result[1])

We require ADF p-value < 0.7, which is again a generous threshold. Strict academic work uses p < 0.05; we use 0.7 because:

  • ADF has low statistical power on shorter samples (and crypto histories are short),
  • We already have Hurst as a parallel filter, and both should agree,
  • The walk-forward backtest is the actual decision-maker; ADF and Hurst are just guards against obviously unsuitable pairs.

A pair that scores Hurst 0.45 and ADF 0.6 is a borderline candidate; one that scores Hurst 0.55 fails outright. Both Hurst-failing and ADF-failing pairs do exist in our universe: we exclude on the order of 30–40 % of nominally cointegrated pairs at the filter stage.

Range width and alternating touches

A pair can pass Hurst and ADF and still be useless if the band is too narrow (slippage eats the edge) or if the price has only ever touched the same boundary repeatedly (no actual oscillation observed). Two operational filters catch that:

  • Range width. Compute (P95 − P5) of the log-ratio over the 540-day window. We require it to span at least 0.4 log-units, equivalent to a ~50 % swing in the underlying ratio. Below that, a 0.1 % swap fee plus realistic slippage eats more than half the per-cycle gain.
  • Alternating touches. Walk through the series and count low-side touches (value ≤ P5) and high-side touches (value ≥ P95), but only count a touch when the previous touch was on the opposite side. A monotone series that visited P5 once and then drifted up gets one low touch and one high touch, not five of each. We require ≥ 2 alternating touches per side, which means the ratio has hit alternating boundaries at least four times in the lookback.

The alternating constraint kills "fake mean-reverters": pairs whose log-ratio looks bounded only because they crashed early in the window and have been recovering ever since.
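The two operational filters above can be sketched as follows. This is a minimal illustration of the rules as described, not the shipped implementation: the function names, the choice to count the very first touch, and the defaults min_width=0.4 and min_touches=2 are our assumptions taken from the text.

```python
import numpy as np


def range_width(log_ratio):
    """P95 minus P5 of the log-ratio, in log-units."""
    p5, p95 = np.percentile(log_ratio, [5, 95])
    return p95 - p5


def alternating_touches(log_ratio):
    """Count (low, high) band touches, counting a touch only when the
    previous counted touch was on the opposite side (or there was none)."""
    x = np.asarray(log_ratio, dtype=float)
    p5, p95 = np.percentile(x, [5, 95])
    low = high = 0
    last = None
    for value in x:
        if value <= p5 and last != "low":
            low += 1
            last = "low"
        elif value >= p95 and last != "high":
            high += 1
            last = "high"
    return low, high


def passes_range_filters(log_ratio, min_width=0.4, min_touches=2):
    low, high = alternating_touches(log_ratio)
    wide_enough = range_width(log_ratio) >= min_width
    return bool(wide_enough and low >= min_touches and high >= min_touches)


# 0.4 log-units expressed as a percentage swing in the raw ratio.
print(f"{np.exp(0.4) - 1:.1%}")  # about 49 %
```

A steadily rising series has a wide band but only one touch per side, so it fails the second check even though it passes the first.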

Volume gate

Both legs must have 24h spot volume > $1 M on at least one of {Binance, Bybit, OKX, KuCoin}. This isn't statistical; it's operational. Below $1 M/day, slippage and spread eat the entire rebalancing premium and the math doesn't work in real execution. The gate is non-negotiable; we'd rather skip a beautifully mean-reverting micro-cap than recommend a pair you can't actually trade in size.
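As a rough sketch, the gate reduces to a per-leg check across the four venues. The input shape (a dict of venue name to 24h spot volume in USD) and the function names are hypothetical; how volumes are actually fetched is not described on this page.

```python
VENUES = ("Binance", "Bybit", "OKX", "KuCoin")
MIN_VOLUME_USD = 1_000_000  # the $1 M/day threshold from the text


def leg_is_liquid(volume_by_venue):
    """True if at least one listed venue shows 24h spot volume above the gate."""
    return any(volume_by_venue.get(v, 0.0) > MIN_VOLUME_USD for v in VENUES)


def passes_volume_gate(volume_a, volume_b):
    """Both legs must clear the gate; one liquid leg is not enough."""
    return leg_is_liquid(volume_a) and leg_is_liquid(volume_b)


# Illustrative numbers only, not real market data.
liquid = {"Binance": 25_000_000, "OKX": 400_000}
thin = {"KuCoin": 300_000, "Bybit": 900_000}
print(passes_volume_gate(liquid, liquid), passes_volume_gate(liquid, thin))
```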

Walk-forward backtest

Once a pair passes all four filters, we backtest 360 days with the same signal logic the live system uses. Critical detail: no lookahead.

For each day t in the backtest window, we compute the rolling P5 and P95 over the previous rolling_window_days (default: 180) days only, i.e. data that would have been available to a trader on day t. Today's data is not used in today's signal. This is the difference between an honest backtest and a fitting exercise.

import numpy as np

def walk_forward_backtest(price_a, price_b, lookback=540,
                          entry_low=0.2, entry_high=0.8, fee=0.001):
    price_a = np.asarray(price_a, dtype=float)
    price_b = np.asarray(price_b, dtype=float)
    log_ratio = np.log(price_a / price_b)
    # Start fully in asset A with 100 units; capital is always in one leg.
    a_qty, b_qty, holding_a = 100.0, 0.0, True
    trades = []
    for t in range(lookback, len(price_a)):
        # CRITICAL: window is strictly BEFORE t, so no future data.
        window = log_ratio[t - lookback:t]
        p_low, p_high = np.percentile(window, [5, 95])
        # Yesterday's ratio as a fraction of the band (0 = P5, 1 = P95).
        position = (log_ratio[t - 1] - p_low) / max(p_high - p_low, 1e-9)
        if position <= entry_low and not holding_a:
            # Ratio is low: swap B into A at today's prices, net of fee.
            a_qty = b_qty * price_b[t] / price_a[t] * (1 - fee)
            b_qty = 0.0
            holding_a = True
            trades.append(("B→A", t))
        elif position >= entry_high and holding_a:
            # Ratio is high: swap A into B at today's prices, net of fee.
            b_qty = a_qty * price_a[t] / price_b[t] * (1 - fee)
            a_qty = 0.0
            holding_a = False
            trades.append(("A→B", t))
    return a_qty, b_qty, trades

The naive alternative (using np.percentile(log_ratio, [5, 95]) over the full series, including future data) typically overstates the strategy's accumulation by 10–30 %. We catch that class of bug with a regression test that runs the backtest twice: once on clean data, once with all prices after a midpoint replaced with garbage. Trades before the midpoint must be byte-identical between runs. If a future-dependent statistic ever leaks in, the test fails immediately. The probe is in tests/test_no_lookahead.py.
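The idea behind that probe can be sketched in a few lines. This is not the contents of tests/test_no_lookahead.py; it is a simplified stand-in under our own assumptions: a compact signal loop (signal_trades), a deliberately leaky variant switched on with a hypothetical leak flag, synthetic sine-wave prices, and list equality in place of a byte-level comparison.

```python
import numpy as np


def signal_trades(price_a, price_b, lookback=60, lo=0.2, hi=0.8, leak=False):
    """Simplified signal loop returning (direction, day) tuples.

    leak=True computes the band over the FULL series, which is the
    lookahead bug the regression test is meant to catch.
    """
    lr = np.log(np.asarray(price_a, float) / np.asarray(price_b, float))
    holding_a, trades = True, []
    for t in range(lookback, len(lr)):
        window = lr if leak else lr[t - lookback:t]
        p_low, p_high = np.percentile(window, [5, 95])
        pos = (lr[t - 1] - p_low) / max(p_high - p_low, 1e-9)
        if pos <= lo and not holding_a:
            holding_a = True
            trades.append(("B->A", t))
        elif pos >= hi and holding_a:
            holding_a = False
            trades.append(("A->B", t))
    return trades


def no_lookahead(backtest, price_a, price_b, midpoint, seed=0):
    """Run twice (clean, then garbage after midpoint) and compare early trades."""
    rng = np.random.default_rng(seed)
    corrupt_a = np.array(price_a, dtype=float)
    corrupt_a[midpoint:] = rng.uniform(0.01, 100.0, size=len(corrupt_a) - midpoint)
    clean = [tr for tr in backtest(price_a, price_b) if tr[1] < midpoint]
    dirty = [tr for tr in backtest(corrupt_a, price_b) if tr[1] < midpoint]
    return clean == dirty


days = np.arange(400)
price_a = np.exp(0.3 * np.sin(2 * np.pi * days / 90))  # synthetic oscillating ratio
price_b = np.ones_like(price_a)

honest_ok = no_lookahead(signal_trades, price_a, price_b, midpoint=200)
leaky_ok = no_lookahead(lambda a, b: signal_trades(a, b, leak=True),
                        price_a, price_b, midpoint=200)
print(honest_ok, leaky_ok)
```

The honest loop passes because nothing it reads on day t has an index at or beyond t; the leaky variant fails because the corrupted tail shifts the full-sample percentiles and changes the early trades.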

Why we use loose thresholds

A common quant-research instinct is to tighten filters until backtests look clean. We deliberately don't. Loose Hurst (< 0.5), loose ADF (< 0.7), generous range width (0.4 log-units): the result is that more candidate pairs surface to the screen, and the walk-forward backtest does the final filtering by showing which actually accumulate quantity over the period.

This accepts more false positives in exchange for fewer false negatives. We'd rather show you ten candidate pairs and let you pick the three you trust than show you one "clean" pair that happened to backtest well by accident. Quant trading on small samples is fundamentally a high-noise environment; tightening filters narrows the funnel but doesn't actually improve the signal-to-noise ratio of what comes through.

The flip side: every filter has an explicit override in the pairscan-rmr API (is_mean_reverting(price_a, price_b, hurst_threshold=0.45, ...)). If you have a strong prior (institutional desk policy, tighter risk tolerance), set the threshold yourself.

What we explicitly don't do

  • No price prediction. No ML, no sentiment, no on-chain forensics.
  • No leverage, no shorts. Capital is always 100 % in one of two legs.
  • No auto-execution. We tell you when to swap; you place the order on your exchange.
  • No promises. Mean-reversion strategies break in directional regimes; see Where this strategy fails.

Limitations

These are the known failure modes of the methodology. None are surprises and none are easy to fix.

  • Hurst R/S is a noisy estimator on short windows. With a 540-day window, the standard error on H is roughly ±0.05. Two pairs with identical underlying processes can land on opposite sides of the 0.5 cutoff by coincidence. We mitigate by using a generous threshold and requiring all four filters to pass, but a pair labelled H = 0.49 is not meaningfully different from one labelled H = 0.51.
  • ADF assumes stationary residuals. Structural breaks (a token forks, an exchange delists a leg, a stablecoin loses its peg) make the test misleading. Pre-2022 LUNA looked perfectly stationary up until the day it didn't. Our peg-check oracle layer catches the most obvious cases for tokenized assets, but no statistical test detects regime changes that haven't happened yet.
  • Tests are descriptive, not predictive. Hurst, ADF and the range filters all describe the past 540 days. Past mean-reversion does not guarantee future mean-reversion; it only tells you that the pair was mean-reverting in the lookback. If the underlying relationship breaks after the lookback ends, the screen will keep showing the pair as a candidate until enough fresh data accumulates to flip the verdict.
  • Sample size matters. Below 200 days of history, none of these tests have meaningful power. We refuse to score pairs with under 540 days; even there, results are weaker than for pairs with 5+ years of data. New tokens start in our universe with a "history too short" verdict regardless of how mean-reverting they look intra-window.
  • Real execution adds slippage, taxes and exchange downtime, none of which are modelled. Backtests assume you can fill at the day's close at 0.1 % taker. In practice you fill at the next available price, your exchange may be unavailable for an hour on a Monday morning, and your jurisdiction may take 20 % of any realised gain. The strategy still works after these frictions for liquid pairs; it stops working for less-liquid ones.
  • xStocks-specific risks. Tokenized equities are < 12 months old as of writing. Mean-reversion claims on this short history are speculative. Backed Finance has paused redemptions during peg events in the past. If your jurisdiction restricts tokenized securities, none of this applies.
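To make the first limitation concrete, here is a back-of-the-envelope check of the H = 0.49 vs. H = 0.51 claim. It assumes the two estimates are independent and that ±0.05 is one standard error for each; the helper name hurst_gap_z is ours.

```python
import math


def hurst_gap_z(h1, h2, se=0.05):
    """z-score of the gap between two Hurst estimates, assuming independent
    estimates that each carry standard error `se`."""
    return abs(h1 - h2) / (se * math.sqrt(2))


z = hurst_gap_z(0.49, 0.51)
print(f"z = {z:.2f}")  # about 0.28, far below the ~1.96 needed at the 5 % level
```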

Open questions we're still working on

A short list of things we genuinely don't know yet, in rough order of how often they come up:

  1. Optimal lookback length. We use 540 days (about 18 months), which covers a little over a third of a roughly four-year BTC halving cycle. Shorter windows react faster to regime shifts but overfit to recent noise; longer ones are more stable but anchored to history that may no longer apply (e.g. 2017 alt-season). Whether 540 is actually the right answer for crypto-native pairs vs. crypto-equity cross-asset pairs is an open question.
  2. Hurst vs. variance ratio test. R/S Hurst is the classical estimator but variance-ratio (Lo & MacKinlay, 1988) and detrended fluctuation analysis (DFA, Peng et al. 1994) have lower bias on short samples. We've been comparing all three on the production universe; results so far suggest variance-ratio gives nominally cleaner numbers but the rank order of pairs is essentially the same. Worth more investigation.
  3. Bayesian posterior on "is this pair still mean-reverting?". Right now the verdict is binary (passes filters / fails). A posterior probability that updates as new data arrives would let us communicate uncertainty better and quantify regime-change risk. Open question whether the additional complexity is worth it for retail-facing surfaces.
  4. Cross-asset peg integrity. For tokenized equities we cross-check against Pyth oracles every screening cycle. For tokenized commodities (PAXG, XAUT) we use yfinance for the underlying and tolerate larger drift bands. The right number for those bands isn't well-established empirically; we used 1.5 % based on observed drift distributions but it could plausibly be tighter.
  5. Trade-cost modelling. The 0.1 % taker fee assumption is conservative for large CEXes and aggressive for thin DEX pools. A more honest model would scale fees by venue and pair liquidity. Same for slippage: currently we ignore it. Both are on the roadmap; not done yet.
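For readers unfamiliar with the variance-ratio statistic mentioned in item 2, a minimal sketch follows. It is the plain ratio on overlapping q-step differences, without the bias corrections or test statistic from Lo & MacKinlay, and it is not the comparison code we run on the production universe; the synthetic series and the function name are illustrative assumptions.

```python
import numpy as np


def variance_ratio(series, q):
    """Var of q-step changes divided by q times Var of 1-step changes.

    Close to 1 for a random walk, below 1 when changes tend to reverse
    (mean reversion), above 1 when they tend to persist (trending).
    """
    x = np.asarray(series, dtype=float)
    one_step = np.diff(x)
    q_step = x[q:] - x[:-q]  # overlapping q-step changes
    return q_step.var(ddof=1) / (q * one_step.var(ddof=1))


rng = np.random.default_rng(0)
shocks = rng.normal(size=5000)

random_walk = np.cumsum(shocks)

# Stationary AR(1) with coefficient 0.5 as a stand-in for a mean-reverting ratio.
ar1 = np.empty_like(shocks)
ar1[0] = shocks[0]
for i in range(1, len(shocks)):
    ar1[i] = 0.5 * ar1[i - 1] + shocks[i]

vr_rw = variance_ratio(random_walk, 10)
vr_ar1 = variance_ratio(ar1, 10)
print(f"random walk: {vr_rw:.2f}, AR(1): {vr_ar1:.2f}")
```

The random-walk value should land near 1 and the AR(1) value well below it, which is the same qualitative split the Hurst filter is trying to make.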

If you have data or pointers on any of these, reach out to @pairscan on X or open an issue on the pairscan-rmr repo.

Long-tail RWA: what fits the model and what doesn't

We get asked regularly whether ratio screening can be applied to other tokenized real-world assets: fractional real estate (RealT), yield-tokenized derivatives (Pendle PT/YT), tokenized treasuries (USDY, OUSG, BUIDL, USDM). The honest answer is "mostly no", and it's worth explaining why so the scope is clear:

Fractional real estate (RealT, Lofty, etc.): each property token is unique. There's no continuous live price; valuations come from quarterly off-chain appraisals. Secondary markets are thin and sporadic. A ratio between two RealT tokens isn't a market-discovered ratio, it's the ratio of two stale appraisals. Mean-reversion logic doesn't apply because there's no high-frequency mean to revert to. We don't screen these.

Pendle PT/YT (Principal / Yield Tokens): PT trades at a discount to its underlying yield-bearing token (e.g. PT-aUSDC) and matures monotonically toward face value. YT accumulates yield and decays to zero at maturity. Both have predictable, deterministic price paths driven by time-to-maturity and the underlying yield rate. That's the opposite of mean-reversion: it's a known monotone trend. The right model for Pendle is yield-curve fitting, not pair screening.

Rebase-style tokenized treasuries (USDY / USTB / MTBILL): these are wired into PairScan as of Phase 5, but with caveats. Each token's on-chain price equals current per-token NAV ($1 + accumulated yield), not $1 flat. Pairing one against USDC produces a slow upward drift of ~5%/year (the yield); that's a trend, not a stationary ratio. The only plausibly mean-reverting pairs are same-class: USDY/USTB, USDY/MTBILL. Even those depend on yield-spread mean-reversion, which is much slower than the equity / crypto pairs the rest of the screen targets. We added the asset class so peg-check infrastructure exists for users monitoring drift, but we don't recommend trading those pairs from screen signals alone: the 90-day window we override down to is borderline-undersampled and the noise floor is high.

OUSG, BUIDL, USDM, redemption-only tokens: no Pyth or Chainlink oracle feed as of 2026-05. Without a reference price we can't run peg-check, and without peg-check the on-DEX price could deviate arbitrarily on any given block without us catching it. We deliberately don't whitelist these tokens until oracle coverage exists.
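For the tokenized-treasury case above, a small sketch shows why a NAV-accruing token paired against USDC is a trend rather than a band. It assumes an idealised token whose NAV compounds smoothly at 5 %/year and a USDC leg that holds $1 exactly; real tokens accrue in steps and at varying rates.

```python
import numpy as np

days = np.arange(540)
nav = 1.05 ** (days / 365)   # assumed: 5 %/year yield accruing smoothly into NAV
usdc = np.ones_like(nav)     # assumed: USDC holds its $1 peg exactly
log_ratio = np.log(nav / usdc)

daily_change = np.diff(log_ratio)
print(f"drift over the window: {log_ratio[-1]:.3f} log-units")  # roughly 0.07
print("every daily change positive:", bool(np.all(daily_change > 0)))
```

A series like this never comes back to its low boundary, so it would register a single touch per side under the alternating-touch rule described earlier.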

The short version: ratio mean-reversion is a sharp tool for a narrow set of asset classes. It works on stationary log-ratios with a stable variance and >100 days of joint history. Most "long-tail RWA" doesn't satisfy those conditions, and it's better to say so than to ship a screener that misclassifies known-trending series as mean-reverting candidates.

Open-source utility

The core filters and backtest engine are published as a Python library under MIT: github.com/pairscan/ratio-mean-reversion.

pip install pairscan-rmr
from pairscan_rmr import is_mean_reverting, walk_forward_backtest

result = is_mean_reverting(price_a, price_b)
if result.passed:
    backtest = walk_forward_backtest(price_a, price_b)
    print(f"{backtest.n_trades} trades, max DD {backtest.max_drawdown:.1%}")

The same code that runs in production runs in the library. If you find a bug, it's the same bug we have.

References

  • Hurst, H.E. (1951). "Long-Term Storage Capacity of Reservoirs". Transactions of the American Society of Civil Engineers, 116, 770–799.
  • Dickey, D.A. & Fuller, W.A. (1979). "Distribution of the Estimators for Autoregressive Time Series with a Unit Root". JASA, 74(366a), 427–431.
  • Lo, A.W. (1991). "Long-Term Memory in Stock Market Prices". Econometrica, 59(5), 1279–1313. Standard reference on the small-sample bias of R/S Hurst.
  • Lo, A.W. & MacKinlay, A.C. (1988). "Stock Market Prices Do Not Follow Random Walks". Review of Financial Studies, 1(1), 41–66. Variance-ratio test, alternative to Hurst.
  • Gatev, E., Goetzmann, W.N. & Rouwenhorst, K.G. (2006). "Pairs Trading: Performance of a Relative-Value Arbitrage Rule". Review of Financial Studies, 19(3), 797–827.
  • Peng, C.-K. et al. (1994). "Mosaic organization of DNA nucleotides". Physical Review E, 49(2), 1685. Detrended fluctuation analysis (DFA), discussed under "open questions".