2026-05-01 · 9 min read · #risks #failure-modes #mean-reversion #pair-trading #honest

Where ratio rebalancing breaks: 4 cases when the strategy loses money

Ratio rebalancing isn't free money. Here are 4 specific cases when the strategy loses money — with real examples from 2024-2026 crypto markets.

Most pair trading content sells you the upside. Look at this backtest. Look at these returns. Look at our screener.

This post is the opposite. I'm going to walk through four specific ways our strategy loses money — with real examples from the last two years of crypto markets, with concrete numbers, with the specific filter conditions that should have caught (or did catch) the failure.

If you only read marketing content, you'd think mean-reversion is free money. It's not. Like every strategy, it works in some regimes and fails in others. The question isn't "does it work" — it's "do you understand when it doesn't, so you can recognize that regime when it shows up."

Failure mode 1: Strong directional trends

This is the universal warning. Pair trading on ratios assumes the ratio oscillates within a stable historical band. When one asset enters a sustained one-way move that the other doesn't match, the ratio breaks out of its band and the bottom-zone signal becomes a trap.

Concrete example: BTC vs SOL, late 2023 through early 2024.

In October 2023, BTC was around $28,000 and SOL was around $25. By March 2024, BTC reached $73,000 and SOL hit $200. That's BTC up 2.6× and SOL up 8×. The BTC/SOL ratio over that period went from ~1120 to ~365: a 67% drop, nearly monotonic, with very few reversals.

If you'd been running mean-reversion on BTC/SOL based on the prior year's range, the ratio kept hitting "bottom zone" entry signals in November, December, January. Each entry — swap SOL into BTC at that ratio — would have been a losing trade because the ratio kept falling further as SOL outperformed.
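The ratio arithmetic in this example can be reproduced with a few lines. This is a minimal sketch using the rounded prices quoted above; the function name `ratio_change` is just an illustrative helper, not part of any screener API.

```python
import math

def ratio_change(base_start, quote_start, base_end, quote_end):
    """Return (start_ratio, end_ratio, fractional_change, log_change) for base/quote."""
    start = base_start / quote_start
    end = base_end / quote_end
    return start, end, end / start - 1.0, math.log(end / start)

# Rounded prices from the text: BTC $28,000 -> $73,000, SOL $25 -> $200.
start, end, change, log_change = ratio_change(28_000, 25, 73_000, 200)
print(f"BTC/SOL: {start:.0f} -> {end:.0f} ({change:+.1%}, log change {log_change:+.2f})")
# prints: BTC/SOL: 1120 -> 365 (-67.4%, log change -1.12)
```

Both assets went up in dollar terms, yet the ratio lost two thirds of its value. That is why a ratio strategy can lose in the middle of a bull market.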

What the filter should catch:
- Hurst exponent on the trending portion goes above 0.5 (we only keep pairs with H < 0.5)
- Slope filter (|β| < 0.3) catches sustained trends
- Range width now includes the new low, and the ratio never alternates back to the high, so the alternating-touches filter fails on the next screening cycle

What the filter doesn't catch:
- The first 30-60 days of a trend, before the rolling window has accumulated enough trending data to push Hurst above threshold. In that early window, you can still get a "buy" signal that bleeds money.
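For readers who want to see what a Hurst check looks like in practice, here is a rough sketch. It assumes one common estimator (the scaling of the standard deviation of lagged differences), which may differ from the estimator the screener actually uses. The function names, the `max_lag` default, and the synthetic test series are all illustrative.

```python
import math
import random

def _std(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def hurst_exponent(series, max_lag=20):
    """Estimate H from how the std of lagged differences scales with the lag.

    std(x[t+lag] - x[t]) grows roughly like lag**H, so H is the slope of
    log(std) against log(lag). H < 0.5 suggests mean reversion, H near 0.5
    a random walk, H > 0.5 a persistent trend.
    """
    xs, ys = [], []
    for lag in range(2, max_lag + 1):
        diffs = [series[i + lag] - series[i] for i in range(len(series) - lag)]
        xs.append(math.log(lag))
        ys.append(math.log(_std(diffs)))
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator  # least-squares slope

# Synthetic data only: white noise (stationary) and its cumulative sum (random walk).
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(3000)]
random_walk = []
level = 0.0
for step in noise:
    level += step
    random_walk.append(level)

h_noise = hurst_exponent(noise)       # stationary series: estimate close to 0
h_walk = hurst_exponent(random_walk)  # random walk: estimate close to 0.5
print(f"H(noise)={h_noise:.2f}  H(random walk)={h_walk:.2f}")
```

In a real screen the input would be the log-ratio over the rolling window. Because the estimate is computed over the whole window, a few weeks of trend at the end of a long mean-reverting history move it only slowly, which is the gap described above.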

This is the single biggest failure mode and the hardest to mitigate through filters alone. Practical mitigation: don't put your whole position into one swap. Scale in across signals. If the second or third bottom-zone signal on the same pair comes within 30 days, treat that as a regime warning, not a doubling-down opportunity.
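The repeated-signal rule is easy to automate. Below is a small sketch, assuming you keep your own log of entry-signal dates per pair. The function name `regime_warning` and its parameters are hypothetical, and the dates in the example are made up for illustration.

```python
from datetime import date, timedelta

def regime_warning(signal_dates, min_signals=2, window_days=30):
    """Return True if at least min_signals entry signals fall within any span of window_days."""
    dates = sorted(signal_dates)
    window = timedelta(days=window_days)
    for i in range(len(dates) - min_signals + 1):
        # Compare the first and last signal of each run of min_signals consecutive signals.
        if dates[i + min_signals - 1] - dates[i] <= window:
            return True
    return False

# Hypothetical signal log for one pair: two signals 15 days apart trip the warning.
clustered = [date(2023, 11, 5), date(2023, 11, 20), date(2023, 12, 28)]
spread_out = [date(2023, 1, 10), date(2023, 4, 2)]
print(regime_warning(clustered))   # True
print(regime_warning(spread_out))  # False
```

The thresholds are parameters on purpose: a stricter or looser rule is just a different `min_signals` and `window_days`.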

Failure mode 2: Low-volume alts where slippage eats the alpha

The math of ratio rebalancing works on executable prices. Our backtest uses 0.1% taker fees (Binance level) and assumes you can execute at the close print of each daily candle. In reality, on low-volume pairs, you can't.

Concrete example: any pair where one leg has < $1M/24h spot volume on its primary exchange.

We had a pair early in development that looked great in backtest: log-ratio between two mid-cap altcoins, range width 80%, Hurst 0.36, ADF p-value 0.18, four alternating touches per side. Backtest showed +180% accumulation over a year on the base coin.

When we tried executing live, slippage on a $5,000 swap was sometimes 1.5%. Our backtest assumed 0.1%. Real returns came in at maybe a third of backtest returns because slippage compounded over multiple swaps.

What the filter catches:
- $1M+ daily spot volume requirement on both legs (mandatory)
- When this fails, the pair is excluded from the screen automatically

Why $1M is the floor:
- Below $1M daily volume, typical orderbook depth within +/-1% of spot is only around $5,000-15,000
- A $5k swap at this depth eats 0.5-1.5% in slippage
- Two swaps per round-trip = 1-3% real cost vs 0.2% modeled
- Over 3-5 trades per year, this turns a +30% backtest into a flat or negative real return
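To get a feel for how per-swap costs compound, here is a back-of-the-envelope sketch. It assumes a fixed fractional cost on every swap and two swaps per round-trip, as in the list above. The function name `cost_drag` is illustrative, and the sketch covers only the direct cost, not knock-on effects such as partial fills or entries missed because the price moved during execution.

```python
def cost_drag(cost_per_swap, round_trips, swaps_per_round_trip=2):
    """Fraction of capital lost to trading costs when each swap pays cost_per_swap."""
    swaps = round_trips * swaps_per_round_trip
    return 1.0 - (1.0 - cost_per_swap) ** swaps

scenarios = [
    ("modeled fee 0.1%", 0.001),
    ("thin book 0.5%", 0.005),
    ("thin book 1.5%", 0.015),
]
for label, cost in scenarios:
    for trips in (3, 5):
        print(f"{label}, {trips} round trips: {cost_drag(cost, trips):.1%} direct drag")
```

Comparing the modeled-fee rows with the thin-book rows shows how far direct costs alone can diverge from what a backtest assumed.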

What the filter doesn't catch:
- Sudden volume drops mid-strategy. A pair that had $2M volume when it entered the screen can drop to $500k a month later. We re-screen every 6 hours and remove the pair, but if you took a position before the volume dropped, you're now stuck in a position that's expensive to exit.

Practical mitigation: set your own minimum volume threshold higher than ours if you trade larger sizes. For positions over $20k, I'd want $5M+ daily volume. For $100k+ positions, $25M+.
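Those personal thresholds can be written down as a tiny helper. This sketch encodes the tiers from the paragraph above plus the screener's $1M floor. The function names are hypothetical, and the exact boundary handling (strictly over $20k, $100k and up) is my reading of the text.

```python
def min_daily_volume_usd(position_usd):
    """Minimum 24h spot volume per leg for a given position size (tiers from the text)."""
    if position_usd >= 100_000:
        return 25_000_000
    if position_usd > 20_000:
        return 5_000_000
    return 1_000_000  # the screener's own floor

def passes_volume_floor(position_usd, leg_volumes_usd):
    """Both legs of the pair must clear the floor for this position size."""
    floor = min_daily_volume_usd(position_usd)
    return all(volume >= floor for volume in leg_volumes_usd)

# Hypothetical pair: one leg trades $6M/day, the other $2M/day.
print(passes_volume_floor(5_000, [6_000_000, 2_000_000]))   # True
print(passes_volume_floor(50_000, [6_000_000, 2_000_000]))  # False: the $2M leg is too thin
```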

Failure mode 3: Delistings and narrative shifts

Crypto pairs can go from "valid mean-reverting pair" to "completely broken historical data" overnight when one of the underlying tokens gets delisted, renamed, or has a major narrative shift.

Concrete example: MATIC → POL transition in 2024.

Polygon executed a 1:1 token migration from MATIC to POL throughout 2024. The MATIC ticker continued trading on most exchanges through the transition, but with steadily falling volume as users migrated. By late 2024, MATIC trading on Binance was being deprecated.

Any pair involving MATIC over this period had backtest data that looked normal but execution that broke unpredictably. The "MATIC price" you'd swap into in November 2024 might be on a thin orderbook that doesn't reflect actual market clearing.

Concrete example: AGIX after the SingularityNET merger.

In June 2024, SingularityNET (AGIX) merged with Fetch.ai (FET) and Ocean Protocol (OCEAN) into a unified ASI token. AGIX continued trading briefly on some venues before being delisted. Any historical AGIX-pair data is now contaminated by the merger event in the middle of the window.

What the filter catches:
- Volume drops below $1M threshold (which often happens in the weeks before a delisting)
- Range width can become unstable as price discovery fragments

What the filter doesn't catch:
- The 30-60 day window before delisting where everything still looks normal but volume is fraying
- Backtest periods that span the event give a misleading historical picture

Practical mitigation: manually flag pairs that are going through transition events. Watch official exchange announcements. We maintain a manual exclusion list, updated whenever Binance adds its "Monitoring Tag" to an asset.

Failure mode 4: xStocks-specific risks

This one is new and specific to cross-asset pairs involving tokenized equities. There are three distinct risks:

4a. Peg drift during stress.

xStocks tokens are backed 1:1 by real shares held in regulated custody, but the trading price on Solana DEXes can drift from the canonical share price during stress events. Drift of up to 4-6% has been observed during low-liquidity windows.

If you're computing log(BTC/AAPLx) and AAPLx is trading 5% above its true peg, your computed ratio is artificially low (by about 0.05 in log terms), which can trigger a fake "bottom zone" signal. The swap would then be timed and priced off a distorted AAPLx quote rather than a real move in the ratio.

What the filter catches:
- We cross-check every xStocks token against the Pyth Network oracle (canonical Equity.US.AAPL/USD feed) every screening cycle
- If drift exceeds 3% from canonical, the pair is excluded from that screening cycle
- All drift events are logged in an audit table

What the filter doesn't catch:
- Drift between screening cycles (we run every 6 hours, drift can develop and disappear within that window)
- Coordinated peg attacks where DEX price and oracle price both move together (rare but theoretically possible)
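The peg check itself is simple arithmetic once you have both prices. The sketch below takes the DEX price and the oracle price as plain numbers. How you fetch them (for example from a Pyth feed) is deliberately left out. Function names and the example prices are hypothetical; the 3% threshold is the one quoted above.

```python
import math

MAX_DRIFT = 0.03  # 3% exclusion threshold from the text

def peg_drift(dex_price, oracle_price):
    """Signed drift of the DEX price relative to the canonical oracle price."""
    return dex_price / oracle_price - 1.0

def is_excluded(dex_price, oracle_price, max_drift=MAX_DRIFT):
    """True if the token should be dropped from this screening cycle."""
    return abs(peg_drift(dex_price, oracle_price)) > max_drift

def log_ratio_bias(dex_price, oracle_price):
    """Shift in log(BTC/token) caused by pricing the token at dex_price instead of oracle_price."""
    return -math.log(dex_price / oracle_price)

# Hypothetical prices: oracle says $200, the DEX trades at $210 (5% rich).
print(is_excluded(210.0, 200.0))                # True
print(f"{log_ratio_bias(210.0, 200.0):+.3f}")   # -0.049, the ratio looks lower than it is
print(is_excluded(204.0, 200.0))                # False, 2% drift is within tolerance
```

Note the check is symmetric: a token trading 5% below peg biases the log-ratio upward by about the same amount and is excluded in the same way.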

4b. Redemption pauses.

Backed Finance (xStocks issuer) has paused redemptions during isolated incidents in the past — not a default event, but a "we're pausing while we investigate" event. During such pauses, the secondary market price can decouple from canonical for hours or days.

What the filter catches:
- The same Pyth oracle peg-check usually flags this
- Pyth feeds that themselves go stale are flagged by their own staleness checks

4c. Short history.

xStocks tokenized equities launched in June 2025. As of writing (early 2026), we have less than 12 months of clean historical data on any cross-asset pair. Mean-reversion claims on this short a sample are speculative.

Most of the academic work on pair trading uses 5-30 year backtests. We have less than one year. The filters might be flagging "mean-reverting" pairs that are actually still in their initial discovery phase and haven't developed a stable long-term range yet.

What the filter catches:
- Nothing structural. This is an unavoidable limitation of the category
- We label all cross-asset pairs as "experimental" in the screen to keep this top-of-mind
- We require at least 540 days of data before backtesting. For cross-asset pairs this means synthetic extension: AAPL daily closes from Yahoo for the pre-xStocks period, then AAPLx onchain prices once available

What the filter doesn't catch:
- The fundamental fact that a year is not enough data to be confident in mean-reversion claims
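The synthetic extension mentioned above amounts to splicing two price series at the token's first trading day. Here is a rough sketch under stated assumptions: both series are `{date: price}` dicts, "540 days" is read as calendar span rather than number of observations, and all prices in the example are placeholders, not real AAPL or AAPLx data. Function names are hypothetical.

```python
from datetime import date, timedelta

MIN_HISTORY_DAYS = 540  # minimum history quoted in the text

def splice_history(equity_closes, token_prices):
    """Merge two {date: price} series, using the onchain token price
    from the token's first available date onward."""
    if not token_prices:
        return dict(sorted(equity_closes.items()))
    switch_date = min(token_prices)
    merged = {d: p for d, p in equity_closes.items() if d < switch_date}
    merged.update(token_prices)
    return dict(sorted(merged.items()))

def has_enough_history(series, min_days=MIN_HISTORY_DAYS):
    """True if the series spans at least min_days calendar days."""
    if not series:
        return False
    return (max(series) - min(series)).days >= min_days

# Placeholder data: 600 days of equity closes at 100.0, 100 days of token prices at 101.0.
equity = {date(2024, 1, 1) + timedelta(days=i): 100.0 for i in range(600)}
token = {date(2025, 7, 1) + timedelta(days=i): 101.0 for i in range(100)}
merged = splice_history(equity, token)

print(has_enough_history(token))   # False, the token alone is far too short
print(has_enough_history(merged))  # True, but only thanks to the equity prefix
```

The second result is exactly the caveat: the spliced series passes the length check, yet most of it never traded onchain, so it says nothing about peg drift or DEX liquidity in that period.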

Practical mitigation: trade smaller sizes on cross-asset pairs than on crypto-vs-crypto pairs. Treat backtest results on cross-asset as suggestive, not predictive. Reassess after the category accumulates 3+ years of data.

What this means for how to use the screener

A useful framing: think of the screener as a filter for entry opportunities, not as a guarantee of profit.

The screen tells you: of 170+ pairs, here are the 5-15 that pass mean-reversion tests right now. That's a tractable list. But within that list, you still have to apply human judgment about:

Is the broader market in a directional regime (failure mode 1)?
Is the pair's volume holding up (failure mode 2)?
Is either of the underlying assets in a transition or delisting-risk window (failure mode 3)?
For cross-asset: is this a pair where short history makes the signal less reliable (failure mode 4)?

Then within the qualifying pairs, position-sizing decisions matter. Don't put everything into one swap. Scale in. Set position-size limits per pair based on liquidity. Maintain a "sanity floor" — if a pair triggers signals 3 times in 60 days at the same entry zone, treat that as a regime warning.

Honest disclosure: what our public results show vs reality

Our backtest results assume:

0.1% taker fees (Binance level)
Execute at daily close print
Full position size (no partial fills)
No exchange downtime
No tax events

Real-world execution typically gives returns 10-30% lower, in relative terms, than the backtest suggests. So if our backtest shows +47% accumulation on ETH/MSTRx over 360 days, real-world execution probably gave roughly +33-42% in the same period for someone trading at scale.
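The haircut is quick to apply to any backtest figure. This sketch assumes the 10-30% is a relative reduction of the backtest return (not percentage points); the function name `haircut_range` is illustrative.

```python
def haircut_range(backtest_return, low=0.10, high=0.30):
    """Apply a relative haircut range to a backtest return; returns (worst_case, best_case)."""
    return backtest_return * (1 - high), backtest_return * (1 - low)

worst, best = haircut_range(0.47)
print(f"{worst:.1%} to {best:.1%}")  # prints: 32.9% to 42.3%
```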

This isn't a flaw of our backtest specifically — it's a universal property of all backtests. But it's worth being explicit about, because the gap between backtest and reality is where most retail strategies actually fail.

Why we publish this article

A reasonable question: why publish a piece detailing exactly when our product fails? Doesn't this hurt sales?

Two reasons.

First, honest expectations attract better customers. Users who sign up after reading this post understand what they're getting. They don't expect free money. They don't blame the screener when a pair turns out to be in failure mode 1. They know to apply their own judgment on top of our filtering. These users have higher retention and contribute better feedback.

Second, it's the truth. Pair trading on mean-reversion is a real strategy with real edge in some regimes. It's not magic. It fails in identifiable ways. Pretending otherwise would just be bad — both ethically and commercially, because the day a customer hits one of these failure modes (and they will), they'd discover we'd been hiding it.

Better to show our hand upfront. Then the people who want this type of tool — quant-curious, statistically literate, comfortable with "edge in some regimes, breaks in others" — find us and stay. The people who want a get-rich-quick app go elsewhere.

What you can do with this information

If you're using the screener at pairscan.io:

Don't take every pass-the-filter signal as a buy. Apply judgment on regime, volume stability, transition events.
Position-size based on liquidity, not on backtest enthusiasm.
Treat cross-asset (xStocks) pairs as more experimental than pure crypto pairs.
Monitor your own real-world results against backtest claims. Track the gap. If it widens, investigate.

If you're not using our screener and are just learning the methodology, the open-source utility at github.com/pairscan/ratio-mean-reversion implements all four filters plus a walk-forward backtest, under the MIT license. Run it on your own data. Verify the failure modes yourself.

The math has been public since 1979. The infrastructure (data feeds, tokenized equities, oracle peg-checks) is mostly less than two years old. We're at the early intersection of those two things. There's edge here, but also failure modes that haven't been studied as long as we'd like. Trade accordingly.