Basic Backtest with Clean Data

This guide walks through a complete backtest of a simple RSI mean-reversion strategy on BTC-USDC hourly candles. The goal is to show the pipeline — not to advocate for this specific strategy.

Prerequisites

pip install "hypquant[pandas]" pandas polars numpy

1. Fetch the data

from hypquant import MarketData
import polars as pl

SYMBOL     = "BTC-USDC"
EXCHANGE   = "hyperliquid"
TIMEFRAME  = "1h"
START      = "2024-01-01"
END        = "2024-06-01"

with MarketData(api_key="qp_...") as md:
    # Check data quality first
    quality = md.quality(SYMBOL, exchange=EXCHANGE, timeframe=TIMEFRAME)
    dqs = float(quality["timeframes"][TIMEFRAME]["dqs"])
    assert dqs >= 0.90, f"DQS too low for reliable backtest: {dqs:.3f}"
    print(f"DQS: {dqs:.3f} — proceeding")

    # Fetch OHLCV
    ohlcv = md.ohlcv(SYMBOL, exchange=EXCHANGE, timeframe=TIMEFRAME,
                     start=START, end=END, limit=10000)

    # Fetch RSI pre-computed
    feat = md.features(SYMBOL, exchange=EXCHANGE, timeframe=TIMEFRAME,
                       features=["rsi_14"], start=START, end=END)

# Join on timestamp
df = ohlcv.join(feat, on="time", how="left")
print(f"Rows: {len(df)}, columns: {df.columns}")
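The `rsi_14` feature is fetched pre-computed, so its exact definition lives on the HypQuant side. For orientation only, a textbook 14-period RSI with Wilder smoothing can be sketched as below. The function name `rsi_wilder`, the choice of Wilder smoothing, and the convention of returning 50 for a completely flat series are assumptions made for illustration and may not match HypQuant's implementation.

```python
def rsi_wilder(closes, period=14):
    """Textbook RSI with Wilder smoothing; None during the warm-up window.

    Illustrative helper, not part of the hypquant package.
    """
    if len(closes) <= period:
        return [None] * len(closes)

    deltas = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]

    # Seed the averages with a simple mean over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period

    def to_rsi(gain, loss):
        if loss == 0:
            # Assumed convention: 100 if only gains, 50 if no movement at all.
            return 100.0 if gain > 0 else 50.0
        return 100.0 - 100.0 / (1.0 + gain / loss)

    out = [None] * period + [to_rsi(avg_gain, avg_loss)]
    for gain, loss in zip(gains[period:], losses[period:]):
        # Wilder smoothing: exponential average with weight 1/period.
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
        out.append(to_rsi(avg_gain, avg_loss))
    return out


rising = [float(i) for i in range(1, 21)]
print(rsi_wilder(rising)[-1])  # 100.0, since the series never falls
```

The first 14 values are `None` because the indicator needs 14 price changes before it is defined, which is why the signal logic has to tolerate nulls at the start of the window.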

2. Define signals

Simple RSI reversal: go long when RSI < 30, exit when RSI > 50.

# Signals: 1 = long, 0 = flat
df = df.with_columns([
    pl.when(pl.col("rsi_14") < 30).then(pl.lit(1))
      .when(pl.col("rsi_14") > 50).then(pl.lit(0))
      .otherwise(None)  # hold current position
      .alias("raw_signal")
])

# Forward-fill to hold position
df = df.with_columns(
    pl.col("raw_signal").forward_fill().fill_null(0).alias("position")
)

# Entry/exit prices: use next candle's open (realistic)
df = df.with_columns(
    pl.col("open").shift(-1).alias("entry_price")
)
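The when/otherwise plus forward-fill pattern above is equivalent to a small state machine: enter on RSI < 30, exit on RSI > 50, otherwise keep the previous state. A minimal pure-Python sketch of that logic on made-up RSI values (the helper name `positions_from_rsi` is hypothetical):

```python
def positions_from_rsi(rsi_values, enter_below=30.0, exit_above=50.0):
    """Return 1 (long) or 0 (flat) per candle, holding the last state in between."""
    position = 0  # start flat, like fill_null(0) after the forward fill
    out = []
    for rsi in rsi_values:
        if rsi is not None:  # missing RSI (warm-up) leaves the state unchanged
            if rsi < enter_below:
                position = 1
            elif rsi > exit_above:
                position = 0
        out.append(position)
    return out


rsi = [None, 45.0, 28.0, 35.0, 49.0, 52.0, 40.0, 29.0]
print(positions_from_rsi(rsi))  # [0, 0, 1, 1, 1, 0, 0, 1]
```

Values between 30 and 50 never change the position, which is what makes the exit threshold differ from the entry threshold.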

3. Compute returns

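# Open-to-close return of each hourly candle
df = df.with_columns(
    ((pl.col("close") - pl.col("open")) / pl.col("open")).alias("candle_return")
)

# Strategy return: previous candle's position × this candle's return (no leverage).
# The position is decided at candle close, so it can only earn the *next* candle's
# return; multiplying same-row values would introduce look-ahead bias.
df = df.with_columns(
    (pl.col("position").shift(1).fill_null(0) * pl.col("candle_return")).alias("strategy_return")
)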

# Cumulative returns
df = df.with_columns([
    (1 + pl.col("candle_return")).cum_prod().alias("bnh_cumret"),
    (1 + pl.col("strategy_return")).cum_prod().alias("strat_cumret"),
])
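One detail worth keeping in mind when multiplying a position series by per-candle returns is alignment: a position decided at the close of candle t can only earn the return of candle t+1. The toy numbers below are invented for illustration, and `compound` is a hypothetical helper that mirrors the cumulative product used above.

```python
def compound(returns):
    """Growth of 1 unit of capital: running product of (1 + r)."""
    total, out = 1.0, []
    for r in returns:
        total *= 1.0 + r
        out.append(total)
    return out


candle_returns = [0.00, 0.10, -0.05, 0.02]  # open-to-close return per candle
positions = [1, 1, 0, 0]                    # decided at each candle's close

# Correct alignment: position from candle t is applied to candle t+1.
lagged = [0] + positions[:-1]
strategy_returns = [p * r for p, r in zip(lagged, candle_returns)]

# Look-ahead version: the position "knows" the candle it was computed from.
same_candle = [p * r for p, r in zip(positions, candle_returns)]

print(f"lagged:      {compound(strategy_returns)[-1]:.3f}")  # 1.045
print(f"same candle: {compound(same_candle)[-1]:.3f}")       # 1.100
```

In this example the same-candle version skips the -5% candle it could not have avoided in practice, so it reports a better result than the strategy would have achieved.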

4. Calculate statistics

import numpy as np

returns = df["strategy_return"].drop_nulls().to_numpy()
bnh     = df["candle_return"].drop_nulls().to_numpy()

def annualized_sharpe(r, periods_per_year=8760):
    return (r.mean() / r.std()) * np.sqrt(periods_per_year) if r.std() > 0 else 0

def max_drawdown(cumret):
    rolling_max = np.maximum.accumulate(cumret)
    return ((cumret - rolling_max) / rolling_max).min()

strat_cumret = df["strat_cumret"].to_numpy()
bnh_cumret   = df["bnh_cumret"].to_numpy()

print(f"Strategy total return: {(strat_cumret[-1] - 1) * 100:.1f}%")
print(f"Buy & hold return:     {(bnh_cumret[-1]   - 1) * 100:.1f}%")
print(f"Strategy Sharpe (1h):  {annualized_sharpe(returns):.2f}")
print(f"Buy & hold Sharpe:     {annualized_sharpe(bnh):.2f}")
print(f"Max drawdown:          {max_drawdown(strat_cumret) * 100:.1f}%")
# Each position change is one fill; a round-trip trade is one entry plus one exit
print(f"Position changes:      {(df['position'].diff().abs() > 0).sum()}")
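To make the two statistics concrete, here is a dependency-free sketch of the same formulas on a tiny, invented equity curve. It assumes the population standard deviation (which is what NumPy's `std()` computes by default) and 24 × 365 = 8760 hourly periods per year, which fits a market that trades around the clock.

```python
import math
import statistics


def annualized_sharpe(returns, periods_per_year=24 * 365):
    """Mean over population std of per-period returns, scaled by sqrt(periods)."""
    sd = statistics.pstdev(returns)
    if sd == 0:
        return 0.0
    return statistics.fmean(returns) / sd * math.sqrt(periods_per_year)


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a negative fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


equity = [1.00, 1.10, 0.99, 1.05, 0.88]
print(f"{max_drawdown(equity):.1%}")  # -20.0% (from the 1.10 peak down to 0.88)

hourly = [0.01, -0.01, 0.02, 0.00]
print(f"{annualized_sharpe(hourly):.2f}")
```

A Sharpe ratio annualized from a handful of hourly returns is not meaningful in itself; the point of the sketch is only the arithmetic.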

5. Sanity checks

Before trusting backtest results, verify:

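# No look-ahead bias: position is based on RSI at candle close,
# trades execute at the *next* candle's open.
# Compare entry_price with the next row's open wherever the position changes.
changes = df.with_columns(
    pl.col("open").shift(-1).alias("next_open")
).filter(pl.col("position").shift(1) != pl.col("position"))
assert changes.select(
    (pl.col("entry_price") == pl.col("next_open")).all()
).item(), "Entry price must be next open"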

# No gaps in the time index
diffs = df["time"].diff().drop_nulls()
max_gap_hours = diffs.dt.total_minutes().max() / 60
print(f"Max gap in data: {max_gap_hours:.1f} hours")
if max_gap_hours > 2:
    print("WARNING: significant gap detected — check DQS components")
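The gap check above reports only the largest gap. When a warning fires, it usually helps to see where the gaps are and how many candles are missing. A minimal sketch using only the standard library, with an invented timestamp list and a hypothetical helper `find_gaps`:

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, step=timedelta(hours=1)):
    """Return (previous, current, missing_candles) for each spacing larger than step."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        if delta > step:
            # A 3-hour spacing on a 1-hour grid means 2 candles are missing.
            gaps.append((prev, cur, int(delta / step) - 1))
    return gaps


start = datetime(2024, 1, 1)
times = [start + timedelta(hours=h) for h in (0, 1, 2, 5, 6)]
for prev, cur, missing in find_gaps(times):
    print(f"{prev} -> {cur}: {missing} missing candle(s)")
# 2024-01-01 02:00:00 -> 2024-01-01 05:00:00: 2 missing candle(s)
```

The sketch assumes the timestamps are sorted and aligned to the candle grid.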

6. Clean vs. raw data comparison

To illustrate why clean data matters, run the same backtest using raw data from the exchange (without gap filling and timestamp normalization) and compare the Sharpe ratios. The HypQuant internal benchmark shows approximately:

Dataset                          Sharpe   Max drawdown
HypQuant clean                   0.82     -18%
Raw exchange data (with gaps)    0.41     -24%

The Sharpe difference comes from NaN propagation in RSI calculation during gap periods, which creates spurious signals. HypQuant's forward-fill policy eliminates this noise for short gaps (≤ 2 candles).
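The idea of filling only short gaps can be sketched as follows. This is an interpretation of the policy described above, not HypQuant's actual code: it assumes "short" means a run of at most 2 consecutive missing candles, that the fill value is the last observed close, and that longer runs are left missing. The helper name `forward_fill_short_gaps` is hypothetical.

```python
def forward_fill_short_gaps(values, max_gap=2):
    """Fill runs of None no longer than max_gap with the last seen value."""
    out = list(values)
    i = 0
    while i < len(out):
        if out[i] is None:
            # Find the end of this run of missing values.
            j = i
            while j < len(out) and out[j] is None:
                j += 1
            # Fill only short runs that have a preceding value to carry forward.
            if i > 0 and (j - i) <= max_gap:
                for k in range(i, j):
                    out[k] = out[i - 1]
            i = j
        else:
            i += 1
    return out


closes = [100.0, None, None, 101.0, None, None, None, 102.0]
print(forward_fill_short_gaps(closes))
# [100.0, 100.0, 100.0, 101.0, None, None, None, 102.0]
```

Leaving long gaps unfilled keeps them visible to downstream checks instead of silently inventing several hours of flat prices.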

Next steps

Funding Arb Strategy — using funding rates as a signal
Market Regime Analysis — conditioning strategies on market regime
Features Catalogue — 20 features to add to your signals