Basic Backtest with Clean Data

This guide walks through a complete backtest of a simple RSI mean-reversion strategy on BTC-USDC hourly candles. The goal is to show the pipeline — not to advocate for this specific strategy.


Prerequisites

pip install "hypquant[pandas]" pandas polars numpy

1. Fetch the data

from hypquant import MarketData
import polars as pl

SYMBOL     = "BTC-USDC"
EXCHANGE   = "hyperliquid"
TIMEFRAME  = "1h"
START      = "2024-01-01"
END        = "2024-06-01"

with MarketData(api_key="qp_...") as md:
    # Check data quality first
    quality = md.quality(SYMBOL, exchange=EXCHANGE, timeframe=TIMEFRAME)
    dqs = float(quality["timeframes"][TIMEFRAME]["dqs"])
    assert dqs >= 0.90, f"DQS too low for reliable backtest: {dqs:.3f}"
    print(f"DQS: {dqs:.3f} — proceeding")

    # Fetch OHLCV
    ohlcv = md.ohlcv(SYMBOL, exchange=EXCHANGE, timeframe=TIMEFRAME,
                     start=START, end=END, limit=10000)

    # Fetch RSI pre-computed
    feat = md.features(SYMBOL, exchange=EXCHANGE, timeframe=TIMEFRAME,
                       features=["rsi_14"], start=START, end=END)

# Join on timestamp
df = ohlcv.join(feat, on="time", how="left")
print(f"Rows: {len(df)}, columns: {df.columns}")
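The guide relies on the pre-computed rsi_14 feature. To spot-check a few of those values against the OHLCV closes, a plain-Python RSI with Wilder smoothing can serve as a reference. This is the textbook definition and only an assumption about how rsi_14 is built; HypQuant's exact formula is not stated here, so small differences (seeding, flat windows) are possible. The function name `wilder_rsi` is illustrative.

```python
def wilder_rsi(closes, period=14):
    """RSI with Wilder smoothing; entries before index `period` are None."""
    rsi = [None] * len(closes)
    if len(closes) <= period:
        return rsi
    # changes[i] is the move from closes[i] to closes[i + 1]
    changes = [b - a for a, b in zip(closes, closes[1:])]
    # Seed with simple averages over the first `period` changes
    avg_gain = sum(max(c, 0.0) for c in changes[:period]) / period
    avg_loss = sum(max(-c, 0.0) for c in changes[:period]) / period

    def to_rsi(gain, loss):
        # Convention assumed here: no losses in the window maps to 100
        return 100.0 if loss == 0 else 100.0 - 100.0 / (1.0 + gain / loss)

    rsi[period] = to_rsi(avg_gain, avg_loss)
    for i in range(period, len(changes)):
        c = changes[i]
        avg_gain = (avg_gain * (period - 1) + max(c, 0.0)) / period
        avg_loss = (avg_loss * (period - 1) + max(-c, 0.0)) / period
        rsi[i + 1] = to_rsi(avg_gain, avg_loss)
    return rsi

# Tiny hand-checkable case with period=2
print(wilder_rsi([1, 2, 1, 2], period=2))
# [None, None, 50.0, 75.0]
```

Comparing the last few values of `wilder_rsi(df["close"].to_list())` with `df["rsi_14"]` is usually enough to confirm that the feature and the candles are aligned on the same timestamps.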

2. Define signals

Simple RSI reversal: go long when RSI < 30, exit when RSI > 50.

# Signals: 1 = long, 0 = flat
df = df.with_columns([
    pl.when(pl.col("rsi_14") < 30).then(pl.lit(1))
      .when(pl.col("rsi_14") > 50).then(pl.lit(0))
      .otherwise(None)  # hold current position
      .alias("raw_signal")
])

# Forward-fill to hold position
df = df.with_columns(
    pl.col("raw_signal").forward_fill().fill_null(0).alias("position")
)

# Entry/exit prices: use next candle's open (realistic)
df = df.with_columns(
    pl.col("open").shift(-1).alias("entry_price")
)
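The when/otherwise chain followed by a forward fill behaves like a small state machine. A dependency-free sketch of the same logic makes the hold behaviour easy to verify by hand; the function name `positions_from_rsi` and its parameters are illustrative, not part of HypQuant.

```python
from typing import List, Optional, Sequence

def positions_from_rsi(rsi: Sequence[Optional[float]],
                       enter_below: float = 30.0,
                       exit_above: float = 50.0) -> List[int]:
    """Return 1 (long) or 0 (flat) per candle, holding between signals."""
    position = 0  # start flat, like fill_null(0) after the forward fill
    out = []
    for value in rsi:
        if value is not None:
            if value < enter_below:
                position = 1
            elif value > exit_above:
                position = 0
        # A missing RSI or a value between the thresholds keeps the position
        out.append(position)
    return out

print(positions_from_rsi([45, 28, 35, 52, 40, 29, None, 51]))
# [0, 1, 1, 0, 0, 1, 1, 0]
```

Note that a missing RSI value holds the current position rather than closing it, which is also what the forward fill does.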

3. Compute returns

# Hourly return of the underlying
df = df.with_columns(
    ((pl.col("close") - pl.col("open")) / pl.col("open")).alias("candle_return")
)

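# Strategy return: position decided at the previous close × this candle's
# return (no leverage). The one-candle lag avoids look-ahead bias.
df = df.with_columns(
    (pl.col("position").shift(1).fill_null(0) * pl.col("candle_return")).alias("strategy_return")
)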

# Cumulative returns
df = df.with_columns([
    (1 + pl.col("candle_return")).cum_prod().alias("bnh_cumret"),
    (1 + pl.col("strategy_return")).cum_prod().alias("strat_cumret"),
])
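One detail worth checking by hand is the alignment between position and return. A signal computed from the close of one candle can only be traded at the next open, so it should earn the next candle's return, not its own. The sketch below works through four candles in plain Python; the helper names are illustrative.

```python
def strategy_returns(positions, candle_returns):
    """Pair each candle's return with the position decided one candle earlier."""
    lagged = [0] + list(positions[:-1])  # flat during the first candle
    return [p * r for p, r in zip(lagged, candle_returns)]

def cumulative(returns):
    """Compound per-candle returns into a growth-of-1 curve."""
    curve, level = [], 1.0
    for r in returns:
        level *= 1.0 + r
        curve.append(level)
    return curve

positions      = [0, 1, 1, 0]               # decided at each candle's close
candle_returns = [0.01, -0.02, 0.03, 0.01]  # open-to-close, same candles

curve = cumulative(strategy_returns(positions, candle_returns))
print(round(curve[-1], 4))
# 1.0403
```

The entry signal fires at the close of the second candle, so the -2% move of that candle is not part of the strategy's result. The exit signal fires at the close of the fourth candle, so the strategy is still long during it and earns its +1%.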

4. Calculate statistics

import numpy as np

returns = df["strategy_return"].drop_nulls().to_numpy()
bnh     = df["candle_return"].drop_nulls().to_numpy()

def annualized_sharpe(r, periods_per_year=8760):
    return (r.mean() / r.std()) * np.sqrt(periods_per_year) if r.std() > 0 else 0

def max_drawdown(cumret):
    rolling_max = np.maximum.accumulate(cumret)
    return ((cumret - rolling_max) / rolling_max).min()

strat_cumret = df["strat_cumret"].to_numpy()
bnh_cumret   = df["bnh_cumret"].to_numpy()

print(f"Strategy total return: {(strat_cumret[-1] - 1) * 100:.1f}%")
print(f"Buy & hold return:     {(bnh_cumret[-1]   - 1) * 100:.1f}%")
print(f"Strategy Sharpe (1h):  {annualized_sharpe(returns):.2f}")
print(f"Buy & hold Sharpe:     {annualized_sharpe(bnh):.2f}")
print(f"Max drawdown:          {max_drawdown(strat_cumret) * 100:.1f}%")
# Counts every position change, so one round trip (entry + exit) counts twice
print(f"Position changes:      {(df['position'].diff().abs() > 0).sum()}")
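Both statistics are easy to get subtly wrong, so it helps to check them on a curve small enough to compute by hand. The sketch below re-implements the two functions with the standard library only. It uses the population standard deviation, which matches NumPy's default, and keeps 8760 periods per year (24 × 365), which assumes a market that trades around the clock.

```python
import math
import statistics

def annualized_sharpe(returns, periods_per_year=8760):
    """Mean over population std of per-period returns, scaled by sqrt(periods)."""
    sd = statistics.pstdev(returns)  # ddof=0, like numpy's default std
    if sd == 0:
        return 0.0
    return statistics.fmean(returns) / sd * math.sqrt(periods_per_year)

def max_drawdown(curve):
    """Largest peak-to-trough decline of a growth-of-1 curve (a negative number)."""
    peak, worst = curve[0], 0.0
    for level in curve:
        peak = max(peak, level)
        worst = min(worst, (level - peak) / peak)
    return worst

curve = [1.00, 1.10, 0.99, 1.05, 1.20]
# Peak 1.10, trough 0.99: (0.99 - 1.10) / 1.10 = -10%
print(round(max_drawdown(curve), 4))
# -0.1
```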

5. Sanity checks

Before trusting backtest results, verify:

# No look-ahead bias: position is based on RSI at candle close,
# trades execute at the *next* candle's open.
changes = df.with_columns(
    pl.col("open").shift(-1).alias("next_open")
).filter(pl.col("position").shift(1) != pl.col("position"))
assert changes.select(
    (pl.col("entry_price") == pl.col("next_open")).all()
).item(), "Entry price must be next open"

# No gaps in the time index
diffs = df["time"].diff().drop_nulls()
max_gap_hours = diffs.dt.total_minutes().max() / 60
print(f"Max gap in data: {max_gap_hours:.1f} hours")
if max_gap_hours > 2:
    print("WARNING: significant gap detected — check DQS components")
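The maximum gap only reports the single worst case. To see where every gap sits, the timestamps can be walked pairwise. This is a minimal sketch using the standard library; `find_gaps` is an illustrative helper, and it assumes the timestamps are sorted and spaced one hour apart when complete.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, expected=timedelta(hours=1)):
    """Return (previous, next, missing_candles) for every step larger than expected."""
    gaps = []
    for prev, nxt in zip(timestamps, timestamps[1:]):
        step = nxt - prev
        if step > expected:
            missing = int(step / expected) - 1
            gaps.append((prev, nxt, missing))
    return gaps

# Hours 3 and 4 are missing from this toy index
ts = [datetime(2024, 1, 1, h) for h in (0, 1, 2, 5, 6)]
for prev, nxt, missing in find_gaps(ts):
    print(f"{prev} -> {nxt}: {missing} missing candle(s)")
# 2024-01-01 02:00:00 -> 2024-01-01 05:00:00: 2 missing candle(s)
```

With the frame from step 1, the input would be `df["time"].to_list()`, assuming the column holds datetime values.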

6. Clean vs. raw data comparison

To illustrate why clean data matters, run the same backtest using raw data from the exchange (without gap filling and timestamp normalization) and compare the Sharpe ratios. The HypQuant internal benchmark shows approximately:

Dataset                          Sharpe   Max DD
HypQuant clean                   0.82     -18%
Raw exchange data (with gaps)    0.41     -24%

The Sharpe difference comes from NaN propagation in RSI calculation during gap periods, which creates spurious signals. HypQuant's forward-fill policy eliminates this noise for short gaps (≤ 2 candles).
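To make the short-gap rule concrete, here is a sketch of a forward fill that repairs only runs of at most two missing values and leaves longer runs untouched. It is an illustration under that reading of the policy, not HypQuant's actual implementation, and `forward_fill` is a hypothetical helper.

```python
import math

def forward_fill(values, limit=2):
    """Fill runs of NaN with the last valid value, but only runs of <= limit."""
    out = list(values)
    i = 0
    while i < len(out):
        if math.isnan(out[i]):
            j = i
            while j < len(out) and math.isnan(out[j]):
                j += 1  # j ends one past the NaN run
            run = j - i
            if i > 0 and run <= limit:
                out[i:j] = [out[i - 1]] * run
            i = j
        else:
            i += 1
    return out

nan = float("nan")
closes = [100.0, nan, nan, 103.0, nan, nan, nan, 107.0]

# Any window that touches a NaN produces a NaN average
print(math.isnan(sum(closes[:3]) / 3))
# True

print(forward_fill(closes))
# [100.0, 100.0, 100.0, 103.0, nan, nan, nan, 107.0]
```

The two-candle gap is filled with the last known close, while the three-candle gap stays missing so that downstream code can skip it explicitly instead of trading on invented prices.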


Next steps