Hypothetical Performance Disclosure
Status: Human-voice rewrite, pending counsel review.
Document version: 2026.05.01
Last updated: 2026-05-01
Effective date: 2026-05-01
What this means in plain English: When we show you a backtest, win rate, or Sharpe ratio, that number describes what our model would have done on past data. It is not what your account will do. Real trading has costs, slippage, missed entries, and emotion that backtests do not capture. Treat every historical number as a rough sketch, not a forecast.
Concrete example: A backtest showing an 80% win rate over 2024 means we ran our model on 2024 data and it would have won 80% of those trades. It does not mean 80% of the trades you take this year will win. Your real fills will be worse than the simulator's, you may miss entries entirely, and your psychology in a drawdown is not in the model.
This document accompanies any backtested, simulated, hypothetical, or model-track-record number we publish. It is referenced from risk_disclosure.md §2.3. The short version (section 1 below) is shown directly under any performance chart that is not the result of actual trading.
1. The short version (shown under every backtest chart)
The performance shown above is hypothetical. It is the output of a model run against historical data. No real trades took place. Hypothetical results have inherent limits. We make no representation that your account will achieve similar profits or losses. Past performance does not guarantee future results.
2. The long version
2.1. The CFTC Rule 4.41(b) baseline
CFTC Rule 4.41(b) requires the disclaimer below wherever hypothetical performance is presented in connection with commodity-futures trading. The Service may not always come within the CFTC's regulatory scope, but we follow the rule's substance for every backtest we publish. The regulator's text, in full, verbatim:
These results are based on simulated or hypothetical performance results that have certain inherent limitations. Unlike the results shown in an actual performance record, these results do not represent actual trading. Also, because these trades have not actually been executed, these results may have under-or over-compensated for the impact, if any, of certain market factors, such as lack of liquidity. Simulated or hypothetical trading programs in general are also subject to the fact that they are designed with the benefit of hindsight. No representation is being made that any account will or is likely to achieve profits or losses similar to these being shown.
2.2. The named-cause list
On top of the CFTC baseline, every hypothetical or backtested result on the Service is subject to these specific limits:
- (a) No actual execution. No order touched a real market. The result assumes you would have got the price the model assumed. In real life your order would have collided with someone else's, moved the book itself, or missed the level entirely.
- (b) Slippage. When a strategy assumes execution at the close, at the next-bar open, or at any specific level, the simulator may understate slippage by a meaningful amount. Bid-ask spreads, market impact, and partial fills are not perfectly modelled.
- (c) Latency. Real systems have latency from signal to broker. Backtests treat the signal time as the execution time. The gap matters most on intraday strategies and on thinly-traded instruments.
- (d) Liquidity. A strategy that works on a 10,000-share order may break at 1,000,000 shares. A symbol with $10M average daily volume cannot absorb a meaningful position without moving the price. Backtests do not perfectly model this.
- (e) Order rejections. Brokers reject orders for many reasons (margin, restricted lists, halts, circuit breakers, duplicate detection, malformed routes). Backtests assume every order fills.
- (f) Costs and fees. When a backtest does not subtract commissions, exchange fees, financing costs (overnight rates on leveraged positions), regulatory fees, or the bid-ask spread, the result overstates what a real account would have made. We mark every backtest with whether costs are subtracted.
- (g) Hindsight design. A model designed against a known historical period may have been (deliberately or accidentally) selected because of how it performed on that data. We use walk-forward testing where we can, plus Benjamini-Hochberg false-discovery-rate correction when multi-testing applies. No method removes hindsight bias entirely.
- (h) Survivorship bias. Universes built from currently-listed symbols miss companies that were delisted or went bankrupt. We use survivorship-bias-aware universes for our primary backtests where data is available.
- (i) Look-ahead bias. A backtest using a value that was only known after the fact will overstate performance. Our code is written to forbid this. No codebase is bug-free; if you spot one, email engineering@ikaaycapital.com.
- (j) Regime change. A model that fits one market regime may underperform or break in another. We do not promise our models adapt to every regime.
- (k) Discipline. A real trader sitting through a 30% drawdown closes positions, takes time off, or changes strategy. Backtests do not. Real trading psychology produces a different equity curve from a backtest.
This list is not exhaustive.
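To make item (g) concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. It is an illustration under stated assumptions, not the Service's production code: the function name, the 5% false-discovery rate, and the example p-values are all hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per p-value: True if that test survives
    the Benjamini-Hochberg correction at the given false-discovery rate.

    Hypothetical helper for illustration; fdr=0.05 is an assumed default.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Rank the tests from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value is at most (k / m) * fdr.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_k = rank

    # Every test ranked at or below max_k survives, the rest do not.
    survives = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            survives[idx] = True
    return survives


# Five strategy variants tested on the same history (made-up p-values).
candidate_p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
print(benjamini_hochberg(candidate_p_values))
# [True, True, False, False, False]
```

An uncorrected 0.05 cut-off would have passed four of the five variants in this example; the correction passes two. It narrows the field but, as item (g) says, it does not remove hindsight bias.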
2.3. Cost-exclusion language for any backtest that excludes costs
When the Service publishes a return number that does not subtract trading costs (gross of commissions, spread, financing, regulatory fees, and the price impact of the trade), the chart will display this sentence right next to it:
The returns shown above are higher than an investor could achieve in a real account, because they exclude one or more of the following: commissions, exchange and clearing fees, the bid-ask spread, the price impact of the trade, financing costs on leveraged positions, and the overnight return from the close of one trading day to the open of the next. The omitted cost in [%/period] terms is approximately [X]%/[period].
We try never to publish a fully cost-excluded number on the public surface. When we do, that call-out is mandatory.
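As a rough illustration of how the omitted-cost figure in that call-out could be estimated, the sketch below adds up per-trade costs over a year. Everything in it is an assumption made for the example: the function name, the cost inputs, the simplification that each round trip turns over the whole account once in and once out, and the additive (non-compounded) treatment of costs.

```python
def annual_cost_drag_pct(round_trips_per_year, commission_bps,
                         half_spread_bps, impact_bps,
                         financing_pct_per_year=0.0):
    """Approximate yearly cost as a percentage of account capital.

    Hypothetical helper. Assumes every round trip trades the full
    account twice (entry and exit) and that costs simply add up.
    """
    per_side_bps = commission_bps + half_spread_bps + impact_bps
    # 1 basis point = 0.01%, so dividing bps by 100 gives percent.
    trading_cost_pct = round_trips_per_year * 2 * per_side_bps / 100.0
    return trading_cost_pct + financing_pct_per_year


# Made-up inputs: 50 round trips a year, 1 bp commission, 2 bp half-spread,
# 2 bp price impact per side, plus 1% a year in financing.
drag = annual_cost_drag_pct(50, commission_bps=1, half_spread_bps=2,
                            impact_bps=2, financing_pct_per_year=1.0)
gross_return_pct = 18.0
print(f"omitted cost ~ {drag:.1f}%/year")                     # 6.0%/year
print(f"net return   ~ {gross_return_pct - drag:.1f}%/year")  # 12.0%/year
```

With these inputs a gross 18% a year shrinks to roughly 12% once costs are counted, which is the kind of gap the call-out is meant to put in front of the reader.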
2.4. In-sample versus realised performance
We tell three kinds of performance number apart:
- (a) In-sample backtest. A model run on the same data we used to design and tune it. By definition this is optimistic. Charts label it "(in-sample)" and add an "inflated" annotation when raw Sharpe is greater than 3.
- (b) Out-of-sample backtest. A model run on data we did not use to design it, typically through walk-forward testing. Less optimistic than in-sample but still hypothetical; everything in §2.2 still applies.
- (c) Realised performance. The track record of signals after the Service published them. The Engine keeps a `tracked_signals` ledger of every signal it generates, with the realised outcome recorded on close. Realised win rate, average return, and profit factor come from this ledger and are labelled "(realised)" on every chart.
When in-sample and realised differ (and they always do), we show both side by side. The realised number is the truthful one. The in-sample number is for context.
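The labelling rules above can be sketched as follows. This is a simplified illustration, not the Engine's actual code: the `TrackedSignal` record, its field names, and both function names are hypothetical, and the real ledger schema may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackedSignal:
    """Hypothetical ledger row; the real schema may differ."""
    symbol: str
    realised_return: Optional[float]  # None while the signal is still open


def in_sample_labels(raw_sharpe):
    """Labels for an in-sample chart: 'inflated' is added above Sharpe 3."""
    labels = ["(in-sample)"]
    if raw_sharpe > 3:
        labels.append("inflated")
    return labels


def realised_summary(ledger):
    """Win rate, average return and profit factor over closed signals only."""
    closed = [s.realised_return for s in ledger if s.realised_return is not None]
    if not closed:
        return None  # nothing has closed yet, so there is nothing to report

    gross_profit = sum(r for r in closed if r > 0)
    gross_loss = -sum(r for r in closed if r < 0)
    return {
        "label": "(realised)",
        "closed_trades": len(closed),
        "win_rate": sum(1 for r in closed if r > 0) / len(closed),
        "average_return": sum(closed) / len(closed),
        # Undefined when there are no losing trades yet.
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else None,
    }


ledger = [
    TrackedSignal("AAA", 0.04),
    TrackedSignal("BBB", -0.02),
    TrackedSignal("CCC", 0.01),
    TrackedSignal("DDD", None),   # still open, excluded from the summary
    TrackedSignal("EEE", -0.01),
]
print(in_sample_labels(3.4))      # ['(in-sample)', 'inflated']
print(realised_summary(ledger))
```

Open signals are left out on purpose: a realised figure that counts positions still in flight would mix actual outcomes with marks that can still change.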
2.5. Forward-looking statements
When the Service mentions a forecast, a confidence band, or an expected return, that statement is a model output. It is not a prediction we stand behind in the strong sense. Forward-looking statements depend on assumptions that may not hold. Real results may differ a lot.
We do not say "AI predicts X". We say "the model output is X". That framing matters.
2.6. Specific metrics — what to watch out for
- (a) Sharpe ratio. Annualised risk-adjusted return; sensitive to the assumed risk-free rate, the data frequency, and the period. A daily-bar Sharpe is usually higher than a monthly-bar Sharpe on the same series. We disclose the bar frequency and risk-free rate.
- (b) Profit factor. Gross profit on winning trades divided by gross loss on losing trades. A profit factor above 2 over a small sample is noise, not signal.
- (c) Win rate. Fraction of trades that closed in profit. You need a large sample for this to mean anything (we use roughly N >= 100 closed trades as a useful threshold; below that, treat the number with the same scepticism as a coin-flip experiment with 20 flips).
- (d) Maximum drawdown. Peak-to-trough decline in the equity curve. Backtests over short periods understate plausible drawdowns.
- (e) Sortino ratio. Like Sharpe but penalises only downside volatility. Same caveats apply.
- (f) Alpha. Outperformance vs a chosen benchmark. The choice of benchmark moves the number; we disclose which benchmark.
- (g) Calmar ratio. Annualised return divided by maximum drawdown. Sensitive to the worst observed drawdown and so to the historical period.
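For readers who want to see how three of these numbers are built, here is a minimal sketch using only the Python standard library. It shows one common convention for each metric and is not necessarily how the Service computes them; the function names, the square-root-of-time annualisation, and the choice of a Wilson score interval for the win rate are assumptions made for the example.

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year, risk_free_per_period=0.0):
    """Annualised Sharpe from per-period returns (sqrt-of-time scaling)."""
    excess = [r - risk_free_per_period for r in returns]
    return (statistics.mean(excess) / statistics.stdev(excess)
            * math.sqrt(periods_per_year))


def max_drawdown(returns):
    """Largest peak-to-trough fall of the compounded equity curve."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def wilson_interval(wins, n, z=1.96):
    """Approximate 95% Wilson score interval for a win rate of wins / n."""
    p = wins / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# The same 80% win rate at two sample sizes.
print(wilson_interval(16, 20))    # wide interval
print(wilson_interval(80, 100))   # noticeably narrower

# Five made-up period returns: the equity curve peaks at 1.10, troughs at 0.8316.
print(max_drawdown([0.10, -0.20, 0.05, -0.10, 0.30]))
```

An 80% win rate over 20 trades is consistent with a true rate anywhere from roughly 58% to 92%, while the same rate over 100 trades narrows that to roughly 71% to 87%. The `periods_per_year` and `risk_free_per_period` arguments are the two inputs item (a) says we disclose, because changing either one changes the Sharpe figure.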
2.7. Backtest vs live banding
When we publish a backtest curve and a live curve for the same strategy, the backtest is shown in a muted colour and the live is shown in the brand colour. The two are aligned at the live-trading start date, so regime change and live-vs-backtest divergence are visually obvious. We do not recommend strategies whose live curve has diverged below the backtest curve by more than the historical worst drawdown.
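One way to read the divergence rule is sketched below. The text does not pin down how divergence is measured, so this example makes its own assumptions: both curves are rebased to 1.0 at the live-trading start date, divergence is the live curve's shortfall relative to the backtest at the latest point, and all function names are hypothetical.

```python
def rebase(curve):
    """Scale an equity curve so its first point equals 1.0."""
    start = curve[0]
    return [value / start for value in curve]


def live_shortfall(backtest_curve, live_curve):
    """Fraction by which live sits below backtest at the latest point.

    Both curves must start at the live-trading start date.
    Returns 0.0 when live is at or above the backtest.
    """
    backtest = rebase(backtest_curve)
    live = rebase(live_curve)
    return max(0.0, (backtest[-1] - live[-1]) / backtest[-1])


def exceeds_divergence_limit(backtest_curve, live_curve, historical_max_drawdown):
    """True when the shortfall is larger than the worst historical drawdown."""
    return live_shortfall(backtest_curve, live_curve) > historical_max_drawdown


# Made-up curves since the live start date: backtest up 12%, live down 4.8%.
backtest = [100.0, 104.0, 108.0, 112.0]
live = [50.0, 51.0, 49.0, 47.6]

print(round(live_shortfall(backtest, live), 4))          # 0.15
print(exceeds_divergence_limit(backtest, live, 0.12))    # True
print(exceeds_divergence_limit(backtest, live, 0.20))    # False
```

Rebasing both curves at the live start date mirrors the chart alignment described above, so the shortfall the check measures is the same gap a reader sees between the two lines.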
3. Implementation notes for the engineering team
- Every chart that renders model performance must call the `<HypotheticalDisclosure variant="short" />` component, which renders section 1 above. Charts that suppress the component must be code-reviewed by `risk-reviewer`.
- Long-form pages that show backtests (research notes, strategy explanations) must link to this document at the bottom and surface it inline at first occurrence.
- The `cost-excluded` flag is set in the strategy metadata. The chart component shows section 2.3 if the flag is true.
- In-sample vs realised labels come from the strategy run record and render automatically. Do not hand-edit them.
- The `tracked_signals` ledger is the realised-performance source of truth. Any chart that pulls from a different source must be flagged.
Change log
| Version | Date | Change |
|---|---|---|
| 2026.05.01 | 2026-05-01 | Human-voice rewrite. Plain-English intro added with concrete 80%-win-rate example, sub-sections converted to bullet lists, "for the avoidance of doubt" removed, "shall" replaced. Substance unchanged. Pending counsel review. |
| 2026.04.28 | 2026-04-28 | Initial draft. CFTC Rule 4.41(b) baseline + Streak-style named-cause list + Zacks-style cost-exclusion + in-sample vs realised banding. |
Pattern provenance
- §2.1 CFTC verbatim: 17 CFR §4.41(b) (1981, current).
- §2.2 named-cause list: Streak (Zerodha) /disclosure verbatim ("slippage, latency, liquidity limitations, or order rejections") expanded into a longer audit-quality list.
- §2.3 cost-exclusion: Zacks /performance_disclosure/ verbatim pattern. The "commissions, exchange and clearing fees, bid-ask spread, price impact, financing, overnight return" list is Zacks's contribution; we name each cost on the chart so the disclosure is not generic.
- §2.4 in-sample vs realised: Tickeron content-page disclaimer ("Past hypothetical backtest results are neither an indicator nor a guarantee of future returns") expanded into a three-class taxonomy. The "realised" label is the platform's own convention.
- §2.5 AI framing: explicit response to audit pattern A5 ("AI predicts" is regulator-bait). The platform does not use that language.