How to Backtest AI-Generated NinjaScript in NinjaTrader 8
A green compile is not a strategy. The compile says the syntax is right. The backtest is the verification step that says the strategy behaves the way the spec described, under fill conditions you can defend. This tutorial walks through running NT8's Strategy Analyzer via CrossTrade MCP on an AI-generated NinjaScript strategy.
Why backtesting is the verification step for AI-generated strategies
The model can produce code that compiles cleanly and behaves nothing like the spec. The agent can produce metrics that look great on a tiny trade count or on a window that does not reflect live conditions. The backtest is the only step that catches the gap between "compiles" and "trades the way I want."
CrossTrade MCP's RunStrategyBacktest drives NT8's actual Strategy Analyzer engine. Single backtests are bit-identical to the desktop UI for the documented reference parameters. The agent's deploy decision and your read both rest on the same numbers.
Prerequisites
| Requirement | Detail |
|---|---|
| CrossTrade subscription | Elite |
| CrossTrade Add-On | v1.13.0 or higher |
| NinjaTrader 8 | Running, historical data downloaded for your range |
| MCP client | Any MCP client with mcp:trade |
| Strategy | Compiled and present in your NT8 user directory |
| Account | Sim101 |
Step 1: Confirm the strategy compiles
Before backtesting, run an in-memory compile to confirm the class is loadable:
CompileNinjaScript(in_memory: true) on the current source. Report success or
errors. Do not run a backtest yet.
If compile fails, see Debug NinjaScript Compile Errors with AI.
Step 2: Define the backtest scope
State the scope clearly. Realistic costs matter; a zero-cost backtest is a lie.
| Field | Recommended starting value |
|---|---|
| Instrument | The contract month you intend to trade (e.g., MES 06-26) |
| Bars | The bar period that matches the strategy logic (e.g., 5-minute) |
| Date range | Last 30 trading days minimum; 60-90 days for low-trade-count strategies |
| Account | Sim101 |
| Commission | $1.27 per round-trip per contract (or your broker's actual cost) |
| Slippage | 1 tick (more if your live conditions justify it) |
| Parameters | The same parameters the strategy will use live |
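One way to keep the scope honest is to write it down as data and check it before the agent runs anything. The sketch below is a minimal illustration in Python. The field names mirror the prompt wording used in this tutorial and are assumptions, not a documented CrossTrade schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BacktestScope:
    # Hypothetical container; field names are illustrative only.
    strategy: str
    instrument: str
    bars_period: str
    start: date
    end: date
    account: str = "Sim101"
    commission: float = 1.27  # USD per round trip per contract
    slippage_ticks: int = 1


def scope_problems(scope: BacktestScope) -> list[str]:
    """Return the reasons this scope should not be backtested as written."""
    problems = []
    if scope.commission <= 0:
        problems.append("commission is zero")
    if scope.slippage_ticks <= 0:
        problems.append("slippage is zero")
    if scope.account != "Sim101":
        problems.append("account is not Sim101")
    if scope.end <= scope.start:
        problems.append("date range is empty")
    return problems
```

An empty list means the scope clears these basic checks. It does not mean the date range is representative.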
Step 3: Run the MCP backtest
RunStrategyBacktest with strategy SampleEmaCross, instrument MES 06-26, from
2026-04-17 to 2026-05-16, bars_period 5-minute, account Sim101, commission
1.27, slippage_ticks 1. Show the metrics block when it completes.
The call returns a job_id. The agent polls GetMcpJob until the status is completed or failed.
Step 4: Poll async job status
The backtest runs asynchronously, which matters most for long ranges and parameter sweeps. The agent should poll periodically:
GetMcpJob job_id <id> until status is completed or failed. Report progress
every 30 seconds if the job is still running.
If the job stalls, see the troubleshooting table.
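If you script the polling yourself rather than leaving it to the agent, the loop might look like the sketch below. Here `get_job` is a hypothetical wrapper around the GetMcpJob tool call, assumed to return a dict with a "status" key. The timeout is an added safeguard, not something the tool is known to provide.

```python
import time
from typing import Callable


def wait_for_job(
    get_job: Callable[[str], dict],
    job_id: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job is completed or failed, or raise on timeout."""
    waited = 0.0
    while True:
        job = get_job(job_id)
        # "completed" and "failed" are the terminal states named in this tutorial.
        if job.get("status") in ("completed", "failed"):
            return job
        if waited >= timeout_seconds:
            raise TimeoutError(f"job {job_id} did not finish within {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. A timeout also turns a silently stalled job into an explicit error you can act on.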
Step 5: Read the results
The metrics block typically includes:
- NetProfit
- ProfitFactor
- MaxDrawdown
- TradeCount
- Sharpe
These are not the only numbers that matter. Ask the agent for the trade list and skim the first 10 trades:
List the first 10 trades with entry time, exit time, side, P&L, and exit reason.
Tell me whether the entries respect the time filter and whether the exits look
like ATR trailing stops or something else.
If the trade list does not match the spec, the strategy logic is wrong even if the metrics look fine.
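The same check can be done mechanically once the trade list is exported. This is a rough sketch that assumes each trade is a dict with an `entry_time` datetime and an `exit_reason` string. Those field names are hypothetical, and entry times are assumed to be in the same time zone as the filter.

```python
from collections import Counter
from datetime import datetime, time


def entries_outside_window(trades: list[dict], start: time, end: time) -> list[dict]:
    """Return trades whose entry time of day falls outside [start, end]."""
    return [t for t in trades if not (start <= t["entry_time"].time() <= end)]


def exit_reason_counts(trades: list[dict]) -> Counter:
    """Count how often each exit reason appears in the trade list."""
    return Counter(t["exit_reason"] for t in trades)


# Made-up records, only to show the expected shape of the input.
sample = [
    {"entry_time": datetime(2026, 5, 4, 9, 45), "exit_reason": "TrailingStop"},
    {"entry_time": datetime(2026, 5, 4, 15, 55), "exit_reason": "SessionClose"},
]
violations = entries_outside_window(sample, time(9, 30), time(15, 0))
```

Any trade in `violations` means the time filter in the code does not do what the spec says. An exit-reason count dominated by something other than the intended stop is the same kind of warning.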
Step 6: Compare with NT8 Strategy Analyzer UI if needed
If you want to confirm parity, open Strategy Analyzer in NT8 manually and run the same configuration. Metrics should match for single backtests. See the NinjaScript Backtest Benchmark for the documented reference parameters.
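A parity check is easier to trust when it is a diff rather than an eyeball comparison. The sketch below assumes you have both sets of metrics as plain dicts of numbers. The `tolerance` parameter is an addition for values copied by hand from the UI, where display rounding may apply.

```python
def metric_mismatches(mcp: dict, ui: dict, tolerance: float = 0.0) -> dict:
    """Return {metric: (mcp_value, ui_value)} for every metric that differs."""
    diffs = {}
    for key in sorted(set(mcp) | set(ui)):
        a, b = mcp.get(key), ui.get(key)
        # A metric missing on either side counts as a mismatch.
        if a is None or b is None or abs(a - b) > tolerance:
            diffs[key] = (a, b)
    return diffs
```

An empty result with `tolerance=0.0` is the strict reading of "metrics should match." If the dict is not empty, compare the two backtest configurations before suspecting the engine.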
Step 7: Run parameter sweeps responsibly
Sweeps overfit. Bound the ranges to ones your spec allows.
Sweep <param1> from <min> to <max> step <step>. Sweep <param2> from <min> to
<max> step <step>. Fitness NetProfit. Show top three. Flag any winner that
looks like a noise fit (narrow profitability cluster, tiny trade count, or
extreme parameter values). Re-run a full single backtest on the winner.
Re-run the full single backtest on the sweep winner to confirm. Anything less is the agent fooling itself.
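Two of the noise-fit signs from the prompt can be checked in a few lines: how large the grid is, and whether the winner has too few trades or sits at the edge of a range. The sketch below assumes integer parameters and a hypothetical winner record with `params` and `trade_count` keys. The 30-trade threshold is an arbitrary placeholder, not a statistical rule.

```python
from itertools import product


def sweep_grid(ranges: dict[str, tuple[int, int, int]]) -> list[dict]:
    """Expand {name: (min, max, step)} into every parameter combination."""
    names = list(ranges)
    axes = [range(lo, hi + 1, step) for lo, hi, step in ranges.values()]
    return [dict(zip(names, combo)) for combo in product(*axes)]


def noise_fit_flags(winner: dict, ranges: dict, min_trades: int = 30) -> list[str]:
    """List the reasons a sweep winner deserves suspicion."""
    flags = []
    if winner["trade_count"] < min_trades:
        flags.append("tiny trade count")
    for name, (lo, hi, _step) in ranges.items():
        # A winner at the boundary suggests the optimum may lie outside the range.
        if winner["params"][name] in (lo, hi):
            flags.append(f"{name} sits on the edge of its range")
    return flags
```

The grid size matters because every extra combination is another chance for a lucky fit. An empty flag list does not clear the winner; it still needs the full single backtest and ideally an out-of-sample window.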
Step 8: Document caveats
Write a one-page summary in the workspace:
- What the strategy does in plain English
- Backtest gates that passed and failed
- Slippage and commission assumptions
- Trade-count caveat (small samples are not predictive)
- Five reasons not to deploy live yet
- A test plan to falsify the result on out-of-sample data
This document is what makes the next session start in a useful place.
Common backtest mistakes
- Zero commission and zero slippage. Makes every strategy look better than live.
- Cherry-picked date range. A bull market window proves nothing about chop.
- Tiny trade count. Eight trades cannot demonstrate edge.
- Optimizing on the sample period. Sweeps trained and tested on the same window overfit.
- Ignoring news windows. Backtests cross FOMC and CPI cleanly. Live sessions do not.
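To see how much the first mistake matters, it helps to put a number on cost drag. The sketch below uses the commission and slippage defaults from Step 2 and the MES tick value of $1.25. It assumes slippage applies to both the entry and the exit fill, which depends on your order types.

```python
def net_after_costs(
    gross_profit: float,
    trades: int,
    contracts: int = 1,
    commission_round_trip: float = 1.27,  # USD, from the scope table
    slippage_ticks: int = 1,
    tick_value: float = 1.25,             # USD per tick for MES
    fills_per_trade: int = 2,             # assumption: entry and exit both slip
) -> float:
    """Subtract commission and slippage from a zero-cost gross profit."""
    slippage = slippage_ticks * tick_value * fills_per_trade
    cost_per_trade = contracts * (commission_round_trip + slippage)
    return gross_profit - trades * cost_per_trade
```

With these defaults each one-contract round trip costs about $3.77, so a strategy has to clear that on average per trade just to break even.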
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| data_not_available | Historical data missing for the range | Download in NT8 or shorten the range |
| Job stuck on running | Wide sweep or NT8 modal blocking | Open NT8, dismiss modals, or shorten the sweep |
| strategy_not_compiled | Class not compiled | Compile first |
| Metrics differ between NT8 UI and MCP | Different parameters or slippage values | Compare the exact backtest config side by side |
FAQ
Are MCP backtests bit-identical to NT8 Strategy Analyzer?
Single backtests are bit-identical for the documented reference parameters. See the benchmark page.
Is a green backtest enough to deploy live?
No. Run on Sim101 for several sessions, verify the firm's rules if you trade a funded account, and read the trade list, not just the metrics.
Can the agent overfit a sweep?
Yes. Always re-run a full single backtest on the sweep winner and consider an out-of-sample window.
Related
- Main site: AI NinjaScript
- Learn: Vibe Code a NinjaScript Strategy
- Learn: Debug NinjaScript Compile Errors
- Docs: RunStrategyBacktest
- Tool: NinjaScript Backtest Benchmark