How to Evaluate AI Consulting Systems
Most “AI consulting tools” are demos.
They summarize transcripts. They generate slides. They sound intelligent.
None of that is strategy.
This framework defines the minimum bar a system must clear to be trusted with enterprise decision-making.
The Seven Evaluation Criteria

1. Deterministic Financial Reasoning

Must be able to:
- Compute ROI, NPV, IRR
- Distinguish gross vs. net impact
- Run sensitivity analysis

Fails if:
- Math changes run to run
- Numbers are approximated
- Opportunity cost is ignored
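As a minimal sketch of what deterministic financial reasoning could look like, the block below computes ROI, NPV, and IRR with fixed-precision decimal arithmetic and a fixed-tolerance bisection, then sweeps the discount rate as a basic sensitivity analysis. The function names, the bracket bounds, and the tolerance are illustrative assumptions, not part of any particular product.

```python
from decimal import Decimal, getcontext

# Pin the precision explicitly so results do not depend on ambient settings.
getcontext().prec = 28


def roi(gain: Decimal, cost: Decimal) -> Decimal:
    """Net return on investment: (gain - cost) / cost."""
    return (gain - cost) / cost


def npv(rate: Decimal, cash_flows: list[Decimal]) -> Decimal:
    """Net present value; cash_flows[0] occurs at t = 0."""
    return sum((cf / (1 + rate) ** t for t, cf in enumerate(cash_flows)), Decimal(0))


def irr(cash_flows: list[Decimal],
        lo: Decimal = Decimal("-0.99"),
        hi: Decimal = Decimal("10"),
        tol: Decimal = Decimal("1e-9")) -> Decimal:
    """IRR by bisection. Assumes NPV changes sign once inside [lo, hi]."""
    f_lo = npv(lo, cash_flows)
    if f_lo * npv(hi, cash_flows) > 0:
        raise ValueError("NPV does not change sign in the bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = npv(mid, cash_flows)
        if f_lo * f_mid <= 0:
            hi = mid                  # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid     # root lies in the upper half
    return (lo + hi) / 2


def npv_sensitivity(rates: list[Decimal], cash_flows: list[Decimal]) -> dict:
    """NPV at each discount rate: a one-variable sensitivity sweep."""
    return {rate: npv(rate, cash_flows) for rate in rates}


# Hypothetical project: invest 1000 now, receive 1100 in one year.
flows = [Decimal("-1000"), Decimal("1100")]
print(roi(Decimal("1100"), Decimal("1000")))
print(irr(flows))
print(npv_sensitivity([Decimal("0.05"), Decimal("0.10"), Decimal("0.15")], flows))
```

Because every step is plain arithmetic with a pinned precision, the same inputs yield the same figures on every run, which is the property this criterion tests for.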
2. Constraint Enforcement

Must be able to:
- Enforce regulatory, security, and technical constraints
- Block infeasible architectures

Fails if:
- Constraints are acknowledged but violated
- “Creative” workarounds are suggested
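One way to picture hard enforcement, as opposed to acknowledgement, is a gate that rejects any proposal violating a declared constraint. The sketch below assumes proposals are plain dictionaries; the `Constraint` type, the two example rules, and the field names (`region`, `encrypted`) are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Constraint:
    name: str
    check: Callable[[dict], bool]  # returns True when the proposal complies


def violations(proposal: dict, constraints: list[Constraint]) -> list[str]:
    """Names of every violated constraint; an empty list means feasible."""
    return [c.name for c in constraints if not c.check(proposal)]


# Hypothetical rules for illustration only.
CONSTRAINTS = [
    Constraint("data_residency_eu", lambda p: str(p.get("region", "")).startswith("eu-")),
    Constraint("encryption_at_rest", lambda p: p.get("encrypted") is True),
]


def approve(proposal: dict) -> dict:
    """Return the proposal unchanged if feasible; otherwise block it."""
    failed = violations(proposal, CONSTRAINTS)
    if failed:
        # Hard stop: the proposal is rejected, not reworded or worked around.
        raise ValueError("blocked: " + ", ".join(failed))
    return proposal


print(violations({"region": "us-east-1", "encrypted": True}, CONSTRAINTS))
```

The design choice worth noting is that `approve` has no path that returns a non-compliant proposal, so a violated constraint cannot be merely acknowledged and then ignored.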
3. Multi-Agent Cross-Validation

Must be able to:
- Use independent agents to challenge outputs
- Resolve contradictions internally

Fails if:
- One model generates everything
- No internal disagreement exists
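The following sketch illustrates only the detection half of cross-validation: independent estimators score the same case, and the result is flagged when their estimates diverge beyond a tolerance. The `Agent` type, the two stand-in estimators, and the 5% tolerance are assumptions made for the example; how a real system resolves a flagged contradiction is outside its scope.

```python
from typing import Callable

# Hypothetical agent interface: maps a case description to a numeric estimate.
Agent = Callable[[dict], float]


def cross_validate(case: dict, agents: dict[str, Agent], rel_tol: float = 0.05) -> dict:
    """Run every agent on the same case and report whether they agree."""
    if len(agents) < 2:
        raise ValueError("cross-validation needs at least two independent agents")
    estimates = {name: agent(case) for name, agent in agents.items()}
    lo, hi = min(estimates.values()), max(estimates.values())
    # Relative spread between the most extreme estimates.
    spread = (hi - lo) / max(abs(hi), abs(lo), 1e-12)
    return {"estimates": estimates, "spread": spread, "agreed": spread <= rel_tol}


# Stand-in estimators that reach a benefit figure by different routes.
agents = {
    "top_down": lambda c: c["revenue"] * c["uplift_pct"] / 100,
    "bottom_up": lambda c: c["units"] * c["margin_per_unit"],
}

case = {"revenue": 1_000_000, "uplift_pct": 5, "units": 1000, "margin_per_unit": 48}
print(cross_validate(case, agents))
```

Requiring at least two agents encodes the failure condition directly: a single generator with nothing to disagree with cannot pass.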
4. Reproducibility

Must be able to:
- Produce identical outputs from identical inputs

Fails if:
- Sampling temperature changes the recommended strategy
- Outputs drift over time
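A reproducibility check can be sketched as running the same pipeline several times on the same input and comparing fingerprints of the outputs. The `pipeline` callable and the helper names below are hypothetical; the sketch assumes outputs are JSON-serializable.

```python
import hashlib
import json
from typing import Callable


def fingerprint(obj) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_reproducible(pipeline: Callable[[dict], dict], inputs: dict, runs: int = 3) -> bool:
    """True if every run on identical inputs yields an identical output."""
    digests = {fingerprint(pipeline(inputs)) for _ in range(runs)}
    return len(digests) == 1


def stable_pipeline(inputs: dict) -> dict:
    # Stand-in for a deterministic analysis step.
    return {"net_benefit": inputs["gross"] - inputs["cost"]}


print(is_reproducible(stable_pipeline, {"gross": 120, "cost": 100}))
```

Storing the fingerprint alongside each delivered result would also allow drift over time to be detected by re-running old inputs later and comparing digests.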
5. Full Consulting Lifecycle Coverage

Must be able to:
- Perform discovery, modeling, risk, roadmap, and synthesis

Fails if:
- It only summarizes
- It only generates use cases
- It only creates slides
6. Auditability & Explainability

Must be able to:
- Explain why each decision was made
- Trace logic step-by-step

Fails if:
- Outputs are black boxes
- Reasoning is invented after the fact
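To show the difference between a trace and an after-the-fact rationale, the sketch below records each step's inputs, output, and rule at the moment the step is computed. The `AuditTrail` class and the `decide` function are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AuditTrail:
    steps: list[dict] = field(default_factory=list)

    def record(self, step: str, inputs: dict, output, reason: str):
        """Log a step as it is computed, then pass the output through."""
        self.steps.append(
            {"step": step, "inputs": inputs, "output": output, "reason": reason}
        )
        return output


def decide(gross: float, cost: float, trail: AuditTrail) -> str:
    """Toy decision whose every intermediate value is traced."""
    net = trail.record(
        "net_benefit", {"gross": gross, "cost": cost}, gross - cost,
        "net = gross - cost",
    )
    return trail.record(
        "decision", {"net": net}, "proceed" if net > 0 else "reject",
        "proceed only if net benefit is positive",
    )


trail = AuditTrail()
print(decide(120, 100, trail))
for entry in trail.steps:
    print(entry)
```

Because the value returned to the caller is the same value that was logged, the recorded reasoning cannot diverge from what actually drove the decision.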
7. Decision Velocity Alignment

Must be able to:
- Deliver insight fast enough to matter
- Scale across many decisions concurrently

Fails if:
- Human review is required
- Timelines stretch into days or weeks
The Reality
Most tools fail 5+ of these criteria.
Consulting-as-Code™ was designed to pass all seven.
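The seven criteria can be applied as an all-or-nothing scorecard. The sketch below is one hypothetical way to tally them; the identifier names are assumptions derived from the headings above, and the sample results describe an imaginary tool, not a measured one.

```python
CRITERIA = (
    "deterministic_financial_reasoning",
    "constraint_enforcement",
    "multi_agent_cross_validation",
    "reproducibility",
    "full_lifecycle_coverage",
    "auditability_explainability",
    "decision_velocity_alignment",
)


def evaluate(results: dict[str, bool]) -> dict:
    """Tally pass/fail per criterion; a system is trusted only if none fail."""
    missing = [c for c in CRITERIA if c not in results]
    if missing:
        # An unscored criterion is treated as an incomplete evaluation.
        raise KeyError(f"unscored criteria: {missing}")
    failed = [c for c in CRITERIA if not results[c]]
    return {"failed": failed, "trusted": not failed}


# Imaginary tool that only summarizes quickly: illustrative values only.
sample = {c: False for c in CRITERIA}
sample["decision_velocity_alignment"] = True
print(evaluate(sample))
```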