How to Evaluate AI Consulting Systems

Most “AI consulting tools” are demos.

They summarize transcripts. They generate slides. They sound intelligent.

None of that is strategy.

This framework defines the minimum bar a system must clear to be trusted with enterprise decision-making.

The Seven Evaluation Criteria

Must be able to:

- Compute ROI, NPV, IRR
- Distinguish gross vs. net impact
- Run sensitivity analysis

Fails if:

- Math changes run to run
- Numbers are approximated
- Opportunity cost is ignored
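As a rough illustration of what this bar means in practice, the sketch below computes the metrics as plain deterministic functions rather than asking a language model to estimate them. The function names (`npv`, `irr`, `roi`, `net_impact`, `sensitivity`) and the bisection approach to IRR are assumptions made for this example, not a description of any particular product.

```python
from typing import Dict, Sequence


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Net present value; cash_flows[0] is the time-0 flow (usually negative)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: Sequence[float], lo: float = -0.99, hi: float = 10.0,
        tol: float = 1e-12) -> float:
    """Internal rate of return by bisection.

    Assumes NPV changes sign exactly once between lo and hi, which holds for
    a conventional project (one outflow followed by inflows).
    """
    f_lo = npv(lo, cash_flows)
    if f_lo * npv(hi, cash_flows) > 0:
        raise ValueError("NPV does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cash_flows)
        if f_lo * f_mid <= 0:
            hi = mid          # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2.0


def net_impact(gross_benefit: float, costs: float, opportunity_cost: float) -> float:
    """Net impact = gross benefit minus direct costs minus opportunity cost."""
    return gross_benefit - costs - opportunity_cost


def roi(net_gain: float, cost: float) -> float:
    """Return on investment as a fraction of cost."""
    return net_gain / cost


def sensitivity(cash_flows: Sequence[float], rates: Sequence[float]) -> Dict[float, float]:
    """NPV recomputed at each candidate discount rate."""
    return {r: npv(r, cash_flows) for r in rates}
```

For example, `npv(0.10, [-1000, 500, 500, 500])` comes to roughly 243.43, and calling it again with the same arguments gives the same number every time, which is the property the "math changes run to run" failure condition is checking for.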

Must be able to:

- Enforce regulatory, security, and technical constraints
- Block infeasible architectures

Fails if:

- Constraints are acknowledged but violated
- “Creative” workarounds are suggested
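One way to picture hard enforcement is a gate that rejects a proposal outright when any constraint fails, instead of noting the constraint and proceeding anyway. The sketch below is a minimal illustration; the `Proposal` fields, the three example constraints, and the `admit` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Proposal:
    # Hypothetical fields chosen only for this example.
    name: str
    data_region: str
    encrypts_at_rest: bool
    monthly_cost_usd: float


@dataclass(frozen=True)
class Constraint:
    label: str
    holds: Callable[[Proposal], bool]


# Illustrative constraints: one regulatory, one security, one technical/budget.
CONSTRAINTS: Sequence[Constraint] = (
    Constraint("data stays in EU", lambda p: p.data_region == "eu"),
    Constraint("encryption at rest", lambda p: p.encrypts_at_rest),
    Constraint("budget <= 50k USD/month", lambda p: p.monthly_cost_usd <= 50_000),
)


def violations(p: Proposal, constraints: Sequence[Constraint] = CONSTRAINTS) -> List[str]:
    """Labels of every constraint the proposal breaks."""
    return [c.label for c in constraints if not c.holds(p)]


def admit(p: Proposal) -> Proposal:
    """Return the proposal only if it satisfies all constraints; otherwise block it."""
    broken = violations(p)
    if broken:
        raise ValueError(f"{p.name} blocked: " + "; ".join(broken))
    return p
```

The design choice here is that `admit` raises rather than returning a warning, so a downstream step cannot receive an infeasible proposal by accident.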

Must be able to:

- Use independent agents to challenge outputs
- Resolve contradictions internally

Fails if:

- One model generates everything
- No internal disagreement exists

Must be able to:

- Produce identical outputs from identical inputs

Fails if:

- Temperature affects strategy
- Outputs drift over time
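A simple way to test this criterion is to run the system several times on the same input and compare fingerprints of the results. The sketch below assumes the system can be called as a function that returns JSON-serializable data; `fingerprint` and `is_reproducible` are hypothetical helper names.

```python
import hashlib
import json
from typing import Any, Callable


def fingerprint(result: Any) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_reproducible(system: Callable[[Any], Any], inputs: Any, runs: int = 3) -> bool:
    """True if every run on the same inputs yields a byte-identical result."""
    prints = {fingerprint(system(inputs)) for _ in range(runs)}
    return len(prints) == 1
```

Storing the fingerprint alongside each delivered result also gives a cheap check for drift: rerun the same input later and compare hashes.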

Must be able to:

- Perform discovery, modeling, risk, roadmap, and synthesis

Fails if:

- It only summarizes
- It only generates use cases
- It only creates slides

Must be able to:

- Explain why each decision was made
- Trace logic step-by-step

Fails if:

- Outputs are black boxes
- Reasoning is invented after the fact
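The difference between a real trace and an after-the-fact rationale is that the trace is written at the moment each step runs. A minimal sketch of that idea follows; the `Trace` and `Step` classes, the rule names, and the `decide` function are hypothetical and exist only to illustrate the pattern.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class Step:
    rule: str
    inputs: Dict[str, Any]
    output: Any


@dataclass
class Trace:
    steps: List[Step] = field(default_factory=list)

    def record(self, rule: str, inputs: Dict[str, Any], output: Any) -> Any:
        """Log the step as it happens, then pass the output through."""
        self.steps.append(Step(rule, dict(inputs), output))
        return output

    def explain(self) -> str:
        """Human-readable replay of the recorded steps, in order."""
        return "\n".join(
            f"{i}. {s.rule}({s.inputs}) -> {s.output}"
            for i, s in enumerate(self.steps, start=1)
        )


def decide(npv_value: float, hurdle: float, trace: Trace) -> str:
    """Toy decision: approve only if NPV exceeds the hurdle, recording each step."""
    clears = trace.record(
        "npv_exceeds_hurdle", {"npv": npv_value, "hurdle": hurdle}, npv_value > hurdle
    )
    return trace.record(
        "approve_if_clears", {"clears": clears}, "approve" if clears else "reject"
    )
```

Because `explain` only replays what `record` captured, the explanation cannot contain a step that did not actually run.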

Must be able to:

- Deliver insight fast enough to matter
- Scale across many decisions concurrently

Fails if:

- Human review is required
- Timelines stretch into days or weeks

Most tools fail five or more of these criteria.

Consulting-as-Code™ was designed to pass all seven.