score --dx --ax --prod --pricing --perf
Scorecard
How to read these scores
86+ is excellent, 74-85 is solid, and anything below 74 needs active scrutiny before a team or agent depends on it.
cat ./evidence/claude-code.md
What Neurl built with it
Executed multi-step public-site changes using local commands, board updates, and explicit verification gates.
Agentic implementation across Astro pages, typed helpers, docs, and browser checks.
- Started from repo context
- Used plan/checklist execution
- Preserved unrelated user changes
- Verified build output
- Scores reflect Neurl hands-on evidence and should be re-verified before procurement or high-risk production adoption.
- Pricing, limits, model defaults, and product policies can change quickly; use freshness dates and vendor docs before final rollout.
when-to-use claude-code
Use it when
- Large repo refactor
- Agent tool use
- Prototype to demo
- terminal agents
- multi-step implementation
- careful diffs
avoid-if claude-code
Not a fit when
- design-only exploration without local context
- teams that need an IDE-first UX
- very low-latency pair programming
pricing --teardown
Pricing teardown
Usage-based economics favor focused engineering work; watch long-running exploratory sessions.
- Costs can be less predictable than fixed-seat tools
- Heavy context sessions need guardrails
prod --readiness
Production notes
Strong for production code when paired with tests, review, and explicit non-destructive git discipline.
- Needs clear sandbox permissions
- Can spend tokens exploring if task boundaries are vague
ls ./use-cases/claude-code
Best use cases
Plan-driven implementation
Good for multi-step work with docs, tests, and code edits.
Repo investigation
Strong when shell access and file search matter.