Scoring

Gradient’s scoring engine uses an LLM-as-judge approach to evaluate candidate submissions across multiple dimensions. Our scoring framework is informed by Anthropic’s AI Fluency Index, which identifies the core competencies that distinguish effective AI-augmented knowledge workers. The result is a detailed score breakdown with percentile rankings and suggested follow-up interview questions.

Scoring categories

The default rubric evaluates candidates across five categories totaling 100 points:
| Category | Points | What it measures |
| Deliverable Quality | 40 | Content accuracy, analytical depth, structure, visual quality |
| Delegation | 10 | How effectively the candidate directed the AI |
| Description | 20 | Clarity of prompts and instructions given to the AI |
| Discernment | 20 | Ability to evaluate AI output, catch errors, and verify information |
| Diligence | 10 | Thoroughness in workspace setup, data source usage, and iteration |
These categories can be customized per assessment using the scoring rubric configuration. See the Custom Rubrics guide.
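
As a rough illustration of how the default weights add up, the sketch below stores the rubric as a plain mapping and checks that it totals 100 points. The dict layout, the helper names `validate_rubric` and `total_score`, and the rule that a custom rubric must also total 100 are assumptions made for this example, not Gradient's actual configuration schema.

```python
# Point values are copied from the default rubric; the dict layout and the
# helper names are illustrative assumptions, not Gradient's schema.
DEFAULT_RUBRIC = {
    "Deliverable Quality": 40,
    "Delegation": 10,
    "Description": 20,
    "Discernment": 20,
    "Diligence": 10,
}


def validate_rubric(rubric: dict[str, int], expected_total: int = 100) -> None:
    """Raise ValueError if a weight is non-positive or the total is off."""
    for category, points in rubric.items():
        if points <= 0:
            raise ValueError(f"{category!r} must have positive points, got {points}")
    total = sum(rubric.values())
    if total != expected_total:
        raise ValueError(f"rubric totals {total}, expected {expected_total}")


def total_score(rubric: dict[str, int], earned: dict[str, float]) -> float:
    """Sum earned points, rejecting values outside 0..category maximum."""
    score = 0.0
    for category, max_points in rubric.items():
        points = earned.get(category, 0.0)
        if not 0 <= points <= max_points:
            raise ValueError(f"{category!r} score {points} is outside 0..{max_points}")
        score += points
    return score


validate_rubric(DEFAULT_RUBRIC)  # passes silently: 40 + 10 + 20 + 20 + 10 == 100
```

A candidate who earns 32, 7, 15, 16, and 8 points in the five categories would total 78 under this sketch.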

How scoring works

1. Trigger

After a candidate submits, an admin triggers scoring via the dashboard or the Scoring API. Scoring can also be re-run on already-scored sessions.
2. Analysis

The scoring engine runs multiple analysis modules in parallel. Each module examines different aspects of the submission: the deliverable itself, the conversation transcript, event patterns, and the AI configuration snapshot.
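
One way to picture the parallel analysis step is with `asyncio.gather`, as in the minimal sketch below. The four module functions, the submission keys, and the shape of the results are hypothetical stand-ins chosen to mirror the four aspects listed above; the real engine's modules and data model are not documented here.

```python
import asyncio


# Hypothetical stub modules: each inspects one aspect of the submission.
# Real modules would await I/O such as an LLM call; these return immediately.
async def analyze_deliverable(submission: dict) -> dict:
    return {"word_count": len(submission["deliverable"].split())}


async def analyze_transcript(submission: dict) -> dict:
    return {"message_count": len(submission["transcript"])}


async def analyze_events(submission: dict) -> dict:
    return {"event_count": len(submission["events"])}


async def analyze_ai_config(submission: dict) -> dict:
    return {"model": submission["ai_config"].get("model")}


MODULES = {
    "deliverable": analyze_deliverable,
    "transcript": analyze_transcript,
    "events": analyze_events,
    "ai_config": analyze_ai_config,
}


async def run_analysis(submission: dict) -> dict:
    """Run every module concurrently and key each result by module name."""
    names = list(MODULES)
    results = await asyncio.gather(*(MODULES[name](submission) for name in names))
    return dict(zip(names, results))


# Made-up sample submission, used only to exercise the sketch.
sample = {
    "deliverable": "Quarterly revenue summary draft",
    "transcript": ["Summarize the report", "Here is a summary"],
    "events": ["connect_data_source", "edit", "submit"],
    "ai_config": {"model": "example-model"},
}
report = asyncio.run(run_analysis(sample))
```

`asyncio.gather` returns results in the order the coroutines were passed, which is why zipping them with the module names is safe.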
3. Scoring

Each sub-criterion receives a score using one of three methods:
  • LLM Judge: An AI evaluator assesses quality against rubric criteria
  • Event Analysis: Automated analysis of behavioral patterns (e.g., did they connect data sources?)
  • Hybrid: Combines both approaches
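
The three methods above can be sketched as a small dispatch function. How the Hybrid method combines its two inputs is not specified, so the weighted average and the `judge_weight` parameter below are assumptions for illustration, as are all names in the block.

```python
from enum import Enum
from typing import Optional


class Method(Enum):
    LLM_JUDGE = "llm_judge"
    EVENT_ANALYSIS = "event_analysis"
    HYBRID = "hybrid"


def score_sub_criterion(
    method: Method,
    max_points: float,
    judge_fraction: Optional[float] = None,
    event_fraction: Optional[float] = None,
    judge_weight: float = 0.5,  # assumed blend; the real weighting is unknown
) -> float:
    """Convert fractions in [0, 1] into points for one sub-criterion."""
    if method is Method.LLM_JUDGE:
        fraction = judge_fraction
    elif method is Method.EVENT_ANALYSIS:
        fraction = event_fraction
    else:
        # Hybrid needs both signals; blend them with a weighted average.
        if judge_fraction is None or event_fraction is None:
            raise ValueError("hybrid scoring needs both fractions")
        fraction = judge_weight * judge_fraction + (1 - judge_weight) * event_fraction
    if fraction is None:
        raise ValueError(f"missing input for {method.value}")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"fraction {fraction} is outside [0, 1]")
    return max_points * fraction
```

For example, a 10-point hybrid sub-criterion with a judge fraction of 1.0 and an event fraction of 0.5 yields 7.5 points under the default equal weighting.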
4. Percentiles

Scores are benchmarked against three baselines: other candidates on the same assessment, candidates in the same role category, and all candidates globally.
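
To make the three baselines concrete, the sketch below computes a percentile rank against each one. It assumes a percentile means the share of baseline scores strictly below the candidate's score; the exact definition Gradient uses is not stated, and the function and baseline names are illustrative.

```python
from bisect import bisect_left


def percentile_rank(score: float, baseline: list[float]) -> float:
    """Percentage of baseline scores strictly below `score` (assumed definition)."""
    if not baseline:
        raise ValueError("baseline must contain at least one score")
    ordered = sorted(baseline)
    below = bisect_left(ordered, score)  # count of scores < score
    return 100.0 * below / len(ordered)


def percentiles(score: float, baselines: dict[str, list[float]]) -> dict[str, float]:
    """Rank one score against each named baseline."""
    return {name: percentile_rank(score, scores) for name, scores in baselines.items()}


# Made-up baseline scores, used only to exercise the sketch.
example = percentiles(
    70,
    {
        "same_assessment": [50, 60, 70, 80],
        "same_role_category": [40, 55, 65, 75, 90],
        "global": [30, 45, 60, 72, 85, 95, 20, 68, 71, 99],
    },
)
```

Because the same score is ranked against different populations, the three percentiles can differ widely for one candidate.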

Follow-up questions

The scoring engine also generates suggested follow-up questions for the hiring manager. These are specific, evidence-based questions tied to things the candidate did (or didn’t do) during the assessment. For example, if a candidate accepted incorrect data from the AI without verification, the engine might suggest:
“I noticed you included a revenue figure that doesn’t match the source document. Walk me through your verification process for AI-generated data.”
Each question includes:
  • The question itself
  • Rationale: Why this question matters (what the scoring engine observed)
  • What to look for: What distinguishes a strong answer from a weak one
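
The three parts of a follow-up question map naturally onto a small record type. The sketch below is one possible shape; the class name, field names, and the strong/weak guidance text are hypothetical and written only for illustration, while the question itself is the example quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FollowUpQuestion:
    question: str
    rationale: str      # what the scoring engine observed
    strong_answer: str  # "what to look for": signs of a strong answer
    weak_answer: str    # "what to look for": signs of a weak answer

    def to_text(self) -> str:
        """Render the question as plain text for an interviewer."""
        return "\n".join(
            [
                f"Question: {self.question}",
                f"Rationale: {self.rationale}",
                f"Strong answer: {self.strong_answer}",
                f"Weak answer: {self.weak_answer}",
            ]
        )


# Illustrative instance; the guidance strings are placeholders, not engine output.
example_question = FollowUpQuestion(
    question=(
        "I noticed you included a revenue figure that doesn't match the source "
        "document. Walk me through your verification process for AI-generated data."
    ),
    rationale="The candidate accepted incorrect data from the AI without verification.",
    strong_answer="Describes checking figures against the source document.",
    weak_answer="Assumes AI output is correct by default.",
)
```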

Admin review

After automated scoring, admins can:
  • Review the detailed score breakdown per category and sub-criterion
  • Replay the candidate’s session (every action, message, and edit)
  • Adjust individual category scores with the Override Score API
  • Add review notes
  • Release feedback to the candidate (immediately or after a configurable delay)