Data Catalog
Transparent, auditable task scorecards from Jigsaw’s RL training environments. Each task is evaluated across frontier models with deterministic grading and full provenance.
ML Productivity
Analysis and debugging tasks over ML training artifacts. Agents navigate training runs, extract metrics from heterogeneous file formats, read framework source code, and synthesize diagnostic reports.
[Chart: scores for claude-opus-4 [max], grok-4.3 [reasoning], and gpt-5.5 [xhigh] on a 0% to 100% axis]
AutoResearch
GPU-accelerated ML research tasks where agents design experiments, run training jobs, analyze results, and iterate on model architectures and training procedures.
Coming soon
0-1 Tasks
Greenfield engineering tasks where agents build functional systems from scratch: APIs, data pipelines, tooling, and end-to-end applications across multiple languages and frameworks.
Coming soon

