## Offline Usage
Benchmarks can be run ad hoc, during development, or as part of a CI/CD process using the `uv run evals-hub run-benchmark` command:
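```bash
# Options may be passed on the commandline, read from a YAML file, or both
# (see Configuration below).
uv run evals-hub run-benchmark [OPTIONS]
```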
### Configuration
Benchmarks can be configured via the commandline, a YAML file, or a mixture of the two.
> **Note:** Commandline options always take precedence over, and will override, YAML file options.
Each benchmark configuration is an object with the following fields (click a field type to see its individual options):
| Field | Type | Description | Is Nested? | Required |
|---|---|---|---|---|
| `task_name` | `str` | Determines which task pipeline should be run | X | ✓ |
| `dataset` | `DatasetConfig` | Determines which dataset is used, along with (optionally) the split and HuggingFace subset | ✓ | ✓ |
| `model` | `ModelConfig` | Configuration for the model, including the choice of model and model settings | ✓ | ✓ |
| `judge` | `ModelConfig` | Configuration for the judge model (if applicable), including the choice of model and model settings | ✓ | X |
| `evaluation` | `EvaluationConfig` | Experiment settings like batch size, seed, max concurrency, etc. | ✓ | ✓ |
| `output` | `OutputConfig` | Choice of where to store the evaluation results | ✓ | ✓ |
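Putting these together, a complete configuration might look like the sketch below. The top-level field names come from the table above, and `task_name: qa` and `model.checkpoint` appear in the YAML example later in this section; the remaining nested option names and values (`name`, `split`, `subset`, `batch_size`, `seed`, `max_concurrency`, `dir`) are illustrative assumptions, not the exact schema:

```yaml
run-benchmark:
  task_name: qa
  dataset:
    name: squad             # dataset identifier (assumed option name)
    split: validation       # optional split (assumed option name)
    subset: plain_text      # optional HuggingFace subset (assumed option name)
  model:
    checkpoint: my-org/my-model   # model.checkpoint per the YAML example below
  judge:                          # optional; only for judge-based tasks
    checkpoint: my-org/my-judge
  evaluation:
    batch_size: 8           # assumed option names for the settings
    seed: 42                # listed in the table above
    max_concurrency: 4
  output:
    dir: ./results          # assumed option name
```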
#### CLI Configuration
##### CLI Options
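The CLI options mirror the configuration fields described above. Assuming the command supports standard `--help` behaviour, the full, up-to-date list of options can be printed with:

```bash
uv run evals-hub run-benchmark --help
```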
#### YAML Configuration
When configuring a run via a YAML file, nested options (e.g. `model.checkpoint`) correspond to YAML nesting, i.e.:
```yaml
run-benchmark:
  task_name: qa
  model:
    checkpoint: ...
  ...
```
This can then be run via the following:
```bash
uv run evals-hub run-benchmark --config <CONFIG_FILE>
```
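Because commandline options take precedence over YAML file options, a single base config can be shared across runs and tweaked per invocation. As a sketch, assuming nested fields are exposed as dotted flags (the exact flag names are an assumption; check the `--help` output for the real ones):

```bash
# Reuse the base config but override the model checkpoint for this run.
# NOTE: --model.checkpoint is an assumed flag name, not confirmed syntax.
uv run evals-hub run-benchmark --config config.yaml --model.checkpoint my-org/other-model
```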