QueryGym Leaderboard
SIGIR 2026 Reproducibility — Query Reformulation × LLMs × Datasets

About this leaderboard

The QueryGym Leaderboard tracks reproducible query-reformulation results across IR benchmarks (BEIR, MS MARCO, TREC DL). Every row is backed by three artifacts, all stored in the repository under reproducibility/data/runs/{dataset}/{method}/{model}/.

To cite a number, link the commit and the run JSON.
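As a rough illustration of the directory layout and the commit-plus-file citation described above, the sketch below builds the run directory for one leaderboard row and a commit-pinned link to a file inside it. The helper names run_dir and permalink are hypothetical and not part of QueryGym, the repository URL must be supplied by the caller, and the sketch assumes the dataset, method, and model identifiers are used verbatim as directory names, which the repository may not do.

```python
from pathlib import PurePosixPath

# Root directory as stated on this page; the helpers below are illustrative only.
RUNS_ROOT = PurePosixPath("reproducibility/data/runs")


def run_dir(dataset: str, method: str, model: str) -> PurePosixPath:
    """Return the repository-relative directory for one leaderboard row.

    Assumes each identifier maps verbatim onto one path segment.
    """
    for name, value in (("dataset", dataset), ("method", method), ("model", model)):
        # Reject empty segments and anything that would escape the layout.
        if not value or "/" in value or value in (".", ".."):
            raise ValueError(f"invalid {name}: {value!r}")
    return RUNS_ROOT / dataset / method / model


def permalink(repo_url: str, commit: str, path: PurePosixPath) -> str:
    """Build a GitHub-style link pinned to a commit (repo_url is caller-supplied)."""
    return f"{repo_url.rstrip('/')}/blob/{commit}/{path}"


if __name__ == "__main__":
    print(run_dir("msmarco-v1-passage.trecdl2019", "query2e", "gpt-4.1-mini"))
```

Pinning the link to a commit hash rather than a branch name keeps the citation stable if the run file is later regenerated.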

Submitting a result

Run the example pipeline, pass its output directory to submit_run.py, run make repro-aggregate, then commit the changes and open a PR.

python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e --model gpt-4.1-mini \
    --output-dir outputs/dl19_query2e

python -m reproducibility.scripts.submit_run --from-dir outputs/dl19_query2e
make repro-aggregate
git add reproducibility/data/ && git commit -m "..." && git push
gh pr create
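Before running the submit step, it can help to confirm that the pipeline actually wrote something to the output directory. The following is a minimal sketch of such a pre-flight check. The function name find_json_files is hypothetical, and the assumption that the pipeline writes JSON files is inferred only from the mention of a run JSON on this page; it does not reflect what submit_run.py itself validates.

```python
from pathlib import Path


def find_json_files(output_dir: str) -> list[str]:
    """Return sorted, output_dir-relative paths of all JSON files found.

    Raises FileNotFoundError if output_dir is missing or not a directory.
    """
    root = Path(output_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"output directory not found: {root}")
    # Search recursively, since the pipeline's internal layout is not assumed.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.json"))


if __name__ == "__main__":
    try:
        files = find_json_files("outputs/dl19_query2e")
    except FileNotFoundError as err:
        print(err)
    else:
        if files:
            print("\n".join(files))
        else:
            print("no JSON files found; re-run the pipeline before submitting")
```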

Full guide: Reproducibility User Guide

Papers