## About this leaderboard
The QueryGym Leaderboard tracks reproducible query-reformulation results across IR benchmarks (BEIR, MS MARCO, TREC DL). Every row is backed by:
- a JSON file conforming to `reproducibility/schema.json` v1,
- a TREC-format `.run.txt` for re-evaluation, and
- the reformulated queries TSV used to produce the run file.

All three live in the repository under `reproducibility/data/runs/{dataset}/{method}/{model}/`.
Citing a number is as simple as linking the commit + the run JSON.
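Each `.run.txt` uses the standard six-column TREC run format (`qid Q0 docid rank score tag`). As a minimal sketch of loading one back for re-evaluation (the format is standard; the function name and path here are illustrative, not part of the toolkit):

```python
from collections import defaultdict

def read_trec_run(path: str) -> dict[str, list[tuple[str, float]]]:
    """Parse a TREC-format run file into {qid: [(docid, score), ...]},
    with each query's documents sorted by descending score."""
    run = defaultdict(list)
    with open(path) as f:
        for line in f:
            qid, _q0, docid, _rank, score, _tag = line.split()
            run[qid].append((docid, float(score)))
    for qid in run:
        run[qid].sort(key=lambda pair: pair[1], reverse=True)
    return dict(run)
```

Re-sorting by score rather than trusting the stored rank column mirrors what `trec_eval` itself does.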
## Submitting a result
Run the example pipeline, then use `submit_run.py` and open a PR:

```shell
python examples/querygym_pyserini/pipeline.py \
  --dataset msmarco-v1-passage.trecdl2019 \
  --method query2e --model gpt-4.1-mini \
  --output-dir outputs/dl19_query2e
python -m reproducibility.scripts.submit_run --from-dir outputs/dl19_query2e
make repro-aggregate
git add reproducibility/data/ && git commit -m "..." && git push
gh pr create
```

Full guide: Reproducibility User Guide ↗
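The `make repro-aggregate` step rebuilds the leaderboard rows from the checked-in run JSONs. A hedged sketch of what such an aggregation could look like, assuming only the `reproducibility/data/runs/{dataset}/{method}/{model}/` layout above (the `metrics` field is hypothetical; consult `reproducibility/schema.json` for the real shape):

```python
import json
from pathlib import Path

def aggregate_runs(root: str = "reproducibility/data/runs") -> list[dict]:
    """Collect every run JSON under root into leaderboard rows.
    Assumed layout: {root}/{dataset}/{method}/{model}/*.json."""
    rows = []
    for path in sorted(Path(root).glob("*/*/*/*.json")):
        dataset, method, model = path.parts[-4:-1]
        run = json.loads(path.read_text())
        rows.append({
            "dataset": dataset,
            "method": method,
            "model": model,
            # 'metrics' is an assumed field name, not confirmed by the schema.
            "metrics": run.get("metrics", {}),
        })
    return rows
```

Deriving `dataset`/`method`/`model` from the directory path keeps each run JSON self-locating, so a leaderboard row can be cited by commit plus file path alone.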
## Papers
- WWW 2026 Demos — QueryGym toolkit paper. arXiv 2511.15996 ↗
- SIGIR 2026 Reproducibility Track — multi-LLM baseline reproduction (link TBD).