QueryGym Leaderboard
Reproducible benchmarks for LLM query reformulation.

About this leaderboard

The QueryGym Leaderboard tracks reproducible query-reformulation results across IR benchmarks (BEIR, MS MARCO, TREC DL). Every row is backed by a JSON file conforming to reproducibility/schema.json v1. A submission can optionally include the reformulated-queries TSV and a TREC-format .run.txt so that the result can be fully re-evaluated.
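For readers unfamiliar with the TREC run format mentioned above, the sketch below parses one line of such a file. It assumes the standard six-column layout (query ID, the literal Q0, document ID, rank, score, run tag); the names RunEntry and parse_run_line are hypothetical and not part of QueryGym, and the sample line is made up for illustration.

```python
from typing import NamedTuple


class RunEntry(NamedTuple):
    """One ranked document from a TREC-format run file."""
    query_id: str
    doc_id: str
    rank: int
    score: float
    tag: str


def parse_run_line(line: str) -> RunEntry:
    """Parse a 'qid Q0 docid rank score tag' line into a RunEntry.

    Hypothetical helper: assumes the standard six whitespace-separated
    columns and raises ValueError on anything else.
    """
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(
            f"expected 6 whitespace-separated fields, got {len(fields)}"
        )
    query_id, _q0, doc_id, rank, score, tag = fields
    return RunEntry(query_id, doc_id, int(rank), float(score), tag)


# Illustrative line only; the IDs and score are invented.
entry = parse_run_line("q1 Q0 doc42 1 12.5 example-run")
print(entry.query_id, entry.doc_id, entry.rank, entry.score)
```

A parser like this is enough to sanity-check a .run.txt before attaching it to a submission; actual re-evaluation would still go through the project's own tooling.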

All artifacts live in the repository under reproducibility/data/runs/{dataset}/{method}/{model}/{retriever}/. To cite a number, link the commit and the run JSON.
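As a rough illustration of that layout, the sketch below assembles a run directory from its four components. The function run_dir is hypothetical, the retriever name "bm25" is an assumption, and the repository may normalize these names differently from the raw strings used here.

```python
from pathlib import PurePosixPath

# Root of the run artifacts, as stated in the text above.
RUNS_ROOT = PurePosixPath("reproducibility/data/runs")


def run_dir(dataset: str, method: str, model: str, retriever: str) -> PurePosixPath:
    """Build {root}/{dataset}/{method}/{model}/{retriever}.

    Hypothetical helper: rejects empty components and components that
    contain a slash, since either would change the directory depth.
    """
    parts = {
        "dataset": dataset,
        "method": method,
        "model": model,
        "retriever": retriever,
    }
    for name, value in parts.items():
        if not value or "/" in value:
            raise ValueError(f"invalid {name}: {value!r}")
    return RUNS_ROOT / dataset / method / model / retriever


# "bm25" is an assumed retriever name, used only for illustration.
print(run_dir("msmarco-v1-passage.trecdl2019", "query2doc", "gpt-4.1", "bm25"))
```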

Submitting a result

Run the example pipeline, run submit_run.py on its output directory, run make repro-aggregate, then commit the results and open a PR.

submit.sh
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc --model gpt-4.1 \
    --output-dir outputs/dl19_query2doc

python -m reproducibility.scripts.submit_run --from-dir outputs/dl19_query2doc
make repro-aggregate
git add reproducibility/data/ && git commit -m "..." && git push
gh pr create

Full guide: Reproducibility User Guide

Papers

Two papers back QueryGym. See the Cite page for BibTeX entries you can paste into your bibliography.