QueryGym Leaderboard
Reproducible benchmarks for LLM query reformulation.
LLM Models
Backends used across the runs in this leaderboard.
gpt-4.1: 270 runs
gpt-4.1-nano: 270 runs
Qwen2.5-72B-Instruct: 270 runs
Qwen2.5-7B-Instruct: 270 runs