QueryGym Leaderboard
Reproducible benchmarks for LLM query reformulation.

DL-HARD

msmarco-v1-passage.dlhard
All results produced by QueryGym · fully reproducible!

120 (method × LLM × retriever) configurations evaluated on this dataset.
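As a quick sanity check on that count, the grid can be enumerated as a Cartesian product. This is only a sketch: the three lists are transcribed from the names that appear in the results table on this page, and the variable names are illustrative, not part of QueryGym.

```python
from itertools import product

# Names as they appear in the leaderboard rows (transcribed, not an official API).
METHODS = [
    "csqe", "genqr", "genqr_ensemble", "lamer", "mugi", "qa_expand",
    "Q2D (ZS)", "Q2D (FS)", "Q2D (COT)", "query2e",
]
LLMS = ["Qwen2.5-72B-Instruct", "Qwen2.5-7B-Instruct", "gpt-4.1", "gpt-4.1-nano"]
RETRIEVERS = ["BGE-base-en-v1.5", "BM25", "SPLADE++"]

# One configuration per (method, LLM, retriever) triple.
configs = list(product(METHODS, LLMS, RETRIEVERS))

print(len(configs))  # 10 methods x 4 LLMs x 3 retrievers = 120
```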
Each run consists of three steps: reformulate → retrieve → evaluate.
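To make the evaluate step concrete, here is a minimal sketch of the two reported metrics, nDCG@10 and R@1k, for a single query. It assumes linear gains with a log2 rank discount and treats any judgment of at least `min_rel` as relevant for recall; the tooling behind the leaderboard may use different conventions. The function names and the toy data are hypothetical.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount (rank starts at 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranking, qrels, k=10):
    """nDCG@k for one query; qrels maps doc id -> graded relevance (assumed linear gain)."""
    gains = [qrels.get(doc, 0) for doc in ranking[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

def recall_at_k(ranking, qrels, k=1000, min_rel=1):
    """Fraction of relevant docs found in the top k; min_rel is an assumed threshold."""
    relevant = {doc for doc, rel in qrels.items() if rel >= min_rel}
    if not relevant:
        return 0.0
    return len(relevant & set(ranking[:k])) / len(relevant)

# Toy judgments and ranking for one query (made-up ids, for illustration only).
qrels = {"d1": 3, "d2": 1, "d3": 0, "d4": 2}
ranking = ["d1", "d3", "d4"]

print(round(ndcg_at_k(ranking, qrels), 4))
print(round(recall_at_k(ranking, qrels), 4))
```

A leaderboard score would then be the mean of these per-query values over all queries in the dataset.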

| Method | LLM | Retriever | nDCG@10 | R@1k |
|---|---|---|---|---|
| csqe | Qwen2.5-72B-Instruct | BGE-base-en-v1.5 | 0.3757 | 0.8531 |
| csqe | Qwen2.5-72B-Instruct | BM25 | 0.2848 | 0.6998 |
| csqe | Qwen2.5-72B-Instruct | SPLADE++ | 0.2857 | 0.8246 |
| csqe | Qwen2.5-7B-Instruct | BGE-base-en-v1.5 | 0.3671 | 0.8348 |
| csqe | Qwen2.5-7B-Instruct | BM25 | 0.3322 | 0.7913 |
| csqe | Qwen2.5-7B-Instruct | SPLADE++ | 0.3025 | 0.8057 |
| csqe | gpt-4.1 | BGE-base-en-v1.5 | 0.4144 | 0.8640 |
| csqe | gpt-4.1 | BM25 | 0.3658 | 0.7873 |
| csqe | gpt-4.1 | SPLADE++ | 0.3690 | 0.8341 |
| csqe | gpt-4.1-nano | BGE-base-en-v1.5 | 0.3516 | 0.8371 |
| csqe | gpt-4.1-nano | BM25 | 0.2436 | 0.7327 |
| csqe | gpt-4.1-nano | SPLADE++ | 0.2789 | 0.7872 |
| genqr | Qwen2.5-72B-Instruct | BGE-base-en-v1.5 | 0.3471 | 0.8144 |
| genqr | Qwen2.5-72B-Instruct | BM25 | 0.2091 | 0.6822 |
| genqr | Qwen2.5-72B-Instruct | SPLADE++ | 0.2916 | 0.7861 |
| genqr | Qwen2.5-7B-Instruct | BGE-base-en-v1.5 | 0.3375 | 0.8235 |
| genqr | Qwen2.5-7B-Instruct | BM25 | 0.2006 | 0.6458 |
| genqr | Qwen2.5-7B-Instruct | SPLADE++ | 0.3386 | 0.8000 |
| genqr | gpt-4.1 | BGE-base-en-v1.5 | 0.3870 | 0.8402 |
| genqr | gpt-4.1 | BM25 | 0.2921 | 0.7434 |
| genqr | gpt-4.1 | SPLADE++ | 0.3800 | 0.8488 |
| genqr | gpt-4.1-nano | BGE-base-en-v1.5 | 0.3586 | 0.8389 |
| genqr | gpt-4.1-nano | BM25 | 0.1743 | 0.6575 |
| genqr | gpt-4.1-nano | SPLADE++ | 0.3043 | 0.8408 |
| genqr_ensemble | Qwen2.5-72B-Instruct | BGE-base-en-v1.5 | 0.3543 | 0.8269 |
| genqr_ensemble | Qwen2.5-72B-Instruct | BM25 | 0.2463 | 0.6975 |
| genqr_ensemble | Qwen2.5-72B-Instruct | SPLADE++ | 0.2849 | 0.7823 |
| genqr_ensemble | Qwen2.5-7B-Instruct | BGE-base-en-v1.5 | 0.3713 | 0.8356 |
| genqr_ensemble | Qwen2.5-7B-Instruct | BM25 | 0.2429 | 0.7210 |
| genqr_ensemble | Qwen2.5-7B-Instruct | SPLADE++ | 0.3292 | 0.8005 |
| genqr_ensemble | gpt-4.1 | BGE-base-en-v1.5 | 0.3572 | 0.8633 |
| genqr_ensemble | gpt-4.1 | BM25 | 0.2697 | 0.7775 |
| genqr_ensemble | gpt-4.1 | SPLADE++ | 0.3047 | 0.8207 |
| genqr_ensemble | gpt-4.1-nano | BGE-base-en-v1.5 | 0.3579 | 0.8282 |
| genqr_ensemble | gpt-4.1-nano | BM25 | 0.2154 | 0.6990 |
| genqr_ensemble | gpt-4.1-nano | SPLADE++ | 0.3233 | 0.8400 |
| lamer | Qwen2.5-72B-Instruct | BGE-base-en-v1.5 | 0.4055 | 0.8453 |
| lamer | Qwen2.5-72B-Instruct | BM25 | 0.3635 | 0.7820 |
| lamer | Qwen2.5-72B-Instruct | SPLADE++ | 0.3648 | 0.8156 |
| lamer | Qwen2.5-7B-Instruct | BGE-base-en-v1.5 | 0.3788 | 0.8315 |
| lamer | Qwen2.5-7B-Instruct | BM25 | 0.3570 | 0.7633 |
| lamer | Qwen2.5-7B-Instruct | SPLADE++ | 0.3280 | 0.7917 |
| lamer | gpt-4.1 | BGE-base-en-v1.5 | 0.4120 | 0.8557 |
| lamer | gpt-4.1 | BM25 | 0.3555 | 0.8065 |
| lamer | gpt-4.1 | SPLADE++ | 0.3673 | 0.8246 |
| lamer | gpt-4.1-nano | BGE-base-en-v1.5 | 0.3759 | 0.8352 |
| lamer | gpt-4.1-nano | BM25 | 0.3398 | 0.7697 |
| lamer | gpt-4.1-nano | SPLADE++ | 0.3459 | 0.7969 |
| mugi | Qwen2.5-72B-Instruct | BGE-base-en-v1.5 | 0.3948 | 0.8548 |
| mugi | Qwen2.5-72B-Instruct | BM25 | 0.3609 | 0.8122 |
| mugi | Qwen2.5-72B-Instruct | SPLADE++ | 0.3260 | 0.8098 |
| mugi | Qwen2.5-7B-Instruct | BGE-base-en-v1.5 | 0.3619 | 0.8495 |
| mugi | Qwen2.5-7B-Instruct | BM25 | 0.3173 | 0.7707 |
| mugi | Qwen2.5-7B-Instruct | SPLADE++ | 0.2642 | 0.8028 |
| mugi | gpt-4.1 | BGE-base-en-v1.5 | 0.4038 | 0.8415 |
| mugi | gpt-4.1 | BM25 | 0.3651 | 0.8216 |
| mugi | gpt-4.1 | SPLADE++ | 0.3625 | 0.8111 |
| mugi | gpt-4.1-nano | BGE-base-en-v1.5 | 0.3903 | 0.8354 |
| mugi | gpt-4.1-nano | BM25 | 0.3423 | 0.7924 |
| mugi | gpt-4.1-nano | SPLADE++ | 0.3254 | 0.8105 |
| qa_expand | Qwen2.5-72B-Instruct | BGE-base-en-v1.5 | 0.3485 | 0.8498 |
| qa_expand | Qwen2.5-72B-Instruct | BM25 | 0.3215 | 0.7876 |
| qa_expand | Qwen2.5-72B-Instruct | SPLADE++ | 0.3347 | 0.8285 |
| qa_expand | Qwen2.5-7B-Instruct | BGE-base-en-v1.5 | 0.3418 | 0.8267 |
| qa_expand | Qwen2.5-7B-Instruct | BM25 | 0.2892 | 0.7746 |
| qa_expand | Qwen2.5-7B-Instruct | SPLADE++ | 0.3143 | 0.8305 |
| qa_expand | gpt-4.1 | BGE-base-en-v1.5 | 0.3739 | 0.8543 |
| qa_expand | gpt-4.1 | BM25 | 0.3018 | 0.7570 |
| qa_expand | gpt-4.1 | SPLADE++ | 0.3552 | 0.8034 |
| qa_expand | gpt-4.1-nano | BGE-base-en-v1.5 | 0.3688 | 0.8113 |
| qa_expand | gpt-4.1-nano | BM25 | 0.3469 | 0.7480 |
| qa_expand | gpt-4.1-nano | SPLADE++ | 0.3702 | 0.8506 |
| Q2D (FS) | Qwen2.5-72B-Instruct | BGE-base-en-v1.5 | 0.3845 | 0.8568 |
| Q2D (ZS) | Qwen2.5-72B-Instruct | BGE-base-en-v1.5 | 0.3954 | 0.8508 |
| Q2D (COT) | Qwen2.5-72B-Instruct | BGE-base-en-v1.5 | 0.3498 | 0.8236 |
| Q2D (ZS) | Qwen2.5-72B-Instruct | BM25 | 0.3506 | 0.8002 |
| Q2D (COT) | Qwen2.5-72B-Instruct | BM25 | 0.3075 | 0.7526 |
| Q2D (FS) | Qwen2.5-72B-Instruct | BM25 | 0.3467 | 0.8020 |
| Q2D (FS) | Qwen2.5-72B-Instruct | SPLADE++ | 0.3333 | 0.8206 |
| Q2D (ZS) | Qwen2.5-72B-Instruct | SPLADE++ | 0.3200 | 0.8248 |
| Q2D (COT) | Qwen2.5-72B-Instruct | SPLADE++ | 0.3016 | 0.8393 |
| Q2D (COT) | Qwen2.5-7B-Instruct | BGE-base-en-v1.5 | 0.3391 | 0.8300 |
| Q2D (FS) | Qwen2.5-7B-Instruct | BGE-base-en-v1.5 | 0.3628 | 0.8348 |
| Q2D (ZS) | Qwen2.5-7B-Instruct | BGE-base-en-v1.5 | 0.3675 | 0.8255 |
| Q2D (COT) | Qwen2.5-7B-Instruct | BM25 | 0.3044 | 0.7815 |
| Q2D (FS) | Qwen2.5-7B-Instruct | BM25 | 0.3141 | 0.7724 |
| Q2D (ZS) | Qwen2.5-7B-Instruct | BM25 | 0.3352 | 0.7763 |
| Q2D (COT) | Qwen2.5-7B-Instruct | SPLADE++ | 0.2731 | 0.8239 |
| Q2D (FS) | Qwen2.5-7B-Instruct | SPLADE++ | 0.2672 | 0.8116 |
| Q2D (ZS) | Qwen2.5-7B-Instruct | SPLADE++ | 0.2904 | 0.8006 |
| Q2D (ZS) | gpt-4.1 | BGE-base-en-v1.5 | 0.3786 | 0.8591 |
| Q2D (COT) | gpt-4.1 | BGE-base-en-v1.5 | 0.3755 | 0.8505 |
| Q2D (FS) | gpt-4.1 | BGE-base-en-v1.5 | 0.4074 | 0.8726 |
| Q2D (ZS) | gpt-4.1 | BM25 | 0.3502 | 0.7811 |
| Q2D (COT) | gpt-4.1 | BM25 | 0.3291 | 0.7737 |
| Q2D (FS) | gpt-4.1 | BM25 | 0.3562 | 0.8042 |
| Q2D (ZS) | gpt-4.1 | SPLADE++ | 0.3377 | 0.8389 |
| Q2D (COT) | gpt-4.1 | SPLADE++ | 0.3308 | 0.8456 |
| Q2D (FS) | gpt-4.1 | SPLADE++ | 0.3771 | 0.8396 |
| Q2D (COT) | gpt-4.1-nano | BGE-base-en-v1.5 | 0.3722 | 0.8367 |
| Q2D (FS) | gpt-4.1-nano | BGE-base-en-v1.5 | 0.3480 | 0.8374 |
| Q2D (ZS) | gpt-4.1-nano | BGE-base-en-v1.5 | 0.3683 | 0.8395 |
| Q2D (COT) | gpt-4.1-nano | BM25 | 0.3320 | 0.7655 |
| Q2D (FS) | gpt-4.1-nano | BM25 | 0.3358 | 0.7627 |
| Q2D (ZS) | gpt-4.1-nano | BM25 | 0.3368 | 0.7832 |
| Q2D (COT) | gpt-4.1-nano | SPLADE++ | 0.3426 | 0.8390 |
| Q2D (FS) | gpt-4.1-nano | SPLADE++ | 0.3533 | 0.8005 |
| Q2D (ZS) | gpt-4.1-nano | SPLADE++ | 0.3479 | 0.8092 |
| query2e | Qwen2.5-72B-Instruct | BGE-base-en-v1.5 | 0.3744 | 0.8503 |
| query2e | Qwen2.5-72B-Instruct | BM25 | 0.3148 | 0.7605 |
| query2e | Qwen2.5-72B-Instruct | SPLADE++ | 0.3442 | 0.8328 |
| query2e | Qwen2.5-7B-Instruct | BGE-base-en-v1.5 | 0.3521 | 0.8171 |
| query2e | Qwen2.5-7B-Instruct | BM25 | 0.3101 | 0.7432 |
| query2e | Qwen2.5-7B-Instruct | SPLADE++ | 0.3056 | 0.7882 |
| query2e | gpt-4.1 | BGE-base-en-v1.5 | 0.3779 | 0.8306 |
| query2e | gpt-4.1 | BM25 | 0.3446 | 0.7639 |
| query2e | gpt-4.1 | SPLADE++ | 0.3518 | 0.8380 |
| query2e | gpt-4.1-nano | BGE-base-en-v1.5 | 0.3609 | 0.8321 |
| query2e | gpt-4.1-nano | BM25 | 0.3101 | 0.7665 |
| query2e | gpt-4.1-nano | SPLADE++ | 0.3297 | 0.8143 |