QueryGym Leaderboard
Reproducible benchmarks for LLM query reformulation.

query2e

All results produced by QueryGym · fully reproducible!

12 model × retriever configurations for this method across BEIR, MS MARCO DL, and DL-HARD.
Each configuration below expands into three reproduction steps (reformulate → retrieve → evaluate), repeated once per dataset.
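The three steps follow one template per dataset and retriever. As a rough sketch, the BEIR commands on this page can be assembled programmatically. The helper `build_steps` and the `RETRIEVERS` table are illustrative names, not part of QueryGym or Pyserini; the flags and index suffixes are copied from the commands listed here, and the MS MARCO datasets (which use different index and qrels names) are left out.

```python
import shlex

# Retriever key -> (Pyserini search module, BEIR index suffix, retriever-specific flags).
# Values are copied from the commands on this page; the keys are illustrative.
RETRIEVERS = {
    "bm25": ("pyserini.search.lucene", ".flat",
             ["--bm25", "--k1", "0.9", "--b", "0.4"]),
    "bge": ("pyserini.search.faiss", ".bge-base-en-v1.5",
            ["--encoder", "BAAI/bge-base-en-v1.5"]),
    "splade": ("pyserini.search.lucene", ".splade-pp-ed",
               ["--encoder", "naver/splade-cocondenser-ensembledistil", "--impact"]),
}


def build_steps(beir_name, model, retriever, out_dir="outputs/reproduce"):
    """Return the reformulate, retrieve, and evaluate commands for one BEIR dataset."""
    dataset = f"beir-v1.0.0-{beir_name}"
    module, suffix, extra = RETRIEVERS[retriever]
    reformulate = [
        "python", "examples/querygym_pyserini/pipeline.py",
        "--dataset", dataset, "--method", "query2e", "--model", model,
        "--steps", "reformulate", "--temperature", "1", "--max-tokens", "128",
        "--method-params", '{"num_examples":4,"train_split":"train"}',
        "--output-dir", out_dir,
    ]
    retrieve = [
        "python", "-m", module, "--threads", "16", "--batch-size", "128",
        "--index", dataset + suffix,
        "--topics", f"{out_dir}/queries/reformulated_queries.tsv",
        *extra, "--output", "run.txt", "--hits", "1000",
    ]
    evaluate = [
        "python", "-m", "pyserini.eval.trec_eval", "-c",
        "-m", "ndcg.cut.10", "-m", "recall.100", f"{dataset}-test", "run.txt",
    ]
    # shlex.join quotes the JSON argument so each line can be pasted into a shell.
    return [shlex.join(cmd) for cmd in (reformulate, retrieve, evaluate)]


if __name__ == "__main__":
    for line in build_steps("arguana", "Qwen/Qwen2.5-72B-Instruct", "bm25"):
        print(line)
```

Only the dataset name, the model, and the retriever entry change between the BEIR blocks, which is why a small lookup table is enough here.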

Model | Retriever | ArguAna | DBPedia | FiQA | SciFact | COVID | News | DL-HARD | DL 2019 | DL 2020
Each dataset reports a pair of values: nDCG@10 then R@100 for ArguAna through News; nDCG@10 then R@1k for DL-HARD, DL 2019, and DL 2020.
BRIGHT subsets (no metric columns in this view): AOPS, Biology, Earth Science, Economics, LeetCode, Pony, Psychology, Robotics, Stack Overflow, Sustainable Living, TheoremQA Questions, TheoremQA Theorems.
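Qwen2.5-72B-Instruct | BGE-base-en-v1.5 | ArguAna 0.6196 / 0.9900 | DBPedia 0.3610 / 0.4706 | FiQA 0.3793 / 0.7222 | SciFact 0.7382 / 0.9567 | COVID 0.7857 / 0.1412 | News 0.4509 / 0.5067 | DL-HARD 0.3744 / 0.8503 | DL 2019 0.7069 / 0.8760 | DL 2020 0.6606 / 0.8528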
method: query2e · llm: Qwen2.5-72B-Instruct · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
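Step 2 reads the file written by this step through `--topics`. Below is a minimal sketch for sanity-checking that file before retrieval. It assumes one `query_id<TAB>reformulated_query` pair per line, which is the layout Pyserini's TSV topic readers accept; the exact QueryGym output is not shown on this page, and `load_topics` and the sample rows are made up for illustration.

```python
import csv
import io


def load_topics(handle):
    """Parse `qid<TAB>query` lines into a dict, rejecting malformed or duplicate rows."""
    topics = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        if len(row) != 2 or not row[1].strip():
            raise ValueError(f"malformed topic row: {row!r}")
        qid, text = row
        if qid in topics:
            raise ValueError(f"duplicate query id: {qid}")
        topics[qid] = text.strip()
    return topics


# Made-up sample standing in for outputs/reproduce/queries/reformulated_queries.tsv.
sample = io.StringIO(
    "q1\tcontent moderation free speech censorship platforms\n"
    "q2\tmortgage refinancing interest rates\n"
)
topics = load_topics(sample)
print(len(topics), "topics loaded")
```

An empty or duplicated query would otherwise surface only as a missing or overwritten entry in `run.txt`.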
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
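For readers unfamiliar with the two measures requested via `-m ndcg.cut.10` and `-m recall.100`, here is a small sketch of what they compute for a single query. It follows the usual trec_eval convention of linear gain with a log2 rank discount; the function names and the toy judgments are illustrative, and the reported leaderboard numbers are averages over all queries produced by trec_eval itself.

```python
import math


def ndcg_at_k(ranked_ids, qrels, k=10):
    """nDCG@k for one query; qrels maps doc id -> graded relevance."""
    dcg = sum(
        max(qrels.get(doc_id, 0), 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    # Ideal ranking: all judged-relevant documents sorted by grade, cut at k.
    ideal = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_ids, qrels, k=100):
    """Fraction of relevant documents that appear in the top k."""
    relevant = {doc_id for doc_id, g in qrels.items() if g > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)


# Toy example: two relevant documents, retrieved at ranks 2 and 3.
toy_qrels = {"d1": 1, "d2": 1}
toy_run = ["d3", "d1", "d2"]
print(round(ndcg_at_k(toy_run, toy_qrels), 4), recall_at_k(toy_run, toy_qrels))
```

nDCG@10 rewards placing relevant documents near the top, while recall only checks whether they appear anywhere in the cut-off, so the two can diverge sharply on the same run.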
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
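Qwen2.5-72B-Instruct | BM25 | ArguAna 0.4066 (nDCG@10 only) | DBPedia 0.3578 / 0.4641 | FiQA 0.2518 (nDCG@10 only) | SciFact 0.6969 (nDCG@10 only) | COVID 0.6942 / 0.1611 | News 0.4484 / 0.5647 | DL-HARD 0.3148 / 0.7605 | DL 2019 0.5845 / 0.8501 | DL 2020 0.5546 / 0.8609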
method: query2e · llm: Qwen2.5-72B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-arguana-test run.txt
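The flags `--k1 0.9 --b 0.4` in the retrieve step set BM25's term-frequency saturation and document-length normalization. Below is a sketch of the per-term score in its common textbook form. Lucene's implementation may differ in constant factors, and the function name and toy statistics are illustrative only.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=0.9, b=0.4):
    """Textbook BM25 contribution of one query term to one document."""
    # Rarer terms (small doc_freq) get a larger inverse document frequency.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly longer-than-average documents are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences stop adding score.
    return idf * tf * (k1 + 1) / (tf + norm)


# Toy statistics: 1000 documents, the term occurs in 10 of them.
short_doc = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=10)
long_doc = bm25_term_score(tf=1, doc_len=200, avg_doc_len=100, n_docs=1000, doc_freq=10)
print(short_doc > long_doc)
```

Because reformulated queries add extra terms, each added term contributes one such score, which is how expansion can change a lexical ranking.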
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
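Qwen2.5-72B-Instruct | SPLADE++ | ArguAna 0.5188 / 0.9808 | DBPedia 0.3755 / 0.5195 | FiQA 0.3036 / 0.6438 | SciFact 0.7049 / 0.9427 | COVID 0.6196 / 0.1201 | News 0.4076 / 0.4799 | DL-HARD 0.3442 / 0.8328 | DL 2019 0.6686 / 0.9104 | DL 2020 0.6353 / 0.9286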
method: query2e · llm: Qwen2.5-72B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
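With `--impact`, the retrieve step scores documents by summing products of query and document term weights instead of applying BM25. Below is a toy sketch of that scoring rule. The weights are made up and `impact_score` is a hypothetical helper; in the real pipeline the query weights come from the SPLADE encoder and the document weights from the prebuilt index.

```python
def impact_score(query_weights, doc_weights):
    """Sum of weight products over the terms shared by query and document."""
    return sum(
        weight * doc_weights[term]
        for term, weight in query_weights.items()
        if term in doc_weights
    )


# Made-up sparse vectors: only "covid" is shared, so the score is 3 * 4.
query = {"vaccine": 2, "covid": 3}
document = {"covid": 4, "trial": 1}
print(impact_score(query, document))
```

Terms that the encoder adds to the query but that never occur in the document's sparse vector contribute nothing to the score.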
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
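Qwen2.5-7B-Instruct | BGE-base-en-v1.5 | ArguAna 0.6205 / 0.9900 | DBPedia 0.3415 / 0.4534 | FiQA 0.3795 / 0.7132 | SciFact 0.7378 / 0.9633 | COVID 0.7618 / 0.1379 | News 0.4454 / 0.4967 | DL-HARD 0.3521 / 0.8171 | DL 2019 0.6646 / 0.8422 | DL 2020 0.6425 / 0.8443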
method: query2e · llm: Qwen2.5-7B-Instruct · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
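Qwen2.5-7B-Instruct | BM25 | ArguAna 0.4052 / 0.9403 | DBPedia 0.3477 / 0.4691 | FiQA 0.2453 / 0.5494 | SciFact 0.6967 / 0.9520 | COVID 0.6945 / 0.1653 | News 0.4503 / 0.5824 | DL-HARD 0.3101 / 0.7432 | DL 2019 0.5721 / 0.8431 | DL 2020 0.5404 / 0.8548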
method: query2e · llm: Qwen2.5-7B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
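The six BEIR listings for BM25 differ only in the dataset name, so they can be generated from one template. The sketch below is a dry run: it prints the retrieve and evaluate commands instead of executing them. The function name `beir_bm25_commands` and the per-dataset run file `run.<dataset>.txt` are assumptions made for illustration; the listings above always write to `run.txt`.

```shell
#!/bin/sh
# Dry run: print (do not execute) the BM25 retrieve and evaluate commands
# for one BEIR dataset, using the same flags as the listings above.
# Assumption: one run file per dataset (run.<dataset>.txt) so results are
# not overwritten; the page itself always uses run.txt.
beir_bm25_commands() {
  ds="$1"   # short dataset name, e.g. arguana
  echo "python -m pyserini.search.lucene --threads 16 --batch-size 128 --index beir-v1.0.0-${ds}.flat --topics outputs/reproduce/queries/reformulated_queries.tsv --bm25 --k1 0.9 --b 0.4 --output run.${ds}.txt --hits 1000"
  echo "python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 beir-v1.0.0-${ds}-test run.${ds}.txt"
}

# The six BEIR datasets covered by this leaderboard page.
for ds in arguana dbpedia-entity fiqa scifact trec-covid trec-news; do
  beir_bm25_commands "$ds"
done
```

Because every listing reads the same topics path, the reformulate step would still need to be rerun (or its output copied aside) before each dataset's retrieval.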
Qwen2.5-7B-Instruct SPLADE++ 0.5193 0.9815 0.3386 0.5134 0.2912 0.6256 0.7115 0.9493 0.6080 0.1093 0.4073 0.4888 0.3056 0.7882 0.5474 0.8734 0.5312 0.9001
method: query2e · llm: Qwen2.5-7B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
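Across the listings, the three retrievers differ only in the search module, the index suffix, and a few flags. A minimal sketch of that mapping follows, again as a dry run that only prints the command. The function name `retrieve_command` and the short labels `bge`, `bm25`, and `splade` are hypothetical; the flags and index names are copied from the listings on this page.

```shell
#!/bin/sh
# Print the retrieval command for a retriever label and an index prefix
# (e.g. beir-v1.0.0-arguana or msmarco-v1-passage). Nothing is executed.
retrieve_command() {
  retriever="$1"
  prefix="$2"
  common="--threads 16 --batch-size 128 --topics outputs/reproduce/queries/reformulated_queries.tsv --output run.txt"
  case "$retriever" in
    bge)
      # Dense retrieval goes through the FAISS searcher.
      echo "python -m pyserini.search.faiss $common --index ${prefix}.bge-base-en-v1.5 --encoder BAAI/bge-base-en-v1.5 --hits 1000" ;;
    bm25)
      # BEIR BM25 indexes carry a .flat suffix; MS MARCO uses the bare name.
      case "$prefix" in
        beir-*) index="${prefix}.flat" ;;
        *)      index="$prefix" ;;
      esac
      echo "python -m pyserini.search.lucene $common --index $index --bm25 --k1 0.9 --b 0.4 --hits 1000" ;;
    splade)
      # Learned sparse retrieval uses the Lucene searcher in impact mode.
      echo "python -m pyserini.search.lucene $common --index ${prefix}.splade-pp-ed --encoder naver/splade-cocondenser-ensembledistil --hits 1000 --impact" ;;
    *)
      echo "unknown retriever: $retriever" >&2
      return 1 ;;
  esac
}

retrieve_command bge beir-v1.0.0-arguana
retrieve_command bm25 msmarco-v1-passage
retrieve_command splade beir-v1.0.0-scifact
```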
gpt-4.1 BGE-base-en-v1.5 0.6192 0.9900 0.3249 0.4268 0.3920 0.7411 0.7417 0.9633 0.7741 0.1404 0.4448 0.4848 0.3779 0.8306 0.6970 0.8701 0.6422 0.8184
method: query2e · llm: gpt-4.1 · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
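The evaluate step changes with the dataset family: the BEIR listings report R@100 against `<dataset>-test`, while DL 2019 and DL 2020 report R@1k against `dl19-passage` and `dl20-passage`. The dry-run sketch below captures that choice. The function name `evaluate_command` is hypothetical, and DL-HARD is deliberately left unmapped because its listing points at a local file rather than a named qrels set.

```shell
#!/bin/sh
# Print the trec_eval command for a dataset identifier as used in the
# reformulate step (the --dataset value). Nothing is executed.
evaluate_command() {
  dataset="$1"
  case "$dataset" in
    beir-v1.0.0-*)
      qrels="${dataset}-test"; recall="recall.100" ;;
    msmarco-v1-passage.trecdl2019)
      qrels="dl19-passage"; recall="recall.1000" ;;
    msmarco-v1-passage.trecdl2020)
      qrels="dl20-passage"; recall="recall.1000" ;;
    *)
      # DL-HARD and anything else: no qrels name is given on this page.
      echo "no qrels mapping for: $dataset" >&2
      return 1 ;;
  esac
  echo "python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m $recall $qrels run.txt"
}

evaluate_command beir-v1.0.0-scifact
evaluate_command msmarco-v1-passage.trecdl2020
```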
gpt-4.1 BM25 0.4062 0.9381 0.3778 0.4772 0.2690 0.5930 0.7089 0.9403 0.7150 0.1772 0.4633 0.5807 0.3446 0.7639 0.5935 0.8698 0.5759 0.8594
method: query2e · llm: gpt-4.1 · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
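Each result row on this page carries 18 values after the model and retriever names. The sketch below labels them, assuming they follow the order of the command listings: nDCG@10 and R@100 for the six BEIR datasets, then nDCG@10 and R@1k for DL-HARD, DL 2019, and DL 2020. That column order is inferred from the page header and the listings, not stated explicitly, and the function name `label_row` is hypothetical. The example values are the gpt-4.1 BM25 row.

```shell
#!/bin/sh
# Label the 18 metric values of one leaderboard row.
# Assumed column order: six BEIR datasets (nDCG@10, R@100), then
# DL-HARD, DL 2019, DL 2020 (nDCG@10, R@1k).
label_row() {
  # Split the single string argument into positional parameters.
  set -- $1
  if [ "$#" -ne 18 ]; then
    echo "expected 18 values, got $#" >&2
    return 1
  fi
  for ds in ArguAna DBPedia FiQA SciFact COVID News; do
    echo "$ds nDCG@10=$1 R@100=$2"
    shift 2
  done
  for ds in "DL-HARD" "DL 2019" "DL 2020"; do
    echo "$ds nDCG@10=$1 R@1k=$2"
    shift 2
  done
}

label_row "0.4062 0.9381 0.3778 0.4772 0.2690 0.5930 0.7089 0.9403 0.7150 0.1772 0.4633 0.5807 0.3446 0.7639 0.5935 0.8698 0.5759 0.8594"
```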
gpt-4.1 SPLADE++ 0.3818 0.9808 0.3936 0.5477 0.3282 0.6670 0.7187 0.9393 0.6869 0.1222 0.4206 0.4992 0.3518 0.8380 0.6812 0.9302 0.6522 0.9252
method: query2e · llm: gpt-4.1 · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
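Step 1 writes `reformulated_queries.tsv`, and step 2 passes that file to Pyserini as `--topics`. A quick sanity check between the two steps can catch empty or duplicated reformulations before a long retrieval run. The sketch below is not part of QueryGym: the helper name `load_queries` is hypothetical, and it assumes the file has two tab-separated columns (query id, query text) with no header row.

```python
import csv
import io


def load_queries(handle):
    """Read `qid<TAB>query` rows into a dict, rejecting empty or duplicate entries."""
    queries = {}
    # QUOTE_NONE keeps any double quotes in the query text as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for n, row in enumerate(reader, start=1):
        if len(row) != 2:
            raise ValueError(f"line {n}: expected 2 tab-separated fields, got {len(row)}")
        qid, text = row[0].strip(), row[1].strip()
        if not qid or not text:
            raise ValueError(f"line {n}: empty query id or text")
        if qid in queries:
            raise ValueError(f"line {n}: duplicate query id {qid!r}")
        queries[qid] = text
    return queries


if __name__ == "__main__":
    # In-memory stand-in for outputs/reproduce/queries/reformulated_queries.tsv.
    sample = io.StringIO("1\tfirst reformulated query\n2\tsecond reformulated query\n")
    print(len(load_queries(sample)), "queries loaded")
```

For a real file, open it with `newline=""` and pass the handle, for example `load_queries(open("outputs/reproduce/queries/reformulated_queries.tsv", newline="", encoding="utf-8"))`.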
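gpt-4.1-nano · BGE-base-en-v1.5
ArguAna: nDCG@10 0.6198, R@100 0.9900
DBPedia: nDCG@10 0.3558, R@100 0.4657
FiQA: nDCG@10 0.3816, R@100 0.7261
SciFact: nDCG@10 0.7477, R@100 0.9633
COVID: nDCG@10 0.7803, R@100 0.1407
News: nDCG@10 0.4504, R@100 0.5018
DL-HARD: nDCG@10 0.3609, R@1k 0.8321
DL 2019: nDCG@10 0.6802, R@1k 0.8662
DL 2020: nDCG@10 0.6706, R@1k 0.8514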
method: query2e · llm: gpt-4.1-nano · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
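Step 2 writes `run.txt`, which step 3 hands to trec_eval. Checking the run file first can catch a truncated or malformed run before it is scored. The following is a minimal sketch, not part of QueryGym or Pyserini: the helper name `check_run` is hypothetical, and it assumes the run is in the standard six-column TREC format (`qid Q0 docid rank score tag`), with the `max_hits` default mirroring the `--hits 1000` flag used above.

```python
from collections import Counter


def check_run(lines, max_hits=1000):
    """Validate TREC-format run lines and return the number of hits per query id."""
    hits = Counter()
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(f"line {n}: expected 6 fields, got {len(fields)}")
        qid, _q0, _docid, rank, score, _tag = fields
        int(rank)     # raises ValueError if the rank is not an integer
        float(score)  # raises ValueError if the score is not a number
        hits[qid] += 1
    too_many = [q for q, c in hits.items() if c > max_hits]
    if too_many:
        raise ValueError(f"{len(too_many)} queries exceed {max_hits} hits")
    return dict(hits)


if __name__ == "__main__":
    # Made-up lines for illustration only; not output from a real run.
    sample = [
        "q1 Q0 doc3 1 12.5 Anserini",
        "q1 Q0 doc7 2 11.0 Anserini",
        "q2 Q0 doc1 1 9.8 Anserini",
    ]
    print(check_run(sample))
```

For a real run, pass an open file handle: `with open("run.txt") as f: check_run(f)`.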
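gpt-4.1-nano · BM25
ArguAna: nDCG@10 0.4060, R@100 0.9417
DBPedia: nDCG@10 0.3597, R@100 0.4696
FiQA: nDCG@10 0.2524, R@100 0.5779
SciFact: nDCG@10 0.7016, R@100 0.9480
COVID: nDCG@10 0.7373, R@100 0.1765
News: nDCG@10 0.4557, R@100 0.5827
DL-HARD: nDCG@10 0.3101, R@1k 0.7665
DL 2019: nDCG@10 0.5891, R@1k 0.8474
DL 2020: nDCG@10 0.5475, R@1k 0.8392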
method: query2e · llm: gpt-4.1-nano · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
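gpt-4.1-nano · SPLADE++
ArguAna: nDCG@10 0.3819, R@100 0.9808
DBPedia: nDCG@10 0.3716, R@100 0.5295
FiQA: nDCG@10 0.3113, R@100 0.6493
SciFact: nDCG@10 0.7206, R@100 0.9387
COVID: nDCG@10 0.6747, R@100 0.1214
News: nDCG@10 0.4086, R@100 0.4906
DL-HARD: nDCG@10 0.3297, R@1k 0.8143
DL 2019: nDCG@10 0.6320, R@1k 0.9104
DL 2020: nDCG@10 0.6605, R@1k 0.9142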
method: query2e · llm: gpt-4.1-nano · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
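The per-dataset commands above differ only in the dataset name, the index name, the retriever flags, and the evaluation arguments. As a rough sketch (not part of QueryGym), the Python helper below rebuilds the three command lines for one configuration from those patterns. The function name `build_commands`, the short retriever keys, and the lookup tables are hypothetical; every flag and value is copied from the commands on this page. The helper only returns strings and runs nothing. DL-HARD is left out because its evaluation command above points at a machine-specific file path.

```python
import shlex

# Retriever-specific parts, copied from the commands on this page:
# (search module, BEIR index suffix, MS MARCO index suffix, extra flags, trailing flags)
RETRIEVERS = {
    "bm25": ("pyserini.search.lucene", ".flat", "",
             ["--bm25", "--k1", "0.9", "--b", "0.4"], []),
    "splade": ("pyserini.search.lucene", ".splade-pp-ed", ".splade-pp-ed",
               ["--encoder", "naver/splade-cocondenser-ensembledistil"], ["--impact"]),
    "bge": ("pyserini.search.faiss", ".bge-base-en-v1.5", ".bge-base-en-v1.5",
            ["--encoder", "BAAI/bge-base-en-v1.5"], []),
}

# Qrels names for the two TREC DL sets, as used in the evaluate steps above.
DL_QRELS = {
    "msmarco-v1-passage.trecdl2019": "dl19-passage",
    "msmarco-v1-passage.trecdl2020": "dl20-passage",
}

TOPICS = "outputs/reproduce/queries/reformulated_queries.tsv"


def build_commands(dataset, model, retriever):
    """Return the reformulate, retrieve, and evaluate commands as three strings."""
    module, beir_suffix, msmarco_suffix, flags, tail = RETRIEVERS[retriever]
    if dataset.startswith("beir-"):
        index, qrels, recall = dataset + beir_suffix, dataset + "-test", "recall.100"
    else:
        index = "msmarco-v1-passage" + msmarco_suffix
        qrels, recall = DL_QRELS[dataset], "recall.1000"

    reformulate = [
        "python", "examples/querygym_pyserini/pipeline.py",
        "--dataset", dataset, "--method", "query2e", "--model", model,
        "--steps", "reformulate", "--temperature", "1", "--max-tokens", "128",
        "--method-params", '{"num_examples":4,"train_split":"train"}',
        "--output-dir", "outputs/reproduce",
    ]
    retrieve = [
        "python", "-m", module, "--threads", "16", "--batch-size", "128",
        "--index", index, "--topics", TOPICS, *flags,
        "--output", "run.txt", "--hits", "1000", *tail,
    ]
    evaluate = [
        "python", "-m", "pyserini.eval.trec_eval", "-c",
        "-m", "ndcg.cut.10", "-m", recall, qrels, "run.txt",
    ]
    # shlex.join quotes the JSON argument so each line can be pasted into a shell.
    return [shlex.join(cmd) for cmd in (reformulate, retrieve, evaluate)]


if __name__ == "__main__":
    for line in build_commands("beir-v1.0.0-fiqa", "openai/gpt-4.1-nano", "splade"):
        print(line)
```

Returning strings instead of calling `subprocess` keeps the sketch side-effect free, so the generated lines can be compared against the commands listed above before anything is executed.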