QueryGym Leaderboard
Reproducible benchmarks for LLM query reformulation.

gpt-4.1-nano

All results produced by QueryGym · fully reproducible!

30 method × retriever configurations using this LLM across BEIR, MS MARCO DL, and DL-HARD.
Each configuration below lists, per dataset, the three steps used to produce its scores: reformulate → retrieve → evaluate.

Method | Retriever | ArguAna | DBPedia | FiQA | SciFact | COVID | News | DL-HARD | DL 2019 | DL 2020
Each dataset column holds two values: nDCG@10 and R@100 for ArguAna, DBPedia, FiQA, SciFact, COVID, and News; nDCG@10 and R@1k for DL-HARD, DL 2019, and DL 2020.
The page header also names the BRIGHT subsets (AOPS, Biology, Earth Science, Economics, LeetCode, Pony, Psychology, Robotics, Stack Overflow, Sustainable Living, TheoremQA Questions, TheoremQA Theorems), but the rows below carry no values for them.
csqe BGE-base-en-v1.5 0.6210 0.9886 0.4147 0.5123 0.4112 0.7489 0.7583 0.9600 0.8174 0.1442 0.4351 0.4753 0.3516 0.8371 0.7304 0.8749 0.6873 0.8535
method: csqe · llm: gpt-4.1-nano · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
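The nine reformulate commands above differ only in their `--dataset` value. As a minimal sketch, the loop below prints that command once per dataset without executing anything. The `DATASETS` variable and the `reformulate_cmd` function are hypothetical names introduced for illustration; every flag is copied from the commands above. It is a dry run on purpose: the listed commands all write to `outputs/reproduce`, so each printed command would likely need its retrieve and evaluate steps run before the next one.

```shell
#!/bin/sh
# Dataset identifiers copied from the --dataset flags above.
DATASETS="beir-v1.0.0-arguana
beir-v1.0.0-dbpedia-entity
beir-v1.0.0-fiqa
beir-v1.0.0-scifact
beir-v1.0.0-trec-covid
beir-v1.0.0-trec-news
msmarco-v1-passage.dlhard
msmarco-v1-passage.trecdl2019
msmarco-v1-passage.trecdl2020"

# Print (do not run) the reformulate command.
# $1 = dataset, $2 = method (hypothetical helper, not part of QueryGym).
reformulate_cmd() {
  printf '%s\n' "python examples/querygym_pyserini/pipeline.py --dataset $1 --method $2 --model openai/gpt-4.1-nano --steps reformulate --temperature 1 --max-tokens 128 --method-params '{\"num_examples\":4,\"train_split\":\"train\"}' --output-dir outputs/reproduce"
}

# Dry run: one printed command per dataset for the csqe method.
for d in $DATASETS; do
  reformulate_cmd "$d" csqe
done
```

Passing a different method name as the second argument (for example `genqr`) prints the corresponding commands for that method.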
csqe BM25 0.3964 0.9381 0.3647 0.4939 0.2401 0.5553 0.7099 0.9587 0.6171 0.1543 0.4271 0.5221 0.2436 0.7327 0.5410 0.8221 0.5142 0.8586
method: csqe · llm: gpt-4.1-nano · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
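The evaluate step prints trec_eval summary lines, which then have to be matched against the cells of the leaderboard. As a rough sketch, assuming the usual trec_eval layout of measure name, the word `all`, and the value separated by whitespace, the filter below keeps only those summary lines. `summarize_eval` is a hypothetical helper name. The sample input is hand-written in that layout, using the two ArguAna values from the csqe BM25 row rather than captured output.

```shell
#!/bin/sh
# Keep only the summary ("all") rows and print "measure value" pairs.
# Lines that are not trec_eval summary rows are ignored.
summarize_eval() {
  awk '$2 == "all" { print $1, $3 }'
}

# Hand-written sample in trec_eval's layout (values from the csqe BM25 ArguAna cells).
printf 'ndcg_cut_10\tall\t0.3964\nrecall_100\tall\t0.9381\n' | summarize_eval
```

In practice the function would sit at the end of the evaluate command, for example `python -m pyserini.eval.trec_eval ... run.txt | summarize_eval`.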
csqe SPLADE++ 0.3792 0.9801 0.3805 0.5235 0.3256 0.6702 0.7055 0.9533 0.6313 0.1132 0.4193 0.4601 0.2789 0.7872 0.6134 0.8900 0.5883 0.9119
method: csqe · llm: gpt-4.1-nano · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
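Across the three retrievers shown so far, the `--index` value and the recall cutoff follow a regular pattern that the commands above make visible. The sketch below encodes that pattern as read off this page. The function names and the short retriever keys (`bm25`, `bge`, `splade`) are hypothetical labels for illustration, and the mapping covers only the datasets and retrievers listed here.

```shell
#!/bin/sh
# Map a dataset and a retriever key to the index name used in the commands above.
# $1 = dataset, $2 = retriever key (bm25 | bge | splade; hypothetical labels).
index_for() {
  case "$1" in
    msmarco-v1-passage.*) base="msmarco-v1-passage" ;;  # DL-HARD, DL 2019, DL 2020 share one corpus
    *) base="$1" ;;
  esac
  case "$2" in
    bm25)
      case "$base" in
        msmarco-v1-passage) echo "$base" ;;   # no suffix in the commands above
        *) echo "$base.flat" ;;
      esac ;;
    bge) echo "$base.bge-base-en-v1.5" ;;
    splade) echo "$base.splade-pp-ed" ;;
    *) echo "unknown retriever: $2" >&2; return 1 ;;
  esac
}

# Recall cutoff passed to trec_eval: recall.1000 for the MS MARCO sets, recall.100 otherwise.
recall_metric_for() {
  case "$1" in
    msmarco-v1-passage.*) echo "recall.1000" ;;
    *) echo "recall.100" ;;
  esac
}

index_for beir-v1.0.0-fiqa splade             # prints beir-v1.0.0-fiqa.splade-pp-ed
index_for msmarco-v1-passage.trecdl2019 bm25  # prints msmarco-v1-passage
recall_metric_for msmarco-v1-passage.dlhard   # prints recall.1000
```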
genqr BGE-base-en-v1.5 0.6234 0.9900 0.3434 0.4680 0.3721 0.7175 0.7553 0.9633 0.7987 0.1440 0.4548 0.5134 0.3586 0.8389 0.6587 0.8493 0.6568 0.8485
method: genqr · llm: gpt-4.1-nano · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
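Each leaderboard row has the layout method, retriever, then nine pairs of nDCG@10 and recall, one pair per dataset. For a quick side-by-side comparison of configurations, the sketch below averages the nDCG@10 values of one row. `mean_ndcg10` is a hypothetical helper, and an unweighted mean across datasets is not a figure the leaderboard reports; it is only an illustration of how the row layout can be read programmatically.

```shell
#!/bin/sh
# Read rows of the form "method retriever v1 v2 ... v18" on stdin and print
# "method retriever mean", where mean averages the nDCG@10 values
# (fields 3, 5, 7, ...; the recall values in between are skipped).
mean_ndcg10() {
  awk '{
    sum = 0; n = 0
    for (i = 3; i <= NF; i += 2) { sum += $i; n++ }
    if (n > 0) printf "%s %s %.4f\n", $1, $2, sum / n
  }'
}

# Row copied from the genqr / BGE-base-en-v1.5 line above.
echo "genqr BGE-base-en-v1.5 0.6234 0.9900 0.3434 0.4680 0.3721 0.7175 0.7553 0.9633 0.7987 0.1440 0.4548 0.5134 0.3586 0.8389 0.6587 0.8493 0.6568 0.8485" | mean_ndcg10
```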
genqr BM25 0.4013 0.9488 0.2591 0.4137 0.1974 0.5142 0.7011 0.9566 0.6662 0.1561 0.4251 0.5834 0.1743 0.6575 0.4389 0.7360 0.4302 0.7701
method: genqr · llm: gpt-4.1-nano · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
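The reformulate commands for this configuration differ only in the `--dataset` value, so they can be generated from a single loop. The following is a minimal dry-run sketch that prints the commands rather than running them. The helper name `reformulate_cmd`, the `METHOD` variable, and the per-dataset output directory are illustrative assumptions and not part of QueryGym.

```shell
#!/bin/sh
# Sketch only: print (do not run) one reformulate command per dataset.
# METHOD and the per-dataset output directory are illustrative choices.
METHOD=genqr

reformulate_cmd() {
  # $1 = dataset name exactly as passed to --dataset in the commands above
  printf '%s\n' "python examples/querygym_pyserini/pipeline.py --dataset $1 --method $METHOD --model openai/gpt-4.1-nano --steps reformulate --temperature 1 --max-tokens 128 --method-params '{\"num_examples\":4,\"train_split\":\"train\"}' --output-dir outputs/reproduce/$1"
}

for ds in \
  beir-v1.0.0-arguana beir-v1.0.0-dbpedia-entity beir-v1.0.0-fiqa \
  beir-v1.0.0-scifact beir-v1.0.0-trec-covid beir-v1.0.0-trec-news \
  msmarco-v1-passage.dlhard msmarco-v1-passage.trecdl2019 \
  msmarco-v1-passage.trecdl2020
do
  reformulate_cmd "$ds"
done
```

The listed commands all write to `outputs/reproduce`, so running them back to back would presumably overwrite the reformulated queries of the previous dataset. The sketch gives each dataset its own directory for that reason; the `--topics` path of the retrieve step would have to be adjusted to match.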
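genqr · SPLADE++ (nDCG@10 / R@100 on BEIR; nDCG@10 / R@1k on DL-HARD, DL 2019, DL 2020)
  ArguAna   0.3773 / 0.9829
  DBPedia   0.3592 / 0.5267
  FiQA      0.3025 / 0.6466
  SciFact   0.7184 / 0.9633
  COVID     0.6594 / 0.1163
  News      0.4093 / 0.4933
  DL-HARD   0.3043 / 0.8408
  DL 2019   0.6351 / 0.9162
  DL 2020   0.6011 / 0.9074
method genqr · llm gpt-4.1-nano · retriever SPLADE++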
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
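Across the evaluate steps above, the recall cutoff and the qrels argument depend only on the dataset: the BEIR sets use `recall.100` with `<dataset>-test`, while the MS MARCO-based sets use `recall.1000`. A small sketch of that mapping follows. `eval_args_for` and the `DLHARD_QRELS` environment variable are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: choose trec_eval arguments per dataset, mirroring the commands above.
# eval_args_for is a hypothetical helper, not part of Pyserini or QueryGym.
eval_args_for() {
  case "$1" in
    beir-v1.0.0-*)
      # BEIR sets: recall cutoff 100, qrels named <dataset>-test
      printf '%s\n' "-c -m ndcg.cut.10 -m recall.100 $1-test" ;;
    msmarco-v1-passage.trecdl2019)
      printf '%s\n' "-c -m ndcg.cut.10 -m recall.1000 dl19-passage" ;;
    msmarco-v1-passage.trecdl2020)
      printf '%s\n' "-c -m ndcg.cut.10 -m recall.1000 dl20-passage" ;;
    msmarco-v1-passage.dlhard)
      # The DL-HARD command above points at a machine-local file, so the
      # caller must supply the location (assumed environment variable).
      [ -n "${DLHARD_QRELS:-}" ] || { echo "set DLHARD_QRELS" >&2; return 1; }
      printf '%s\n' "-c -m ndcg.cut.10 -m recall.1000 $DLHARD_QRELS" ;;
    *)
      echo "unknown dataset: $1" >&2; return 1 ;;
  esac
}

# Dry run: show the evaluate command for one dataset.
echo "python -m pyserini.eval.trec_eval $(eval_args_for beir-v1.0.0-scifact) run.txt"
```

The DL-HARD branch refuses to guess: the path shown in the listing exists only on the original machine, so anyone reproducing that column needs to provide their own copy of the judgments.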
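genqr_ensemble · BGE-base-en-v1.5 (nDCG@10 / R@100 on BEIR; nDCG@10 / R@1k on DL-HARD, DL 2019, DL 2020)
  ArguAna   0.6196 / 0.9900
  DBPedia   0.3488 / 0.4758
  FiQA      0.3766 / 0.7298
  SciFact   0.7469 / 0.9633
  COVID     0.7976 / 0.1425
  News      0.4719 / 0.5175
  DL-HARD   0.3579 / 0.8282
  DL 2019   0.6883 / 0.8711
  DL 2020   0.6645 / 0.8620
method genqr_ensemble · llm gpt-4.1-nano · retriever BGE-base-en-v1.5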
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
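Each evaluate step prints its metrics as text, so comparing a reproduced run against the leaderboard means pulling single values out of that output. The sketch below assumes the usual trec_eval layout of metric name, query id, and value, with the aggregate row marked `all`. `metric_value` is a hypothetical helper, and the sample text is made up for illustration rather than taken from a real run.

```shell
#!/bin/sh
# Sketch only: extract one metric from trec_eval-style output.
# metric_value is a hypothetical helper; it expects "<name> <qid> <value>"
# lines and keeps the aggregate row, whose query id is "all".
metric_value() {
  awk -v name="$1" '$1 == name && $2 == "all" { print $3 }'
}

# Illustrative input in trec_eval's output layout (not from a real run).
sample='ndcg_cut_10            all 0.5000
recall_100             all 0.7500'

printf '%s\n' "$sample" | metric_value ndcg_cut_10
printf '%s\n' "$sample" | metric_value recall_100
```

In practice the helper would sit at the end of a pipe, for example `python -m pyserini.eval.trec_eval ... run.txt | metric_value ndcg_cut_10`. For the MS MARCO-based sets the recall row would be named `recall_1000` instead.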
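genqr_ensemble · BM25 (nDCG@10 / R@100 on BEIR; nDCG@10 / R@1k on DL-HARD, DL 2019, DL 2020)
  ArguAna   0.3945 / 0.9474
  DBPedia   0.3181 / 0.4501
  FiQA      0.1972 / 0.5205
  SciFact   0.7034 / 0.9626
  COVID     0.6884 / 0.1690
  News      0.4349 / 0.6199
  DL-HARD   0.2154 / 0.6990
  DL 2019   0.4579 / 0.8217
  DL 2020   0.4718 / 0.8158
method genqr_ensemble · llm gpt-4.1-nano · retriever BM25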
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
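The `--index` values in the retrieve steps follow a regular pattern: the BEIR indexes carry a retriever-specific suffix, and the three MS MARCO-based sets share one passage corpus. The sketch below encodes only the names that appear in the listed commands. `index_for` and the short retriever keys `bm25`, `splade`, and `bge` are hypothetical.

```shell
#!/bin/sh
# Sketch only: index name per (dataset, retriever), as seen in the commands above.
# index_for and the retriever keys (bm25, splade, bge) are hypothetical.
index_for() {
  ds=$1
  retriever=$2
  case "$ds" in
    # DL-HARD, DL 2019 and DL 2020 all search the same MS MARCO passage corpus.
    msmarco-v1-passage.*) base=msmarco-v1-passage; bm25_suffix= ;;
    beir-v1.0.0-*)        base=$ds; bm25_suffix=.flat ;;
    *) echo "unknown dataset: $ds" >&2; return 1 ;;
  esac
  case "$retriever" in
    bm25)   printf '%s%s\n' "$base" "$bm25_suffix" ;;
    splade) printf '%s.splade-pp-ed\n' "$base" ;;
    bge)    printf '%s.bge-base-en-v1.5\n' "$base" ;;
    *) echo "unknown retriever: $retriever" >&2; return 1 ;;
  esac
}

index_for beir-v1.0.0-trec-news bm25
index_for msmarco-v1-passage.dlhard splade
```

Unknown dataset or retriever names return a non-zero status instead of producing a guessed index name, which keeps a typo from silently triggering a search against the wrong index.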
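genqr_ensemble · SPLADE++ (nDCG@10 / R@100 on BEIR; nDCG@10 / R@1k on DL-HARD, DL 2019, DL 2020)
  ArguAna   0.3818 / 0.9808
  DBPedia   0.3611 / 0.5276
  FiQA      0.2891 / 0.6311
  SciFact   0.7158 / 0.9560
  COVID     0.6514 / 0.1166
  News      0.4198 / 0.4906
  DL-HARD   0.3233 / 0.8400
  DL 2019   0.6617 / 0.9104
  DL 2020   0.6044 / 0.9194
method genqr_ensemble · llm gpt-4.1-nano · retriever SPLADE++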
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
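The evaluate step above reports nDCG@10 together with recall at a cutoff (R@100 for BEIR, R@1k for the MS MARCO sets). As a rough illustration of what those two numbers measure for a single query, here is a minimal Python sketch. It assumes linear gains and a log2(rank + 1) discount; the function names and the toy data are hypothetical, and this is not the trec_eval implementation that produced the leaderboard values.

```python
import math


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query, assuming linear gains and a log2(rank + 1) discount."""
    # Gain of each retrieved document; unjudged documents count as 0.
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    dcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))
    # Ideal ranking: all judged-relevant documents sorted by decreasing gain.
    ideal = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_doc_ids, qrels, k=100):
    """Fraction of the relevant documents that appear in the top k."""
    relevant = {doc_id for doc_id, g in qrels.items() if g > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_doc_ids[:k])) / len(relevant)


# Toy data (hypothetical): graded judgments and one ranked list.
toy_qrels = {"d1": 2, "d2": 1, "d3": 0}
toy_ranking = ["d3", "d1", "d4", "d2"]

print(round(ndcg_at_k(toy_ranking, toy_qrels, k=10), 4))
print(recall_at_k(toy_ranking, toy_qrels, k=2))
```

The leaderboard figures are averages of such per-query values over all queries in a dataset.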
lamer BGE-base-en-v1.5 0.6254 0.9900 0.3827 0.4804 0.4009 0.7310 0.7507 0.9593 0.8007 0.1340 0.4060 0.4264 0.3759 0.8352 0.7265 0.8894 0.7135 0.8846
method: lamer · llm: gpt-4.1-nano · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
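Every BEIR block on this page follows the same three-step pattern, with only the dataset, the method, and the retriever-specific flags changing. The sketch below assembles the three commands for one configuration so they can be printed or passed to `subprocess.run`. The flag values are copied from the commands on this page; the helper name `beir_commands` and the `RETRIEVERS` table are hypothetical, not part of QueryGym or Pyserini, and the MS MARCO sets are left out because their index names follow a different pattern.

```python
import shlex

# Retriever-specific settings, taken from the retrieve commands on this page.
RETRIEVERS = {
    "bge": {
        "module": "pyserini.search.faiss",
        "index_suffix": ".bge-base-en-v1.5",
        "flags": ["--encoder", "BAAI/bge-base-en-v1.5"],
    },
    "bm25": {
        "module": "pyserini.search.lucene",
        "index_suffix": ".flat",
        "flags": ["--bm25", "--k1", "0.9", "--b", "0.4"],
    },
    "splade": {
        "module": "pyserini.search.lucene",
        "index_suffix": ".splade-pp-ed",
        "flags": ["--encoder", "naver/splade-cocondenser-ensembledistil", "--impact"],
    },
}


def beir_commands(dataset, method, retriever):
    """Return [reformulate, retrieve, evaluate] argv lists for one BEIR config."""
    name = f"beir-v1.0.0-{dataset}"
    cfg = RETRIEVERS[retriever]
    reformulate = [
        "python", "examples/querygym_pyserini/pipeline.py",
        "--dataset", name, "--method", method,
        "--model", "openai/gpt-4.1-nano", "--steps", "reformulate",
        "--temperature", "1", "--max-tokens", "128",
        "--method-params", '{"num_examples":4,"train_split":"train"}',
        "--output-dir", "outputs/reproduce",
    ]
    retrieve = [
        "python", "-m", cfg["module"],
        "--threads", "16", "--batch-size", "128",
        "--index", name + cfg["index_suffix"],
        "--topics", "outputs/reproduce/queries/reformulated_queries.tsv",
        *cfg["flags"],
        "--output", "run.txt", "--hits", "1000",
    ]
    evaluate = [
        "python", "-m", "pyserini.eval.trec_eval", "-c",
        "-m", "ndcg.cut.10", "-m", "recall.100",
        f"{name}-test", "run.txt",
    ]
    return [reformulate, retrieve, evaluate]


# Dry run: print the commands instead of executing them.
for argv in beir_commands("fiqa", "lamer", "bm25"):
    print(shlex.join(argv))
```

Building argv lists rather than one shell string keeps the JSON passed to `--method-params` intact without manual quoting.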
lamer BM25 0.4037 0.9388 0.3440 0.4807 0.2360 0.5449 0.7220 0.9393 0.6721 0.1748 0.4328 0.5575 0.3398 0.7697 0.6731 0.8548 0.6560 0.8865
method: lamer · llm: gpt-4.1-nano · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
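The BM25 retrieve commands above pass `--k1 0.9 --b 0.4`. To make the role of those two parameters concrete, here is a minimal sketch of the textbook BM25 score for a single query term: `k1` controls how quickly repeated occurrences of a term saturate, and `b` controls how strongly long documents are penalised. The function name and the example numbers are hypothetical, and Lucene's own scoring differs in implementation details, so this is an illustration of the formula rather than a reproduction of the scores Pyserini returns.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=0.9, b=0.4):
    """Textbook BM25 contribution of one query term to one document's score."""
    # Inverse document frequency (one common, always non-negative variant).
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalisation: b=0 ignores document length, b=1 normalises fully.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term-frequency saturation: gains from extra occurrences flatten out.
    return idf * tf * (k1 + 1) / (tf + norm)


# Hypothetical collection statistics for illustration only.
stats = dict(avg_doc_len=100, n_docs=10_000, doc_freq=50)

for tf in (1, 2, 5, 20):
    print(tf, round(bm25_term_score(tf, doc_len=100, **stats), 3))
```

With a small `k1` such as 0.9, the printed scores rise quickly for the first few occurrences and then level off, which is the saturation behaviour the parameter is meant to control.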
lamer SPLADE++ 0.3800 0.9780 0.3316 0.4680 0.3014 0.6543 0.7207 0.9443 0.6285 0.1143 0.4012 0.4661 0.3459 0.7969 0.6916 0.8975 0.6254 0.9244
method: lamer · llm: gpt-4.1-nano · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
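The retrieve step writes `run.txt`, which the evaluate step then reads. Assuming the run is in the standard six-column TREC format (`qid Q0 docid rank score tag`), a minimal sketch for loading it into per-query ranked lists could look like the following. The function name and the sample lines are hypothetical and serve only to show the file layout.

```python
from collections import defaultdict


def parse_trec_run(lines):
    """Map each query id to its document ids, ordered by rank."""
    rows_by_query = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _q0, doc_id, rank, score, _tag = parts
        rows_by_query[qid].append((int(rank), doc_id, float(score)))
    return {
        qid: [doc_id for _rank, doc_id, _score in sorted(rows)]
        for qid, rows in rows_by_query.items()
    }


# Hypothetical sample in TREC run format.
sample_run = [
    "q1 Q0 d2 2 11.50 demo",
    "q1 Q0 d7 1 12.75 demo",
    "q2 Q0 d5 1 9.10 demo",
]

print(parse_trec_run(sample_run))
```

For a real file, the same function can be applied to an open file handle, for example `parse_trec_run(open("run.txt"))`.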
mugi BGE-base-en-v1.5 0.6184 0.9900 0.4280 0.5284 0.4228 0.7488 0.7457 0.9800 0.7980 0.1425 0.4696 0.5081 0.3903 0.8354 0.7169 0.8725 0.7187 0.8911
method: mugi · llm: gpt-4.1-nano · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
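Step 2 writes `run.txt` and step 3 reads it back. The sketch below shows one way to load such a file for inspection, assuming it is in the standard six-column TREC run format (`qid Q0 docid rank score tag`). The function name `parse_trec_run` and the sample lines are illustrative and not part of QueryGym or Pyserini.

```python
from collections import defaultdict


def parse_trec_run(lines):
    """Parse TREC-format run lines into {query_id: [(doc_id, score), ...]}, ordered by rank."""
    ranked = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _q0, docid, rank, score, _tag = parts
        ranked[qid].append((int(rank), docid, float(score)))
    # Sort each query's rows by rank and drop the rank column.
    return {qid: [(d, s) for _, d, s in sorted(rows)] for qid, rows in ranked.items()}


# Made-up sample lines; a real run.txt would hold up to 1000 hits per query.
sample = [
    "q1 Q0 docB 2 11.5 demo",
    "q1 Q0 docA 1 12.0 demo",
    "q2 Q0 docC 1 9.25 demo",
]
run = parse_trec_run(sample)
print(run["q1"])
```

For a real file, `parse_trec_run(open("run.txt"))` would work the same way, since the function only iterates over lines.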
mugi BM25 0.3831 0.9317 0.4085 0.5161 0.2517 0.5802 0.7318 0.9627 0.7062 0.1713 0.4707 0.5873 0.3423 0.7924 0.6835 0.8915 0.6473 0.9017
method mugi · llm gpt-4.1-nano · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
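The BM25 commands above fix `--k1 0.9 --b 0.4`. As a rough illustration of what those two parameters control, here is the textbook BM25 term weight in Python. This is a sketch of the general formula, not Lucene's exact implementation, and the function name and example numbers are assumptions made for illustration.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, df, num_docs, k1=0.9, b=0.4):
    """Textbook BM25 contribution of one query term to one document's score."""
    # Rarer terms (small df) get a larger inverse document frequency.
    idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
    # b controls how strongly longer-than-average documents are penalized.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences of the term saturate.
    return idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)


# Same term frequency, different document lengths (made-up statistics).
short_doc = bm25_term_score(tf=2, doc_len=50, avg_doc_len=100, df=10, num_docs=1000)
long_doc = bm25_term_score(tf=2, doc_len=200, avg_doc_len=100, df=10, num_docs=1000)
print(short_doc > long_doc)
```

With `b=0` the length term drops out entirely, and a smaller `k1` makes extra occurrences of a term matter less. Both settings affect how long LLM-expanded queries interact with the index.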
mugi SPLADE++ 0.3718 0.9787 0.3843 0.5095 0.3171 0.6673 0.6900 0.9527 0.6317 0.1144 0.4072 0.4770 0.3254 0.8105 0.6611 0.8904 0.6432 0.9203
method mugi · llm gpt-4.1-nano · retriever SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
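The SPLADE++ commands above search with `--impact`. Conceptually, learned-sparse retrieval scores a document by summing the products of query-term and document-term weights over the terms they share. The sketch below illustrates that idea with plain dictionaries. The function name and the weights are made up, and it does not reproduce how Pyserini or Lucene store or quantize impacts.

```python
def impact_score(query_weights, doc_weights):
    """Sparse dot product: sum of query weight x document weight over shared terms."""
    return sum(w * doc_weights.get(term, 0) for term, w in query_weights.items())


# Hypothetical term weights, as an encoder such as SPLADE++ might assign them.
query = {"covid": 3, "vaccine": 2}
doc = {"vaccine": 5, "trial": 4, "covid": 1}
score = impact_score(query, doc)
print(score)
```

Terms that appear only in the query or only in the document contribute nothing, which is why expanded queries can help when they add terms the document encoder also weights highly.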
qa_expand BGE-base-en-v1.5 0.6213 0.9893 0.3718 0.4717 0.3940 0.7272 0.7486 0.9593 0.7489 0.1355 0.4271 0.4749 0.3688 0.8113 0.6523 0.8486 0.6612 0.8397
method qa_expand · llm gpt-4.1-nano · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
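Step 3 reports nDCG@10 together with R@100 or R@1k. For readers who want to sanity-check a single query by hand, the following sketch computes both metrics for one ranked list. It assumes graded relevance is used directly as gain with a log2 rank discount, and that any grade above zero counts as relevant for recall. The function names and the toy judgments are illustrative, and the official numbers should still come from `trec_eval`.

```python
import math


def ndcg_at_k(ranked_docids, qrels, k=10):
    """nDCG@k for one query; qrels maps doc_id -> graded relevance."""
    gains = [qrels.get(d, 0) for d in ranked_docids[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    # Ideal ranking: all judged relevant documents, best grade first.
    ideal = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_docids, qrels, k=100):
    """Fraction of relevant documents (grade > 0) found in the top k."""
    relevant = {d for d, g in qrels.items() if g > 0}
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranked_docids[:k])) / len(relevant)


# Toy judgments and rankings for a single query.
qrels = {"d1": 2, "d2": 1, "d3": 0}
perfect = ndcg_at_k(["d1", "d2", "d3"], qrels)
partial = ndcg_at_k(["d3", "d1"], qrels)
found = recall_at_k(["d3", "d1"], qrels)
print(perfect, found)
```

The leaderboard values are averages of such per-query scores. The `-c` flag in the commands above asks `trec_eval` to average over every query in the judgments, so queries missing from `run.txt` count as zero.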
qa_expand BM25 0.4021 0.9367 0.3680 0.4808 0.2509 0.5744 0.7059 0.9430 0.6885 0.1583 0.4326 0.5487 0.3469 0.7480 0.5819 0.8385 0.6026 0.8649
method qa_expand · llm gpt-4.1-nano · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
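The retrieve commands above differ only in the Pyserini module, the index name, and a few retriever-specific flags. A small helper like the one below could generate them instead of copying each block. It is a sketch that only assembles the command line. The names `RETRIEVERS` and `retrieve_argv` are hypothetical, and the flags are copied from the commands on this page, not taken from QueryGym's own code.

```python
import shlex

TOPICS = "outputs/reproduce/queries/reformulated_queries.tsv"

# Per retriever: (module, flags placed before --output, flags placed after --hits).
RETRIEVERS = {
    "bge": ("pyserini.search.faiss", ["--encoder", "BAAI/bge-base-en-v1.5"], []),
    "bm25": ("pyserini.search.lucene", ["--bm25", "--k1", "0.9", "--b", "0.4"], []),
    "splade": (
        "pyserini.search.lucene",
        ["--encoder", "naver/splade-cocondenser-ensembledistil"],
        ["--impact"],
    ),
}


def retrieve_argv(index, retriever, output="run.txt"):
    """Build the step-2 argument list for one index and retriever."""
    module, extra, tail = RETRIEVERS[retriever]
    return (
        ["python", "-m", module, "--threads", "16", "--batch-size", "128",
         "--index", index, "--topics", TOPICS]
        + extra
        + ["--output", output, "--hits", "1000"]
        + tail
    )


argv = retrieve_argv("beir-v1.0.0-fiqa.flat", "bm25")
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`. The index name is taken as an argument because the MS MARCO BM25 index above (`msmarco-v1-passage`) does not follow the `.flat` suffix used for BEIR.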
qa_expand SPLADE++ 0.3811 0.9787 0.4019 0.5396 0.3360 0.6669 0.6939 0.9420 0.7079 0.1215 0.4227 0.4696 0.3702 0.8506 0.6883 0.9010 0.6628 0.9279
method qa_expand · llm gpt-4.1-nano · retriever SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
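Across the BEIR datasets, the SPLADE++ commands above differ only in the dataset name, which also determines the index name and the qrels key. A minimal sketch of that pattern follows, assuming a hypothetical helper `build_commands` (it is not part of QueryGym or Pyserini). The flag values are copied from the commands on this page; the helper only assembles the argument lists and does not run anything.

```python
import shlex

# Hypothetical helper for illustration; not part of QueryGym or Pyserini.
# All flag values are copied from the commands listed on this page.
TOPICS = "outputs/reproduce/queries/reformulated_queries.tsv"


def build_commands(dataset, method="qa_expand", run_file="run.txt"):
    """Return the reformulate, retrieve, and evaluate commands for one BEIR dataset."""
    reformulate = [
        "python", "examples/querygym_pyserini/pipeline.py",
        "--dataset", dataset,
        "--method", method,
        "--model", "openai/gpt-4.1-nano",
        "--steps", "reformulate",
        "--temperature", "1",
        "--max-tokens", "128",
        "--method-params", '{"num_examples":4,"train_split":"train"}',
        "--output-dir", "outputs/reproduce",
    ]
    retrieve = [
        "python", "-m", "pyserini.search.lucene",
        "--threads", "16", "--batch-size", "128",
        "--index", f"{dataset}.splade-pp-ed",  # SPLADE++ index suffix used above
        "--topics", TOPICS,
        "--encoder", "naver/splade-cocondenser-ensembledistil",
        "--output", run_file,
        "--hits", "1000", "--impact",
    ]
    evaluate = [
        "python", "-m", "pyserini.eval.trec_eval", "-c",
        "-m", "ndcg.cut.10", "-m", "recall.100",
        f"{dataset}-test", run_file,  # BEIR qrels key follows the "<dataset>-test" pattern
    ]
    return [reformulate, retrieve, evaluate]


# Print shell-ready command lines for one dataset.
for cmd in build_commands("beir-v1.0.0-scifact"):
    print(shlex.join(cmd))
```

To execute the steps instead of printing them, each list could be passed to `subprocess.run(cmd, check=True)`. The MS MARCO configurations use different index and qrels names, so they are left out of this sketch.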
Q2D (COT) BGE-base-en-v1.5 0.6194 0.9893 0.3843 0.4891 0.3967 0.7409 0.7499 0.9633 0.7995 0.1420 0.4312 0.4754 0.3722 0.8367 0.6710 0.8530 0.6744 0.8709
method Q2D (COT) · llm gpt-4.1-nano · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
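Each leaderboard row packs 18 scores onto one line. The sketch below maps such a row back to labeled columns. The column layout is an assumption read off the page header: six BEIR datasets scored with nDCG@10 and R@100, followed by DL-HARD, DL 2019, and DL 2020 scored with nDCG@10 and R@1k. `parse_row` is a hypothetical helper, and it assumes the retriever name is a single token.

```python
# Column layout assumed from the page header (dataset, recall metric).
COLUMNS = [
    ("ArguAna", "R@100"),
    ("DBPedia", "R@100"),
    ("FiQA", "R@100"),
    ("SciFact", "R@100"),
    ("COVID", "R@100"),
    ("News", "R@100"),
    ("DL-HARD", "R@1k"),
    ("DL 2019", "R@1k"),
    ("DL 2020", "R@1k"),
]


def parse_row(line):
    """Split a flattened row into (method, retriever, scores-by-dataset)."""
    tokens = line.split()
    n_values = 2 * len(COLUMNS)  # two metrics per dataset
    values = [float(t) for t in tokens[-n_values:]]
    label = tokens[:-n_values]
    retriever = label[-1]            # assumes a single-token retriever name
    method = " ".join(label[:-1])    # method names may contain spaces, e.g. "Q2D (COT)"
    scores = {}
    for i, (dataset, recall_name) in enumerate(COLUMNS):
        scores[dataset] = {
            "nDCG@10": values[2 * i],
            recall_name: values[2 * i + 1],
        }
    return method, retriever, scores


row = ("Q2D (COT) BGE-base-en-v1.5 0.6194 0.9893 0.3843 0.4891 0.3967 0.7409 "
       "0.7499 0.9633 0.7995 0.1420 0.4312 0.4754 0.3722 0.8367 0.6710 0.8530 "
       "0.6744 0.8709")
method, retriever, scores = parse_row(row)
print(method, "|", retriever)
for dataset, metrics in scores.items():
    print(f"{dataset:8s}", metrics)
```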
Q2D (FS) BGE-base-en-v1.5 0.6188 0.9900 0.4026 0.5104 0.4039 0.7311 0.7417 0.9567 0.7793 0.1402 0.4539 0.4763 0.3480 0.8374 0.7157 0.8601 0.6988 0.8742
method Q2D (FS) · llm gpt-4.1-nano · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
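The evaluate step reports nDCG@10 and recall at a cutoff (R@100 or R@1k). As a rough illustration of what these two numbers measure, here is a simplified single-query sketch using the standard definitions, with graded relevance as the gain and a log2(rank + 1) discount. It is not trec_eval's implementation, and the document IDs and judgments are made up for the example.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranking, qrels, k=10):
    """nDCG@k for one query: DCG of the top-k ranking over the ideal DCG."""
    gains = [qrels.get(doc, 0) for doc in ranking[:k]]       # unjudged docs count as 0
    ideal = sorted(qrels.values(), reverse=True)[:k]         # best possible ordering
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def recall_at_k(ranking, qrels, k=100):
    """Fraction of relevant documents (rel > 0) found in the top k."""
    relevant = {doc for doc, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranking[:k])) / len(relevant)


# Toy data (made up): relevance judgments and a ranked list for one query.
qrels = {"d1": 2, "d2": 1, "d3": 0}
ranking = ["d2", "d4", "d1"]

print(round(ndcg_at_k(ranking, qrels, k=10), 4))
print(recall_at_k(ranking, qrels, k=100))
print(recall_at_k(ranking, qrels, k=1))
```

In this toy case the ranking retrieves both relevant documents, so recall at 100 is 1.0, but the most relevant document sits at rank 3, which keeps nDCG@10 below 1.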
Q2D (ZS) BGE-base-en-v1.5 0.6190 0.9900 0.4268 0.5239 0.4155 0.7412 0.7541 0.9633 0.8019 0.1417 0.4467 0.4931 0.3683 0.8395 0.7202 0.8701 0.7029 0.8743
method Q2D (ZS) · llm gpt-4.1-nano · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
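The three Q2D configurations (ZS, FS, COT) all call `--method query2doc` and differ only in the `mode` value inside `--method-params`. The listings also place `mode` at different positions in the JSON string. A small sketch, assuming the pipeline parses this flag as a JSON object (in which case key order does not matter); `method_params` is a hypothetical helper, not a QueryGym function.

```python
import json


def method_params(mode=None, num_examples=4, train_split="train"):
    """Build a compact JSON string for --method-params (hypothetical helper)."""
    params = {"num_examples": num_examples, "train_split": train_split}
    if mode is not None:
        params["mode"] = mode
    # Compact separators match the spacing used in the commands on this page.
    return json.dumps(params, separators=(",", ":"))


for mode in ("zs", "fs", "cot"):
    print(mode, method_params(mode))

# The two key orderings that appear on this page parse to the same object.
a = json.loads('{"mode":"cot","num_examples":4,"train_split":"train"}')
b = json.loads('{"num_examples":4,"train_split":"train","mode":"cot"}')
print(a == b)  # True
```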
Q2D (COT) BM25 0.4011 0.9360 0.3921 0.5132 0.2557 0.5758 0.7273 0.9560 0.7503 0.1744 0.4601 0.5728 0.3320 0.7655 0.6254 0.8621 0.6092 0.8846
method Q2D (COT) · llm gpt-4.1-nano · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
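The nine Q2D (COT) BM25 runs above share every flag except the dataset, the index, the qrels argument, and the recall cutoff. A minimal sketch of that pattern follows, assuming a helper named build_commands (hypothetical, not part of QueryGym or Pyserini) that only assembles the three commands as argument lists and does not run them.

```python
import json
import shlex


# Hypothetical helper for illustration; not part of QueryGym or Pyserini.
def build_commands(dataset, index, qrels, mode="cot", recall_cutoff=100):
    """Return the reformulate, retrieve, and evaluate commands as argument lists."""
    # Compact JSON so the string matches the --method-params value shown above.
    method_params = json.dumps(
        {"mode": mode, "num_examples": 4, "train_split": "train"},
        separators=(",", ":"),
    )
    reformulate = [
        "python", "examples/querygym_pyserini/pipeline.py",
        "--dataset", dataset,
        "--method", "query2doc",
        "--model", "openai/gpt-4.1-nano",
        "--steps", "reformulate",
        "--temperature", "1",
        "--max-tokens", "128",
        "--method-params", method_params,
        "--output-dir", "outputs/reproduce",
    ]
    retrieve = [
        "python", "-m", "pyserini.search.lucene",
        "--threads", "16", "--batch-size", "128",
        "--index", index,
        "--topics", "outputs/reproduce/queries/reformulated_queries.tsv",
        "--bm25", "--k1", "0.9", "--b", "0.4",
        "--output", "run.txt",
        "--hits", "1000",
    ]
    evaluate = [
        "python", "-m", "pyserini.eval.trec_eval", "-c",
        "-m", "ndcg.cut.10", "-m", f"recall.{recall_cutoff}",
        qrels, "run.txt",
    ]
    return [reformulate, retrieve, evaluate]


# Print the three commands for one BEIR dataset in shell-quoted form.
for command in build_commands(
    "beir-v1.0.0-arguana", "beir-v1.0.0-arguana.flat", "beir-v1.0.0-arguana-test"
):
    print(shlex.join(command))
```

For the MS MARCO runs, the same sketch would be called with recall_cutoff=1000 and the index and qrels values listed in the corresponding panels.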
Q2D (FS) BM25 0.3965 0.9324 0.3720 0.4873 0.2531 0.5833 0.7053 0.9410 0.6827 0.1634 0.4442 0.5398 0.3358 0.7627 0.6643 0.8527 0.6227 0.8848
method: Q2D (FS) · llm: gpt-4.1-nano · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
Q2D (ZS) BM25 0.3980 0.9374 0.3968 0.4980 0.2548 0.5899 0.7170 0.9403 0.6967 0.1656 0.4685 0.5564 0.3368 0.7832 0.6779 0.8862 0.6268 0.8869
method: Q2D (ZS) · llm: gpt-4.1-nano · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
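Step 3 in each panel reports nDCG@10 together with recall at 100 or 1000. As a rough illustration of what those two numbers measure for a single query, here is a minimal sketch. It assumes linear gains and a log2 rank discount, and it treats any judgment above zero as relevant; the function names are hypothetical, and trec_eval's handling of ties and edge cases may differ, so it is not a substitute for the evaluate command.

```python
import math


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query; qrels maps doc id -> graded relevance."""
    # Discounted gain of the top-k retrieved documents (unjudged docs count as 0).
    dcg = sum(
        max(qrels.get(doc_id, 0), 0) / math.log2(position + 2)
        for position, doc_id in enumerate(ranked_doc_ids[:k])
    )
    # Ideal ordering: the k highest positive grades from the judgments.
    ideal_gains = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    idcg = sum(g / math.log2(position + 2) for position, g in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_doc_ids, qrels, k=100):
    """Fraction of relevant documents that appear in the top k."""
    relevant = {doc_id for doc_id, grade in qrels.items() if grade > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_doc_ids[:k])) / len(relevant)
```

The leaderboard values are averages of such per-query scores over all queries in a dataset.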
Q2D (COT) SPLADE++ 0.3820 0.9801 0.3962 0.5324 0.3131 0.6532 0.7065 0.9433 0.6809 0.1163 0.4053 0.4554 0.3426 0.8390 0.6544 0.8954 0.6271 0.9167
method: Q2D (COT) · llm: gpt-4.1-nano · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
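Step 2 in each panel writes run.txt, which step 3 then scores. A minimal sketch for inspecting that file follows, assuming it is in the six-column TREC run format (query id, the literal Q0, document id, rank, score, run tag); parse_trec_run is a hypothetical helper, not a Pyserini function.

```python
from collections import defaultdict


# Hypothetical helper for illustration; not part of Pyserini.
def parse_trec_run(lines):
    """Group TREC-format run lines by query id, ordered by rank."""
    rows_by_query = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        query_id, _q0, doc_id, rank, score, _tag = parts
        rows_by_query[query_id].append((int(rank), doc_id, float(score)))
    # Sort each query's rows by rank and keep (doc_id, score) pairs.
    return {
        query_id: [(doc_id, score) for _rank, doc_id, score in sorted(rows)]
        for query_id, rows in rows_by_query.items()
    }
```

With a real run, the input could be an open file handle, for example parse_trec_run(open("run.txt")), which makes it easy to spot-check the top hits for a reformulated query before evaluating.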
Q2D (FS) SPLADE++ 0.3823 0.9801 0.3790 0.5204 0.3390 0.6636 0.7121 0.9400 0.6715 0.1182 0.4292 0.4573 0.3533 0.8005 0.6318 0.8839 0.6471 0.9232
method: Q2D (FS) · llm: gpt-4.1-nano · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
Q2D (ZS) SPLADE++ 0.3819 0.9808 0.3849 0.5064 0.3335 0.6640 23.0000 0.9493 0.6645 0.1146 0.4055 0.4651 0.3479 0.8092 0.6877 0.8916 0.6242 0.9219
method: Q2D (ZS) · llm: gpt-4.1-nano · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
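The evaluate steps above differ only in the recall cutoff and the qrels key. A minimal sketch of that mapping follows. The function name `eval_args` is hypothetical and not part of QueryGym or Pyserini; it only echoes the arguments seen in this listing. DL-HARD is left out because the listing evaluates it against a local file path rather than a named qrels key.

```shell
# Hypothetical helper (not part of QueryGym or Pyserini): prints the trec_eval
# arguments that this listing uses for a given --dataset value.
# It only echoes text; nothing is executed.
eval_args() {
  case "$1" in
    beir-v1.0.0-*)
      # BEIR configs report nDCG@10 + R@100 against "<dataset>-test".
      echo "-c -m ndcg.cut.10 -m recall.100 $1-test"
      ;;
    msmarco-v1-passage.trecdl2019)
      # DL 2019 reports nDCG@10 + R@1k.
      echo "-c -m ndcg.cut.10 -m recall.1000 dl19-passage"
      ;;
    msmarco-v1-passage.trecdl2020)
      # DL 2020 reports nDCG@10 + R@1k.
      echo "-c -m ndcg.cut.10 -m recall.1000 dl20-passage"
      ;;
    *)
      echo "unknown dataset: $1" >&2
      return 1
      ;;
  esac
}

# Dry run: print the evaluate command for one dataset.
echo "python -m pyserini.eval.trec_eval $(eval_args beir-v1.0.0-scifact) run.txt"
```

Dropping the leading `echo` from the last line would run the evaluation for real, assuming `run.txt` already exists from the retrieve step.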
query2e BGE-base-en-v1.5 0.6198 0.9900 0.3558 0.4657 0.3816 0.7261 0.7477 0.9633 0.7803 0.1407 0.4504 0.5018 0.3609 0.8321 0.6802 0.8662 0.6706 0.8514
method: query2e · llm: gpt-4.1-nano · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
query2e BM25 0.4060 0.9417 0.3597 0.4696 0.2524 0.5779 0.7016 0.9480 0.7373 0.1765 0.4557 0.5827 0.3101 0.7665 0.5891 0.8474 0.5475 0.8392
method: query2e · llm: gpt-4.1-nano · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
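The retrieve steps in this listing come in three variants: BM25 and SPLADE++ go through `pyserini.search.lucene`, and BGE-base-en-v1.5 goes through `pyserini.search.faiss`. A minimal sketch of that dispatch follows. The function name `search_cmd` is hypothetical; the flags are copied from the commands above, only reordered, and the index name is passed in as it appears in the listing.

```shell
# Hypothetical helper (not part of QueryGym or Pyserini): prints the retrieve
# command used in this listing for a retriever label and a prebuilt index name.
#   $1 = retriever label (BM25, SPLADE++, or BGE-base-en-v1.5)
#   $2 = index name exactly as written above, e.g. beir-v1.0.0-fiqa.flat
search_cmd() {
  # Flags shared by every retrieve step in the listing.
  common="--threads 16 --batch-size 128 --index $2 --topics outputs/reproduce/queries/reformulated_queries.tsv --output run.txt --hits 1000"
  case "$1" in
    BM25)
      # Lexical: Lucene searcher with explicit BM25 parameters.
      echo "python -m pyserini.search.lucene $common --bm25 --k1 0.9 --b 0.4"
      ;;
    SPLADE++)
      # Learned sparse: Lucene impact search with a query encoder.
      echo "python -m pyserini.search.lucene $common --encoder naver/splade-cocondenser-ensembledistil --impact"
      ;;
    BGE-base-en-v1.5)
      # Dense: FAISS searcher with the BGE query encoder.
      echo "python -m pyserini.search.faiss $common --encoder BAAI/bge-base-en-v1.5"
      ;;
    *)
      echo "unknown retriever: $1" >&2
      return 1
      ;;
  esac
}

# Dry run: print the three variants for the MS MARCO passage indexes.
search_cmd BM25 msmarco-v1-passage
search_cmd SPLADE++ msmarco-v1-passage.splade-pp-ed
search_cmd BGE-base-en-v1.5 msmarco-v1-passage.bge-base-en-v1.5
```

Taking the full index name as an argument avoids guessing suffixes: the BEIR BM25 indexes above end in `.flat`, while the MS MARCO BM25 index is plain `msmarco-v1-passage`.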
query2e SPLADE++ 0.3819 0.9808 0.3716 0.5295 0.3113 0.6493 0.7206 0.9387 0.6747 0.1214 0.4086 0.4906 0.3297 0.8143 0.6320 0.9104 0.6605 0.9142
method: query2e · llm: gpt-4.1-nano · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
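Every reformulate step above uses the same `--output-dir outputs/reproduce`. When several datasets are run back to back, a separate directory per dataset may be safer. The sketch below prints one reformulate command per dataset with its own output directory. It assumes, based on the `--topics` path in the retrieve steps, that the pipeline writes `queries/reformulated_queries.tsv` under `--output-dir`; the per-dataset directory layout and the function name `reformulate_cmd` are illustrative choices, not something the leaderboard prescribes.

```shell
# Dry-run sketch: prints commands only, nothing is executed.
# METHOD and PARAMS are taken from the query2e configs in this listing.
METHOD="query2e"
PARAMS='{"num_examples":4,"train_split":"train"}'

# Hypothetical helper: $1 = dataset id as passed to --dataset above.
reformulate_cmd() {
  echo "python examples/querygym_pyserini/pipeline.py --dataset $1 --method $METHOD --model openai/gpt-4.1-nano --steps reformulate --temperature 1 --max-tokens 128 --method-params '$PARAMS' --output-dir outputs/reproduce/$1"
}

# The nine datasets covered by each configuration in this listing.
for ds in \
  beir-v1.0.0-arguana \
  beir-v1.0.0-dbpedia-entity \
  beir-v1.0.0-fiqa \
  beir-v1.0.0-scifact \
  beir-v1.0.0-trec-covid \
  beir-v1.0.0-trec-news \
  msmarco-v1-passage.dlhard \
  msmarco-v1-passage.trecdl2019 \
  msmarco-v1-passage.trecdl2020
do
  reformulate_cmd "$ds"
done
```

With this layout, the `--topics` argument of the matching retrieve step would need to point at `outputs/reproduce/<dataset>/queries/reformulated_queries.tsv` instead of the shared path.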