QueryGym Leaderboard
Reproducible benchmarks for LLM query reformulation.

Main Results

All results produced by QueryGym · fully reproducible!

Query reformulation methods × LLMs × retrievers benchmarked across BEIR, MS MARCO DL, and DL-HARD.
Each row below is followed by the three steps (reformulate → retrieve → evaluate) needed to reproduce it, repeated once per dataset.

120 / 120 configs · 1080 runs
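The run count is consistent with every configuration being evaluated once on each of the nine datasets that report scores on this page. A quick arithmetic check, assuming that one-run-per-(config, dataset) reading is the intended one:

```python
# Assumption: one run = one (config, dataset) pair, and the nine datasets
# below are the ones with scores in the rows on this page.
DATASETS = [
    "ArguAna", "DBPedia", "FiQA", "SciFact", "COVID", "News",
    "DL-HARD", "DL 2019", "DL 2020",
]
N_CONFIGS = 120  # method x LLM x retriever combinations, as stated on the page


def total_runs(n_configs: int, n_datasets: int) -> int:
    """Number of runs when each config is evaluated once per dataset."""
    return n_configs * n_datasets


print(total_runs(N_CONFIGS, len(DATASETS)))  # 1080
```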
Columns: Method, LLM, Retriever, then two values per dataset in this order:
ArguAna, DBPedia, FiQA, SciFact, COVID, News (nDCG@10, R@100 each); DL-HARD, DL 2019, DL 2020 (nDCG@10, R@1k each).
BRIGHT columns listed in the header, with no values in the rows on this page: AOPS, Biology, Earth Science, Economics, LeetCode, Pony, Psychology, Robotics, Stack Overflow, Sustainable Living, TheoremQA Questions, TheoremQA Theorems.
csqe gpt-4.1 BGE-base-en-v1.5 0.6218 0.9915 0.4242 0.5229 0.4067 0.7384 0.7553 0.9633 0.7879 0.1431 0.4631 0.5075 0.4144 0.8640 0.7551 0.9009 0.7139 0.8968
method: csqe · llm: gpt-4.1 · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
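The three steps above can also be assembled programmatically. The sketch below is a hypothetical helper (not part of QueryGym or Pyserini) that rebuilds the ArguAna commands as argument lists. Every flag and value is copied from the commands on this page; only the function and variable names are invented for illustration, and it covers BEIR datasets with the BGE-base-en-v1.5 index only.

```python
import shlex
from typing import List


def three_step_commands(dataset: str, method: str, model: str) -> List[List[str]]:
    """Return the reformulate, retrieve, and evaluate commands for a BEIR
    dataset searched with the BGE-base-en-v1.5 dense index."""
    topics = "outputs/reproduce/queries/reformulated_queries.tsv"
    reformulate = [
        "python", "examples/querygym_pyserini/pipeline.py",
        "--dataset", dataset, "--method", method, "--model", model,
        "--steps", "reformulate", "--temperature", "1", "--max-tokens", "128",
        "--method-params", '{"num_examples":4,"train_split":"train"}',
        "--output-dir", "outputs/reproduce",
    ]
    retrieve = [
        "python", "-m", "pyserini.search.faiss",
        "--threads", "16", "--batch-size", "128",
        "--index", f"{dataset}.bge-base-en-v1.5",
        "--topics", topics,
        "--encoder", "BAAI/bge-base-en-v1.5",
        "--output", "run.txt", "--hits", "1000",
    ]
    evaluate = [
        "python", "-m", "pyserini.eval.trec_eval",
        "-c", "-m", "ndcg.cut.10", "-m", "recall.100",
        f"{dataset}-test", "run.txt",
    ]
    return [reformulate, retrieve, evaluate]


for cmd in three_step_commands("beir-v1.0.0-arguana", "csqe", "openai/gpt-4.1"):
    # Dry run: print each command instead of executing it.
    print(shlex.join(cmd))
```

Passing each list to `subprocess.run(cmd, check=True)` would execute the steps in order; printing first makes it easy to compare the output against the commands shown here.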
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
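Across the nine blocks above, the retrieve and evaluate steps vary with the dataset in three places: the index name, the qrels argument, and the recall cutoff. The lookup below is a sketch of that pattern for the BGE-base-en-v1.5 runs, using only values that appear in those commands. The function name is hypothetical, and DL-HARD is deliberately left out because its evaluate command on this page points at a machine-specific path.

```python
from typing import Dict

# Qrels names for the MS MARCO DL tracks, copied from the evaluate commands.
MSMARCO_QRELS = {
    "msmarco-v1-passage.trecdl2019": "dl19-passage",
    "msmarco-v1-passage.trecdl2020": "dl20-passage",
}


def eval_settings(dataset: str) -> Dict[str, str]:
    """Map a --dataset value to the dense index, qrels, and recall metric."""
    if dataset.startswith("beir-v1.0.0-"):
        # BEIR: per-dataset index, "<dataset>-test" qrels, recall at 100.
        return {
            "index": f"{dataset}.bge-base-en-v1.5",
            "qrels": f"{dataset}-test",
            "recall": "recall.100",
        }
    if dataset in MSMARCO_QRELS:
        # MS MARCO DL: one shared passage index, recall at 1000.
        return {
            "index": "msmarco-v1-passage.bge-base-en-v1.5",
            "qrels": MSMARCO_QRELS[dataset],
            "recall": "recall.1000",
        }
    raise ValueError(f"no settings recorded for {dataset!r}")


print(eval_settings("beir-v1.0.0-scifact"))
print(eval_settings("msmarco-v1-passage.trecdl2019"))
```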
csqe gpt-4.1 BM25 0.3977 0.9445 0.3899 0.5136 0.2473 0.5835 0.7206 0.9487 0.6994 0.1638 0.4790 0.5909 0.3658 0.7873 0.6899 0.9035 0.6548 0.8871
method: csqe · llm: gpt-4.1 · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
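The two gpt-4.1 result rows so far (BGE-base-en-v1.5 and BM25) each carry 18 values: nDCG@10 followed by recall for nine datasets. The sketch below pairs the values with dataset names and lists where BM25 scores higher on nDCG@10. It assumes the column order ArguAna, DBPedia, FiQA, SciFact, COVID, News, DL-HARD, DL 2019, DL 2020 and that method, LLM, and retriever names contain no spaces; `parse_row` is a hypothetical helper.

```python
from typing import Dict, Tuple

DATASETS = [
    "ArguAna", "DBPedia", "FiQA", "SciFact", "COVID", "News",
    "DL-HARD", "DL 2019", "DL 2020",
]

BGE_ROW = ("csqe gpt-4.1 BGE-base-en-v1.5 0.6218 0.9915 0.4242 0.5229 0.4067 "
           "0.7384 0.7553 0.9633 0.7879 0.1431 0.4631 0.5075 0.4144 0.8640 "
           "0.7551 0.9009 0.7139 0.8968")
BM25_ROW = ("csqe gpt-4.1 BM25 0.3977 0.9445 0.3899 0.5136 0.2473 0.5835 "
            "0.7206 0.9487 0.6994 0.1638 0.4790 0.5909 0.3658 0.7873 0.6899 "
            "0.9035 0.6548 0.8871")


def parse_row(line: str) -> Tuple[Tuple[str, str, str], Dict[str, Dict[str, float]]]:
    """Split a flattened row into its config and per-dataset scores."""
    parts = line.split()
    config = (parts[0], parts[1], parts[2])
    values = [float(x) for x in parts[3:]]
    if len(values) != 2 * len(DATASETS):
        raise ValueError("expected two values per dataset")
    scores = {
        name: {"ndcg@10": values[2 * i], "recall": values[2 * i + 1]}
        for i, name in enumerate(DATASETS)
    }
    return config, scores


_, bge = parse_row(BGE_ROW)
_, bm25 = parse_row(BM25_ROW)
bm25_wins = [d for d in DATASETS if bm25[d]["ndcg@10"] > bge[d]["ndcg@10"]]
print(bm25_wins)  # ['News']
```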
csqe gpt-4.1 SPLADE++ 0.3801 0.9829 0.3962 0.5232 0.3294 0.6748 0.7065 0.9593 0.6811 0.1116 0.4502 0.5018 0.3690 0.8341 0.6936 0.9193 0.6796 0.9397
method: csqe · llm: gpt-4.1 · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method csqe \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
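With all three retrievers now shown for gpt-4.1, the retrieve step differs between them only in the Pyserini module, the index suffix, and a few scoring flags. The sketch below collects those differences in one hypothetical function; the flags, index names, and encoder names are copied from the commands above and nothing else about Pyserini's interface is assumed.

```python
from typing import List

TOPICS = "outputs/reproduce/queries/reformulated_queries.tsv"


def retrieve_command(retriever: str, corpus: str) -> List[str]:
    """Build the step-2 command for a corpus such as "beir-v1.0.0-fiqa"
    or "msmarco-v1-passage"."""
    tail: List[str] = []
    if retriever == "BGE-base-en-v1.5":  # dense
        module = "pyserini.search.faiss"
        index = f"{corpus}.bge-base-en-v1.5"
        scoring = ["--encoder", "BAAI/bge-base-en-v1.5"]
    elif retriever == "BM25":  # lexical
        module = "pyserini.search.lucene"
        # BEIR BM25 indexes carry a ".flat" suffix; the MS MARCO one does not.
        index = f"{corpus}.flat" if corpus.startswith("beir-") else corpus
        scoring = ["--bm25", "--k1", "0.9", "--b", "0.4"]
    elif retriever == "SPLADE++":  # learned sparse
        module = "pyserini.search.lucene"
        index = f"{corpus}.splade-pp-ed"
        scoring = ["--encoder", "naver/splade-cocondenser-ensembledistil"]
        tail = ["--impact"]
    else:
        raise ValueError(f"unknown retriever {retriever!r}")
    return [
        "python", "-m", module, "--threads", "16", "--batch-size", "128",
        "--index", index, "--topics", TOPICS, *scoring,
        "--output", "run.txt", "--hits", "1000", *tail,
    ]


print(" ".join(retrieve_command("SPLADE++", "msmarco-v1-passage")))
```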
csqe gpt-4.1-nano BGE-base-en-v1.5 0.6210 0.9886 0.4147 0.5123 0.4112 0.7489 0.7583 0.9600 0.8174 0.1442 0.4351 0.4753 0.3516 0.8371 0.7304 0.8749 0.6873 0.8535
method: csqe · llm: gpt-4.1-nano · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
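The two BGE-base-en-v1.5 rows (gpt-4.1 and gpt-4.1-nano) can be summarized with an unweighted mean of nDCG@10 over the nine datasets. This average is a convenience computed here from the row values, not a figure the leaderboard itself reports, and it assumes nDCG@10 is the first value of each dataset pair.

```python
from statistics import mean
from typing import List

# The 18 score values of each row, copied from the leaderboard rows above.
GPT41_BGE = ("0.6218 0.9915 0.4242 0.5229 0.4067 0.7384 0.7553 0.9633 0.7879 "
             "0.1431 0.4631 0.5075 0.4144 0.8640 0.7551 0.9009 0.7139 0.8968")
NANO_BGE = ("0.6210 0.9886 0.4147 0.5123 0.4112 0.7489 0.7583 0.9600 0.8174 "
            "0.1442 0.4351 0.4753 0.3516 0.8371 0.7304 0.8749 0.6873 0.8535")


def ndcg_values(row_values: str) -> List[float]:
    """Keep every other value, i.e. the nDCG@10 half of each dataset pair."""
    values = [float(x) for x in row_values.split()]
    return values[0::2]


mean_gpt41 = mean(ndcg_values(GPT41_BGE))
mean_nano = mean(ndcg_values(NANO_BGE))
print(f"gpt-4.1: {mean_gpt41:.4f}  gpt-4.1-nano: {mean_nano:.4f}")
```

Per-dataset differences can point the other way: gpt-4.1-nano is ahead on FiQA, SciFact, and COVID in these rows, so the mean hides some variation.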
csqe gpt-4.1-nano BM25 0.3964 0.9381 0.3647 0.4939 0.2401 0.5553 0.7099 0.9587 0.6171 0.1543 0.4271 0.5221 0.2436 0.7327 0.5410 0.8221 0.5142 0.8586
method: csqe · llm: gpt-4.1-nano · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
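To compare a reproduced run against a leaderboard cell, the numbers printed by step 3 have to be pulled out of the trec_eval text. A minimal sketch follows, assuming the usual summary layout of one `<measure> all <value>` line per metric; `parse_trec_eval` is a hypothetical helper, and the sample text reuses the DL 2020 values from the csqe / gpt-4.1-nano / BM25 row rather than output captured from a real run.

```python
def parse_trec_eval(output):
    """Map measure name -> float for each `<measure> all <value>` line."""
    scores = {}
    for line in output.splitlines():
        parts = line.split()
        # Anything that is not a three-column summary line (for example
        # download or logging messages) is ignored.
        if len(parts) == 3 and parts[1] == "all":
            try:
                scores[parts[0]] = float(parts[2])
            except ValueError:
                continue  # non-numeric summary fields such as a run id
    return scores


# Illustrative text only, not captured from a real run.
sample_output = (
    "ndcg_cut_10           \tall\t0.5142\n"
    "recall_1000           \tall\t0.8586\n"
)
scores = parse_trec_eval(sample_output)
print(scores)
```

Because step 1 samples at temperature 1, a reproduced score may differ somewhat from the published cell, so a tolerance-based comparison is probably more useful than an exact match.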
csqe gpt-4.1-nano SPLADE++ 0.3792 0.9801 0.3805 0.5235 0.3256 0.6702 0.7055 0.9533 0.6313 0.1132 0.4193 0.4601 0.2789 0.7872 0.6134 0.8900 0.5883 0.9119
method: csqe · llm: gpt-4.1-nano · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method csqe \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
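Across the step 2 commands on this page, only the search module, the index suffix, and a few flags change with the retriever. The sketch below collects those differences in one table and rebuilds the command line from it. The values are copied from the commands shown here; the `RETRIEVERS` table and the `retrieval_command` helper are hypothetical names, not part of QueryGym or Pyserini.

```python
import shlex

# Per-retriever settings as they appear in the step 2 commands on this page.
RETRIEVERS = {
    "BM25": {
        "module": "pyserini.search.lucene",
        "beir_suffix": ".flat",
        "msmarco_suffix": "",
        "flags": ["--bm25", "--k1", "0.9", "--b", "0.4"],
        "tail": [],
    },
    "SPLADE++": {
        "module": "pyserini.search.lucene",
        "beir_suffix": ".splade-pp-ed",
        "msmarco_suffix": ".splade-pp-ed",
        "flags": ["--encoder", "naver/splade-cocondenser-ensembledistil"],
        "tail": ["--impact"],
    },
    "BGE-base-en-v1.5": {
        "module": "pyserini.search.faiss",
        "beir_suffix": ".bge-base-en-v1.5",
        "msmarco_suffix": ".bge-base-en-v1.5",
        "flags": ["--encoder", "BAAI/bge-base-en-v1.5"],
        "tail": [],
    },
}


def retrieval_command(retriever, dataset,
                      topics="outputs/reproduce/queries/reformulated_queries.tsv"):
    """Return the step 2 argv list for one retriever and dataset."""
    cfg = RETRIEVERS[retriever]
    if dataset.startswith("beir-"):
        index = dataset + cfg["beir_suffix"]
    else:
        # DL-HARD, DL 2019 and DL 2020 all search the MS MARCO v1 passage index.
        index = "msmarco-v1-passage" + cfg["msmarco_suffix"]
    return [
        "python", "-m", cfg["module"],
        "--threads", "16", "--batch-size", "128",
        "--index", index,
        "--topics", topics,
        *cfg["flags"],
        "--output", "run.txt",
        "--hits", "1000", *cfg["tail"],
    ]


print(shlex.join(retrieval_command("SPLADE++", "beir-v1.0.0-fiqa")))
```

Returning an argv list rather than a string keeps the result usable with `subprocess.run` without any shell quoting concerns.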
csqe Qwen2.5-72B-Instruct BGE-base-en-v1.5 0.6229 0.9886 0.4024 0.4897 0.3796 0.7461 0.7484 0.9667 0.7793 0.1410 0.4626 0.4812 0.3757 0.8531 0.7179 0.8944 0.6687 0.8722
method: csqe · llm: Qwen2.5-72B-Instruct · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
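The step 3 commands follow a similar pattern: BEIR datasets are scored against `<dataset>-test` with R@100, while the MS MARCO topic sets use R@1k. A small sketch of that mapping is given below, using only the qrels names that appear on this page. `eval_command` is a hypothetical helper, and DL-HARD is deliberately left without a default because the page passes a file path for it rather than a named qrels set.

```python
# Named qrels used on this page for the MS MARCO topic sets.
MSMARCO_QRELS = {
    "msmarco-v1-passage.trecdl2019": "dl19-passage",
    "msmarco-v1-passage.trecdl2020": "dl20-passage",
}


def eval_command(dataset, run="run.txt", qrels=None):
    """Return the step 3 argv list for one dataset."""
    if dataset.startswith("beir-"):
        qrels = qrels or f"{dataset}-test"
        recall = "recall.100"
    else:
        qrels = qrels or MSMARCO_QRELS.get(dataset)
        recall = "recall.1000"
    if qrels is None:
        raise ValueError(f"no default qrels for {dataset!r}; pass qrels= explicitly")
    return [
        "python", "-m", "pyserini.eval.trec_eval",
        "-c", "-m", "ndcg.cut.10", "-m", recall,
        qrels, run,
    ]


print(" ".join(eval_command("beir-v1.0.0-scifact")))
```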
csqe Qwen2.5-72B-Instruct BM25 0.3864 n/a 0.3556 0.4639 0.2132 n/a 0.7141 n/a 0.6716 0.1491 0.3861 0.4892 0.2848 0.6998 0.6391 0.8608 0.5606 0.8603
method: csqe · llm: Qwen2.5-72B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
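For readers who want to see what the two reported metrics measure, here is a simplified single-query sketch of nDCG@10 and recall@k. It assumes linear gains with a log2(rank + 1) discount and treats any judgment above zero as relevant. trec_eval remains the reference implementation (it also handles score ties and averages over queries), so this is for intuition only; the judgments and ranking are invented.

```python
import math


def ndcg_at_k(ranked_ids, qrels, k=10):
    """nDCG@k with linear gains and a log2(rank + 1) discount."""
    dcg = sum(
        qrels.get(doc, 0) / math.log2(rank + 1)
        for rank, doc in enumerate(ranked_ids[:k], start=1)
    )
    # Ideal ranking: all judged-relevant documents sorted by gain.
    ideal = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_ids, qrels, k=100):
    """Fraction of relevant documents that appear in the top k."""
    relevant = {doc for doc, g in qrels.items() if g > 0}
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranked_ids[:k])) / len(relevant)


# Toy judgments and ranking, invented for illustration.
qrels = {"d1": 2, "d2": 1, "d3": 0}
ranking = ["d2", "d9", "d1"]
print(round(ndcg_at_k(ranking, qrels), 4), recall_at_k(ranking, qrels))
```

In the toy ranking both relevant documents are retrieved, so recall is 1.0, while nDCG is below 1 because the more relevant document sits at rank 3 instead of rank 1.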
csqe Qwen2.5-72B-Instruct SPLADE++ 0.5118 0.9787 0.3686 0.5021 0.3075 0.6521 0.6966 0.9433 0.6118 0.1082 0.3871 0.4548 0.2857 0.8246 0.6189 0.9070 0.5736 0.9052
method: csqe · llm: Qwen2.5-72B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method csqe \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
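The retrieve commands in these expanded rows differ mainly in the search module and the index name. Below is a minimal sketch of that naming pattern in POSIX shell, inferred only from the commands listed on this page. The function names (`corpus_for`, `index_for`, `search_module_for`) and the short retriever keys (`bge`, `bm25`, `splade`) are hypothetical helpers, not part of QueryGym or Pyserini.

```shell
# Hypothetical helpers; the naming pattern is inferred from the commands on this page.

# corpus_for DATASET: the MS MARCO query sets (DL-HARD, DL 2019, DL 2020)
# all search the same msmarco-v1-passage corpus.
corpus_for() {
  case "$1" in
    msmarco-v1-passage.*) echo "msmarco-v1-passage" ;;
    *) echo "$1" ;;
  esac
}

# index_for RETRIEVER DATASET: prebuilt index name passed to --index.
index_for() {
  corpus=$(corpus_for "$2")
  case "$1" in
    bge)    echo "${corpus}.bge-base-en-v1.5" ;;
    splade) echo "${corpus}.splade-pp-ed" ;;
    bm25)
      # BEIR BM25 indexes carry a ".flat" suffix; the MS MARCO one has none.
      case "$corpus" in
        beir-*) echo "${corpus}.flat" ;;
        *)      echo "${corpus}" ;;
      esac ;;
    *) echo "unknown retriever: $1" >&2; return 1 ;;
  esac
}

# search_module_for RETRIEVER: dense search uses faiss, sparse search uses lucene.
search_module_for() {
  case "$1" in
    bge)         echo "pyserini.search.faiss" ;;
    bm25|splade) echo "pyserini.search.lucene" ;;
    *) echo "unknown retriever: $1" >&2; return 1 ;;
  esac
}
```

For example, `index_for splade msmarco-v1-passage.dlhard` prints `msmarco-v1-passage.splade-pp-ed`, matching the `--index` value in the DL-HARD SPLADE++ command.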
csqe Qwen2.5-7B-Instruct BGE-base-en-v1.5 0.6231 0.9893 0.3826 0.4879 0.3939 0.7437 0.7415 0.9727 0.7862 0.1449 0.4360 0.5126 0.3671 0.8348 0.7127 0.8803 0.6885 0.8850
method: csqe · llm: Qwen2.5-7B-Instruct · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
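The evaluate step changes with the dataset in two ways: the judgments argument and the recall cutoff (R@100 for BEIR, R@1k for the MS MARCO sets). The sketch below prints the matching command rather than running it. The function names are hypothetical. `DLHARD_QRELS` is also a hypothetical variable, introduced because the DL-HARD commands on this page point at what looks like a machine-specific absolute path.

```shell
# Hypothetical helpers; values are copied from the evaluate commands on this page.

# qrels_for DATASET: the judgments argument given to pyserini.eval.trec_eval.
qrels_for() {
  case "$1" in
    beir-*) echo "$1-test" ;;
    msmarco-v1-passage.trecdl2019) echo "dl19-passage" ;;
    msmarco-v1-passage.trecdl2020) echo "dl20-passage" ;;
    # DL-HARD uses a local file on this page, so the path is read from a
    # (hypothetical) DLHARD_QRELS variable instead of being guessed.
    msmarco-v1-passage.dlhard) echo "${DLHARD_QRELS:?set DLHARD_QRELS to your DL-HARD file}" ;;
    *) echo "unknown dataset: $1" >&2; return 1 ;;
  esac
}

# recall_metric_for DATASET: BEIR rows report R@100, MS MARCO rows report R@1k.
recall_metric_for() {
  case "$1" in
    beir-*)               echo "recall.100" ;;
    msmarco-v1-passage.*) echo "recall.1000" ;;
    *) echo "unknown dataset: $1" >&2; return 1 ;;
  esac
}

# eval_cmd DATASET RUN_FILE: print (do not run) the evaluate command.
eval_cmd() {
  qrels=$(qrels_for "$1") || return 1
  recall=$(recall_metric_for "$1") || return 1
  echo "python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m $recall $qrels $2"
}
```

Calling `eval_cmd msmarco-v1-passage.trecdl2020 run.txt` prints the same DL 2020 evaluate command shown above, on a single line.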
csqe Qwen2.5-7B-Instruct BM25 0.4008 0.9403 0.3767 0.5078 0.2200 0.5466 0.7183 0.9543 0.6757 0.1600 0.4504 0.5795 0.3322 0.7913 0.6873 0.8921 0.6083 0.8596
method: csqe · llm: Qwen2.5-7B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
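Across the nine datasets the reformulate step is the same command with a different `--dataset` value. The sketch below is a dry run that prints those commands for one method and model. The names `DATASETS`, `reformulate_cmd`, and `print_all` are hypothetical. The sketch uses the `--method-params` value shared by most entries. The DL 2019 BM25 entry above also sets `"mode":"zs"`, which is not reproduced here. Every command on this page writes to the same `--output-dir`, and the page does not say how the pipeline separates datasets beneath it, so the sketch prints commands instead of executing them.

```shell
# Dataset identifiers exactly as they appear in the --dataset flags on this page.
DATASETS="beir-v1.0.0-arguana beir-v1.0.0-dbpedia-entity beir-v1.0.0-fiqa
beir-v1.0.0-scifact beir-v1.0.0-trec-covid beir-v1.0.0-trec-news
msmarco-v1-passage.dlhard msmarco-v1-passage.trecdl2019 msmarco-v1-passage.trecdl2020"

# reformulate_cmd DATASET METHOD MODEL: print one reformulate command on one line.
reformulate_cmd() {
  printf '%s\n' "python examples/querygym_pyserini/pipeline.py --dataset $1 --method $2 --model $3 --steps reformulate --temperature 1 --max-tokens 128 --method-params '{\"num_examples\":4,\"train_split\":\"train\"}' --output-dir outputs/reproduce"
}

# print_all METHOD MODEL: dry run over all nine datasets.
print_all() {
  for ds in $DATASETS; do
    reformulate_cmd "$ds" "$1" "$2"
  done
}
```

For instance, `print_all csqe Qwen/Qwen2.5-7B-Instruct` emits nine lines, one per dataset, which can be reviewed before any of them is run.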
csqe Qwen2.5-7B-Instruct SPLADE++ 0.5100 0.9801 0.3661 0.4830 0.3035 0.6521 0.6765 0.9527 0.6096 0.1024 0.4079 0.4866 0.3025 0.8057 0.6523 0.9089 0.6164 0.9039
method: csqe · llm: Qwen2.5-7B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method csqe \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
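Before evaluating, it can help to confirm that `run.txt` looks as expected given `--hits 1000`. The sketch below assumes the run file is in the standard six-column TREC format (`qid Q0 docid rank score tag`), which is Pyserini's default output format. The function name `run_stats` is hypothetical.

```shell
# run_stats RUN_FILE: print "<number of queries> <max hits for any query>".
# Assumes six whitespace-separated columns per line: qid Q0 docid rank score tag.
run_stats() {
  awk 'NF == 6 { n[$1]++ }
       END {
         q = 0; max = 0
         for (k in n) { q++; if (n[k] > max) max = n[k] }
         print q, max
       }' "$1"
}
```

With `--hits 1000`, the second number should not exceed 1000, and the first should match the number of queries in `reformulated_queries.tsv`.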
genqr gpt-4.1 BGE-base-en-v1.5 0.6256 0.9893 0.3555 0.4693 0.3924 0.7330 0.7480 0.9700 0.7784 0.1475 0.4641 0.5089 0.3870 0.8402 0.7023 0.8650 0.6903 0.8516
method: genqr · llm: gpt-4.1 · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
genqr gpt-4.1 BM25 0.4060 0.9495 0.3442 0.4635 0.2302 0.5818 0.7262 0.9632 0.6869 0.1627 0.4647 0.6096 0.2921 0.7434 0.5479 0.8282 0.5368 0.8402
method genqr · llm gpt-4.1 · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
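Step 3 prints one line per metric. A minimal sketch for reading those lines back into the two numbers a leaderboard cell shows, assuming trec_eval's usual layout of metric name, query id (`all` for the aggregate), and value separated by whitespace. `parse_trec_eval` is a hypothetical helper, not part of QueryGym or Pyserini, and the sample text is hand-written rather than captured output.

```python
def parse_trec_eval(output):
    """Return {metric: value} for the aggregate ('all') lines of trec_eval output."""
    scores = {}
    for line in output.splitlines():
        parts = line.split()
        # Expected layout: <metric> <query id or 'all'> <value>; skip anything else
        # (for example, log or download messages printed before the metrics).
        if len(parts) == 3 and parts[1] == "all":
            try:
                scores[parts[0]] = float(parts[2])
            except ValueError:
                continue
    return scores


# Hand-written sample in trec_eval's layout (not captured output). The values are
# the DL 2020 cells of the genqr / gpt-4.1 / BM25 row.
sample = (
    "ndcg_cut_10           \tall\t0.5368\n"
    "recall_1000           \tall\t0.8402\n"
)
scores = parse_trec_eval(sample)
print(scores["ndcg_cut_10"], scores["recall_1000"])
```

With `-m ndcg.cut.10 -m recall.100` the metric names come out as `ndcg_cut_10` and `recall_100`; the DL datasets report `recall_1000` instead.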
genqr gpt-4.1 SPLADE++ 0.3755 0.9836 0.3827 0.5414 0.3243 0.6774 0.7277 0.9500 0.6820 0.1193 0.4256 0.4877 0.3800 0.8488 0.7065 0.9333 0.6260 0.9143
method genqr · llm gpt-4.1 · retriever SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
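The step 2 commands above differ only in the Pyserini module, the index suffix, and a few retriever flags. A minimal sketch of that pattern, with every value copied from the commands on this page. `RETRIEVERS`, `index_name`, and `retrieve_command` are hypothetical names for illustration, not QueryGym or Pyserini APIs.

```python
import shlex

# Per-retriever settings as they appear in the commands on this page.
# These names are illustrative only; they are not part of QueryGym or Pyserini.
RETRIEVERS = {
    "BM25": {
        "module": "pyserini.search.lucene",
        "beir_suffix": ".flat",
        "msmarco_suffix": "",
        "flags": ["--bm25", "--k1", "0.9", "--b", "0.4"],
        "tail": [],
    },
    "BGE-base-en-v1.5": {
        "module": "pyserini.search.faiss",
        "beir_suffix": ".bge-base-en-v1.5",
        "msmarco_suffix": ".bge-base-en-v1.5",
        "flags": ["--encoder", "BAAI/bge-base-en-v1.5"],
        "tail": [],
    },
    "SPLADE++": {
        "module": "pyserini.search.lucene",
        "beir_suffix": ".splade-pp-ed",
        "msmarco_suffix": ".splade-pp-ed",
        "flags": ["--encoder", "naver/splade-cocondenser-ensembledistil"],
        "tail": ["--impact"],
    },
}


def index_name(dataset, retriever):
    """Map a --dataset value to the --index value used for a given retriever."""
    cfg = RETRIEVERS[retriever]
    if dataset.startswith("beir-"):
        return dataset + cfg["beir_suffix"]
    if dataset.startswith("msmarco-v1-passage"):
        # DL-HARD, DL 2019 and DL 2020 all search the MS MARCO v1 passage index.
        return "msmarco-v1-passage" + cfg["msmarco_suffix"]
    raise ValueError("unknown dataset: " + dataset)


def retrieve_command(dataset, retriever,
                     topics="outputs/reproduce/queries/reformulated_queries.tsv",
                     output="run.txt"):
    """Build the step 2 command as an argument list."""
    cfg = RETRIEVERS[retriever]
    return [
        "python", "-m", cfg["module"],
        "--threads", "16", "--batch-size", "128",
        "--index", index_name(dataset, retriever),
        "--topics", topics,
        *cfg["flags"],
        "--output", output,
        "--hits", "1000",
        *cfg["tail"],
    ]


print(shlex.join(retrieve_command("beir-v1.0.0-fiqa", "SPLADE++")))
```

Returning an argument list keeps the command usable with `subprocess.run` as well as printable for copy and paste.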
genqr gpt-4.1-nano BGE-base-en-v1.5 0.6234 0.9900 0.3434 0.4680 0.3721 0.7175 0.7553 0.9633 0.7987 0.1440 0.4548 0.5134 0.3586 0.8389 0.6587 0.8493 0.6568 0.8485
method genqr · llm gpt-4.1-nano · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
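The step 3 commands follow a similar pattern: BEIR datasets evaluate against `<dataset>-test` with R@100, while the MS MARCO DL datasets use R@1k. A minimal sketch, assuming a hypothetical helper `evaluate_command`. The DL-HARD commands on this page point at a local file rather than a named qrels set, so the sketch asks for that path explicitly instead of guessing one.

```python
import shlex

# Named qrels as they appear in the DL 2019 / DL 2020 commands on this page.
NAMED_QRELS = {
    "msmarco-v1-passage.trecdl2019": "dl19-passage",
    "msmarco-v1-passage.trecdl2020": "dl20-passage",
}


def evaluate_command(dataset, run="run.txt", qrels=None):
    """Build the step 3 command (hypothetical helper, not a QueryGym API)."""
    if dataset.startswith("beir-"):
        qrels = qrels or dataset + "-test"
        recall = "recall.100"
    else:
        qrels = qrels or NAMED_QRELS.get(dataset)
        recall = "recall.1000"
    if qrels is None:
        # For example DL-HARD: the caller must supply a qrels file.
        raise ValueError("pass qrels= explicitly for " + dataset)
    return [
        "python", "-m", "pyserini.eval.trec_eval",
        "-c", "-m", "ndcg.cut.10", "-m", recall,
        qrels, run,
    ]


print(shlex.join(evaluate_command("beir-v1.0.0-scifact")))
print(shlex.join(evaluate_command("msmarco-v1-passage.trecdl2020")))
```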
genqr gpt-4.1-nano BM25 0.4013 0.9488 0.2591 0.4137 0.1974 0.5142 0.7011 0.9566 0.6662 0.1561 0.4251 0.5834 0.1743 0.6575 0.4389 0.7360 0.4302 0.7701
method genqr · llm gpt-4.1-nano · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
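Across the rows above, the step 1 command changes only in `--dataset` and `--model`. A minimal sketch that builds it from those parameters, using only the flags shown on this page. `reformulate_command` is a hypothetical helper, and the defaults simply mirror the values in the commands above.

```python
import json
import shlex


def reformulate_command(dataset, method, model,
                        method_params=None,
                        output_dir="outputs/reproduce"):
    """Build the step 1 command as an argument list (illustrative helper)."""
    if method_params is None:
        # Values taken from the --method-params argument shown on this page.
        method_params = {"num_examples": 4, "train_split": "train"}
    return [
        "python", "examples/querygym_pyserini/pipeline.py",
        "--dataset", dataset,
        "--method", method,
        "--model", model,
        "--steps", "reformulate",
        "--temperature", "1",
        "--max-tokens", "128",
        # Compact separators reproduce the JSON exactly as written on the page.
        "--method-params", json.dumps(method_params, separators=(",", ":")),
        "--output-dir", output_dir,
    ]


cmd = reformulate_command("beir-v1.0.0-fiqa", "genqr", "openai/gpt-4.1-nano")
# shlex.join single-quotes the JSON argument so it survives the shell.
print(shlex.join(cmd))
```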
genqr gpt-4.1-nano SPLADE++ 0.3773 0.9829 0.3592 0.5267 0.3025 0.6466 0.7184 0.9633 0.6594 0.1163 0.4093 0.4933 0.3043 0.8408 0.6351 0.9162 0.6011 0.9074
method genqr · llm gpt-4.1-nano · retriever SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
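Every retrieve step above reads `outputs/reproduce/queries/reformulated_queries.tsv`. A minimal sketch of a pre-retrieval sanity check follows, assuming the file holds one `<qid><TAB><query>` pair per line (the TSV topics layout Pyserini reads). The function name `check_queries_tsv` is hypothetical and is not part of QueryGym or Pyserini. Such a check may be useful because LLM output sampled at `--temperature 1` can contain stray tabs or newlines.

```shell
# Hypothetical helper, not part of QueryGym or Pyserini.
# Succeeds only if every line has exactly two non-empty tab-separated fields.
check_queries_tsv() {
  awk -F '\t' '
    NF != 2 || $1 == "" || $2 == "" { printf "line %d: malformed\n", NR; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}

# Demo on a throwaway file; the real path would be
# outputs/reproduce/queries/reformulated_queries.tsv
demo_file=$(mktemp)
printf '1\tfirst reformulated query\n2\tsecond reformulated query\n' > "$demo_file"
if check_queries_tsv "$demo_file"; then
  echo "queries file looks well-formed"
fi
rm -f "$demo_file"
```

Running the check between step 1 and step 2 would surface malformed lines before they reach the retriever.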
genqr Qwen2.5-72B-Instruct BGE-base-en-v1.5 0.6248 0.9900 0.3692 0.4808 0.3826 0.7139 0.7339 0.9650 0.7869 0.1416 0.4409 0.5023 0.3471 0.8144 0.6741 0.8618 0.6680 0.8652
method: genqr · llm: Qwen2.5-72B-Instruct · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
genqr Qwen2.5-72B-Instruct BM25 0.4188 n/a 0.2649 0.3941 0.1725 n/a 0.6976 n/a 0.6129 0.1349 0.4003 0.5838 0.2091 0.6822 0.4198 0.7616 0.4238 0.7919
method: genqr · llm: Qwen2.5-72B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
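The index names in the retrieve steps above follow a regular pattern per retriever. The sketch below captures that pattern as observed in these listings only. The function `index_for` and its short retriever keys (`bm25`, `splade`, `bge`) are hypothetical names introduced for illustration, and whether every resulting name exists as a prebuilt Pyserini index is an assumption.

```shell
# Hypothetical helper: derive the --index value used in the listings above.
# Usage: index_for <dataset> <bm25|splade|bge>
index_for() {
  dataset=$1
  retriever=$2
  case "$dataset" in
    beir-v1.0.0-*)        base=$dataset ;;            # BEIR: one index per dataset
    msmarco-v1-passage.*) base=msmarco-v1-passage ;;  # DL-HARD, DL 2019, DL 2020 share one corpus
    *) echo "unknown dataset: $dataset" >&2; return 1 ;;
  esac
  case "$retriever" in
    bm25)
      # BEIR BM25 indexes carry a .flat suffix; the MS MARCO index has none.
      case "$base" in
        beir-*) echo "${base}.flat" ;;
        *)      echo "${base}" ;;
      esac ;;
    splade) echo "${base}.splade-pp-ed" ;;
    bge)    echo "${base}.bge-base-en-v1.5" ;;
    *) echo "unknown retriever: $retriever" >&2; return 1 ;;
  esac
}

for r in bm25 splade bge; do
  echo "$r: $(index_for beir-v1.0.0-scifact "$r") | $(index_for msmarco-v1-passage.trecdl2020 "$r")"
done
```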
genqr Qwen2.5-72B-Instruct SPLADE++ 0.5201 0.9815 0.3579 0.5275 0.2868 0.6217 0.7468 0.9413 0.6292 0.1055 0.3808 0.4754 0.2916 0.7861 0.6154 0.9030 0.5751 0.8971
method: genqr · llm: Qwen2.5-72B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
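The evaluate steps above pair each dataset with a qrels name and a recall cutoff: BEIR datasets use `<dataset>-test` with `recall.100`, while DL 2019 and DL 2020 use `dl19-passage` and `dl20-passage` with `recall.1000`. A small sketch of that mapping follows. The function name `eval_args_for` is hypothetical, and DL-HARD is deliberately left unmapped because the listings reference a machine-specific file path for it rather than a named qrels set.

```shell
# Hypothetical helper: print the trec_eval arguments used in the listings above.
eval_args_for() {
  case "$1" in
    beir-v1.0.0-*)
      echo "-c -m ndcg.cut.10 -m recall.100 $1-test" ;;
    msmarco-v1-passage.trecdl2019)
      echo "-c -m ndcg.cut.10 -m recall.1000 dl19-passage" ;;
    msmarco-v1-passage.trecdl2020)
      echo "-c -m ndcg.cut.10 -m recall.1000 dl20-passage" ;;
    *)
      # DL-HARD and anything else: no portable qrels name is given in the listings.
      echo "no qrels mapping for: $1" >&2
      return 1 ;;
  esac
}

# Dry run: print the evaluation commands instead of executing them.
for ds in beir-v1.0.0-trec-covid msmarco-v1-passage.trecdl2019; do
  echo "python -m pyserini.eval.trec_eval $(eval_args_for "$ds") run.txt"
done
```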
genqr Qwen2.5-7B-Instruct BGE-base-en-v1.5 0.6262 0.9893 0.3426 0.4550 0.3716 0.7167 0.7254 0.9600 0.7608 0.1382 0.4526 0.4886 0.3375 0.8235 0.6416 0.8381 0.6335 0.8395
method: genqr · llm: Qwen2.5-7B-Instruct · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
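The evaluate steps above differ only in the recall cutoff and the qrels name. A minimal sketch of that mapping, assuming the dataset ids passed to `--dataset` on this page; the function names `eval_args` and `eval_command` are hypothetical helpers, not part of QueryGym or Pyserini. The DL-HARD qrels are left to the caller, because the page shows a local file path for them rather than a named qrels set.

```python
def eval_args(dataset, qrels=None):
    """Return (trec_eval flags, qrels name) for a dataset id used on this page."""
    if dataset.startswith("beir-v1.0.0-"):
        # BEIR panels report nDCG@10 + R@100 against "<dataset>-test".
        flags = ["-c", "-m", "ndcg.cut.10", "-m", "recall.100"]
        return flags, qrels or f"{dataset}-test"
    # MS MARCO passage panels report nDCG@10 + R@1k.
    flags = ["-c", "-m", "ndcg.cut.10", "-m", "recall.1000"]
    named = {
        "msmarco-v1-passage.trecdl2019": "dl19-passage",
        "msmarco-v1-passage.trecdl2020": "dl20-passage",
    }
    qrels = qrels or named.get(dataset)
    if qrels is None:
        # E.g. DL-HARD: no named qrels is shown on the page, so require one.
        raise ValueError(f"pass qrels explicitly for {dataset}")
    return flags, qrels


def eval_command(dataset, run="run.txt", qrels=None):
    """Build the step-3 command as an argument list."""
    flags, qrels_name = eval_args(dataset, qrels)
    return ["python", "-m", "pyserini.eval.trec_eval", *flags, qrels_name, run]


print(" ".join(eval_command("msmarco-v1-passage.trecdl2020")))
```

Returning an argument list rather than a string keeps the command ready for `subprocess.run` without shell quoting.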
genqr Qwen2.5-7B-Instruct BM25 0.4339 0.9523 0.2876 0.4203 0.2041 0.5057 0.6919 0.9413 0.6523 0.1522 0.4295 0.5580 0.2006 0.6458 0.4334 0.7860 0.3857 0.7740
method genqr · llm Qwen2.5-7B-Instruct · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"variants","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"variants","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"variants","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
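To compare a reproduced run against the table, the scores have to be read back from the evaluate step. A minimal sketch, assuming trec_eval's usual three-column summary lines (metric name, `all`, value); `parse_trec_eval` is a hypothetical helper, and the sample string simply restates the DL 2020 cells of the genqr · Qwen2.5-7B-Instruct · BM25 row in that layout.

```python
def parse_trec_eval(output):
    """Collect aggregate ('all') scores from trec_eval's text output."""
    scores = {}
    for line in output.splitlines():
        parts = line.split()
        # Summary rows look like: <metric> all <value>
        if len(parts) == 3 and parts[1] == "all":
            try:
                scores[parts[0]] = float(parts[2])
            except ValueError:
                continue  # skip non-numeric rows
    return scores


# Sample text in the assumed layout (values taken from the table row).
sample = "ndcg_cut_10           \tall\t0.3857\nrecall_1000           \tall\t0.7740\n"
scores = parse_trec_eval(sample)
print(scores["ndcg_cut_10"], scores["recall_1000"])
```

Any other lines the wrapper prints are ignored, since they do not match the three-column shape.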
genqr Qwen2.5-7B-Instruct SPLADE++ 0.5211 0.9851 0.3703 0.5386 0.3057 0.6309 0.6942 0.9297 0.7060 0.1263 0.3950 0.4527 0.3386 0.8000 0.6449 0.8870 0.6115 0.8989
method genqr · llm Qwen2.5-7B-Instruct · retriever SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
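Across the three retrievers, the retrieve step changes only in the search module, the index suffix, and a few flags. A minimal sketch of that pattern, with every flag and index name copied from the commands on this page; the keys `bm25`, `bge`, and `splade` and the helper names are hypothetical labels chosen for illustration.

```python
# (search module, index suffix, flags before --output, flags after --hits)
RETRIEVERS = {
    "bm25": ("pyserini.search.lucene", "flat",
             ["--bm25", "--k1", "0.9", "--b", "0.4"], []),
    "bge": ("pyserini.search.faiss", "bge-base-en-v1.5",
            ["--encoder", "BAAI/bge-base-en-v1.5"], []),
    "splade": ("pyserini.search.lucene", "splade-pp-ed",
               ["--encoder", "naver/splade-cocondenser-ensembledistil"],
               ["--impact"]),
}
MSMARCO = "msmarco-v1-passage"


def index_name(dataset, retriever):
    """Map a --dataset id to the index name shown in the retrieve step."""
    suffix = RETRIEVERS[retriever][1]
    if dataset.startswith(MSMARCO):
        # DL-HARD, DL 2019 and DL 2020 share one passage index;
        # the BM25 commands use the bare index name.
        return MSMARCO if retriever == "bm25" else f"{MSMARCO}.{suffix}"
    return f"{dataset}.{suffix}"


def retrieve_command(dataset, retriever,
                     topics="outputs/reproduce/queries/reformulated_queries.tsv"):
    """Build the step-2 command as an argument list."""
    module, _, before, after = RETRIEVERS[retriever]
    return ["python", "-m", module,
            "--threads", "16", "--batch-size", "128",
            "--index", index_name(dataset, retriever),
            "--topics", topics,
            *before,
            "--output", "run.txt",
            "--hits", "1000", *after]


print(" ".join(retrieve_command("beir-v1.0.0-fiqa", "splade")))
```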
genqr_ensemble gpt-4.1 BGE-base-en-v1.5 0.6187 0.9900 0.3759 0.4961 0.4029 0.7456 0.7589 0.9700 0.7999 0.1443 0.4748 0.5249 0.3572 0.8633 0.7034 0.8870 0.6826 0.8699
method genqr_ensemble · llm gpt-4.1 · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
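The reformulate step is the same command for every dataset, with only `--dataset`, `--method`, `--model`, and the JSON passed to `--method-params` changing. A minimal sketch of assembling it programmatically, assuming the flags shown on this page; `reformulate_command` is a hypothetical helper, and the temperature and token limits are the values used in the panels above.

```python
import json
import shlex


def reformulate_command(dataset, method, model, method_params,
                        output_dir="outputs/reproduce"):
    """Build the step-1 command as an argument list."""
    # Compact separators reproduce the JSON style used on the page.
    params = json.dumps(method_params, separators=(",", ":"))
    return ["python", "examples/querygym_pyserini/pipeline.py",
            "--dataset", dataset,
            "--method", method,
            "--model", model,
            "--steps", "reformulate",
            "--temperature", "1",
            "--max-tokens", "128",
            "--method-params", params,
            "--output-dir", output_dir]


cmd = reformulate_command(
    "msmarco-v1-passage.trecdl2020",
    "genqr_ensemble",
    "openai/gpt-4.1",
    {"num_examples": 4, "train_split": "train"},
)
# shlex.join adds the shell quoting that the JSON argument needs.
print(shlex.join(cmd))
```

Serializing the parameters with `json.dumps` avoids hand-written quoting mistakes in the `--method-params` argument.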
genqr_ensemble gpt-4.1 BM25 0.4073 0.9566 0.3600 0.4765 0.2388 0.5804 0.7251 0.9666 0.7528 0.1839 0.4860 0.6293 0.2697 0.7775 0.5589 0.8685 0.5528 0.8613
method genqr_ensemble · llm gpt-4.1 · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
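The retrieve commands on this page differ only in the Pyserini module, the index name, and a few retriever-specific flags. A minimal sketch of that pattern follows. It assumes the flags are exactly those shown in the commands here; the function name `build_retrieve_command` and the short keys `bm25`, `splade`, and `bge` are invented for illustration and are not part of QueryGym or Pyserini.

```python
from typing import List

# Per-retriever settings copied from the retrieve commands on this page:
# (Pyserini module, retriever-specific flags, trailing flags).
# The short keys are labels invented for this sketch.
RETRIEVER_FLAGS = {
    "bm25": ("pyserini.search.lucene", ["--bm25", "--k1", "0.9", "--b", "0.4"], []),
    "splade": (
        "pyserini.search.lucene",
        ["--encoder", "naver/splade-cocondenser-ensembledistil"],
        ["--impact"],
    ),
    "bge": ("pyserini.search.faiss", ["--encoder", "BAAI/bge-base-en-v1.5"], []),
}


def build_retrieve_command(
    retriever: str, index: str, topics: str, output: str = "run.txt"
) -> List[str]:
    """Return the argument list for step 2 (retrieve) of one configuration."""
    if retriever not in RETRIEVER_FLAGS:
        raise ValueError(f"unknown retriever label: {retriever!r}")
    module, specific, trailing = RETRIEVER_FLAGS[retriever]
    return [
        "python", "-m", module,
        "--threads", "16", "--batch-size", "128",
        "--index", index,
        "--topics", topics,
        *specific,
        "--output", output,
        "--hits", "1000",
        *trailing,
    ]


if __name__ == "__main__":
    cmd = build_retrieve_command(
        "bm25",
        "beir-v1.0.0-scifact.flat",
        "outputs/reproduce/queries/reformulated_queries.tsv",
    )
    print(" ".join(cmd))
```

The index name is passed in explicitly because the commands here do not follow a single suffix rule: the BEIR BM25 indexes end in `.flat`, while the MS MARCO BM25 index is plain `msmarco-v1-passage`.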
genqr_ensemble gpt-4.1 SPLADE++ 0.3806 0.9808 0.3643 0.5365 0.3014 0.6536 0.7175 0.9433 0.6731 0.1198 0.4438 0.5053 0.3047 0.8207 0.6859 0.9020 0.5857 0.9141
method: genqr_ensemble · llm: gpt-4.1 · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr_ensemble \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
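Each summary row on this page packs a method, an LLM, a retriever, and 18 scores into one line. The sketch below splits such a row into per-dataset metrics. It assumes the 18 values follow the column header order (ArguAna, DBPedia, FiQA, SciFact, COVID, News with nDCG@10 and R@100, then DL-HARD, DL 2019, DL 2020 with nDCG@10 and R@1k) and that the BRIGHT columns carry no values in these rows. The function name `parse_row` is hypothetical.

```python
from typing import Dict

# Column order assumed from the table header; the first six datasets report
# R@100 and the last three report R@1k.
DATASETS = [
    "ArguAna", "DBPedia", "FiQA", "SciFact", "COVID", "News",
    "DL-HARD", "DL 2019", "DL 2020",
]
NUM_BEIR = 6  # datasets evaluated with R@100


def parse_row(row: str) -> Dict[str, object]:
    """Split one leaderboard row into its configuration and per-dataset scores."""
    tokens = row.split()
    method, llm, retriever = tokens[:3]
    values = [float(t) for t in tokens[3:]]
    if len(values) != 2 * len(DATASETS):
        raise ValueError(f"expected {2 * len(DATASETS)} scores, got {len(values)}")
    scores = {}
    for i, name in enumerate(DATASETS):
        recall_key = "R@100" if i < NUM_BEIR else "R@1k"
        scores[name] = {"nDCG@10": values[2 * i], recall_key: values[2 * i + 1]}
    return {"method": method, "llm": llm, "retriever": retriever, "scores": scores}


# Row copied from this page (genqr_ensemble, gpt-4.1, SPLADE++).
ROW = (
    "genqr_ensemble gpt-4.1 SPLADE++ 0.3806 0.9808 0.3643 0.5365 0.3014 0.6536 "
    "0.7175 0.9433 0.6731 0.1198 0.4438 0.5053 0.3047 0.8207 0.6859 0.9020 "
    "0.5857 0.9141"
)

if __name__ == "__main__":
    parsed = parse_row(ROW)
    for dataset, metrics in parsed["scores"].items():
        print(dataset, metrics)
```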
genqr_ensemble gpt-4.1-nano BGE-base-en-v1.5 0.6196 0.9900 0.3488 0.4758 0.3766 0.7298 0.7469 0.9633 0.7976 0.1425 0.4719 0.5175 0.3579 0.8282 0.6883 0.8711 0.6645 0.8620
method: genqr_ensemble · llm: gpt-4.1-nano · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
genqr_ensemble gpt-4.1-nano BM25 0.3945 0.9474 0.3181 0.4501 0.1972 0.5205 0.7034 0.9626 0.6884 0.1690 0.4349 0.6199 0.2154 0.6990 0.4579 0.8217 0.4718 0.8158
method: genqr_ensemble · llm: gpt-4.1-nano · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
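To compare a local run against a leaderboard row, the evaluate step's output has to be turned into numbers. The sketch below does that under one assumption: that the evaluation prints summary lines of the form `measure  all  value`, as `trec_eval` conventionally does, possibly mixed with other log lines. The function name `parse_trec_eval` is hypothetical, and the sample text is built from the ArguAna values in the genqr_ensemble / gpt-4.1-nano / BM25 row rather than captured from a real run.

```python
from typing import Dict


def parse_trec_eval(text: str) -> Dict[str, float]:
    """Collect 'measure all value' lines into a dict, skipping everything else."""
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only three-field summary lines whose middle field is "all".
        if len(parts) != 3 or parts[1] != "all":
            continue
        try:
            metrics[parts[0]] = float(parts[2])
        except ValueError:
            # Non-numeric summary fields (for example a run id) are ignored.
            continue
    return metrics


# Illustrative text only: values taken from the leaderboard row, layout assumed.
SAMPLE = "ndcg_cut_10           \tall\t0.3945\nrecall_100            \tall\t0.9474\n"

if __name__ == "__main__":
    print(parse_trec_eval(SAMPLE))
```

Because the reformulate step runs at temperature 1, a rerun need not produce identical queries, so a small tolerance is a reasonable choice when checking a parsed score against the table.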
genqr_ensemble gpt-4.1-nano SPLADE++ 0.3818 0.9808 0.3611 0.5276 0.2891 0.6311 0.7158 0.9560 0.6514 0.1166 0.4198 0.4906 0.3233 0.8400 0.6617 0.9104 0.6044 0.9194
method: genqr_ensemble · llm: gpt-4.1-nano · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr_ensemble \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
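The step-2 commands on this page differ only in the search module, the index name, and a few retriever-specific flags. The sketch below rebuilds that command line from a dataset name and a retriever label. The `RETRIEVERS` table and the `build_retrieve_command` helper are hypothetical names introduced here for illustration. They are not part of QueryGym or Pyserini, and the flag values are simply transcribed from the commands shown above.

```python
import shlex

# Per-retriever settings transcribed from the step-2 commands on this page.
# The table layout and helper name are illustrative, not QueryGym code.
RETRIEVERS = {
    "BM25": {
        "module": "pyserini.search.lucene",
        "beir_suffix": ".flat",
        "msmarco_suffix": "",
        "flags": ["--bm25", "--k1", "0.9", "--b", "0.4"],
        "tail": [],
    },
    "SPLADE++": {
        "module": "pyserini.search.lucene",
        "beir_suffix": ".splade-pp-ed",
        "msmarco_suffix": ".splade-pp-ed",
        "flags": ["--encoder", "naver/splade-cocondenser-ensembledistil"],
        "tail": ["--impact"],
    },
    "BGE-base-en-v1.5": {
        "module": "pyserini.search.faiss",
        "beir_suffix": ".bge-base-en-v1.5",
        "msmarco_suffix": ".bge-base-en-v1.5",
        "flags": ["--encoder", "BAAI/bge-base-en-v1.5"],
        "tail": [],
    },
}

MSMARCO = "msmarco-v1-passage"


def build_retrieve_command(dataset, retriever,
                           topics="outputs/reproduce/queries/reformulated_queries.tsv",
                           output="run.txt"):
    """Return the retrieval command as an argv list for the given config."""
    cfg = RETRIEVERS[retriever]
    if dataset.startswith(MSMARCO):
        # DL-HARD, DL 2019 and DL 2020 all search the MS MARCO passage index.
        index = MSMARCO + cfg["msmarco_suffix"]
    else:
        # BEIR datasets use one index per dataset.
        index = dataset + cfg["beir_suffix"]
    return ["python", "-m", cfg["module"],
            "--threads", "16", "--batch-size", "128",
            "--index", index,
            "--topics", topics,
            *cfg["flags"],
            "--output", output,
            "--hits", "1000", *cfg["tail"]]


if __name__ == "__main__":
    print(shlex.join(build_retrieve_command("beir-v1.0.0-trec-news", "SPLADE++")))
```

The returned list can be passed to `subprocess.run` directly, or joined with `shlex.join` for display as above.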
genqr_ensemble Qwen2.5-72B-Instruct BGE-base-en-v1.5 0.6254 0.9893 0.3974 0.5309 0.3943 0.7284 0.7496 0.9700 0.7915 0.1407 0.4515 0.5136 0.3543 0.8269 0.6819 0.8825 0.6774 0.8585
method: genqr_ensemble · llm: Qwen2.5-72B-Instruct · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
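The step-3 commands follow one pattern: BEIR datasets are scored with nDCG@10 and R@100 against `<dataset>-test`, while the MS MARCO DL sets use nDCG@10 and R@1k. A minimal sketch of that rule follows. `build_eval_command` is a hypothetical helper, not a QueryGym function. Because the DL-HARD command on this page points at a machine-local file, the sketch requires the caller to supply the qrels for that dataset rather than guessing a name.

```python
# Qrels keys copied from the DL 2019 / DL 2020 commands on this page.
DL_QRELS = {
    "msmarco-v1-passage.trecdl2019": "dl19-passage",
    "msmarco-v1-passage.trecdl2020": "dl20-passage",
}


def build_eval_command(dataset, run="run.txt", qrels=None):
    """Return the trec_eval command (argv list) for one dataset."""
    if dataset.startswith("beir-"):
        recall = "recall.100"
        qrels = qrels or f"{dataset}-test"
    else:
        recall = "recall.1000"
        qrels = qrels or DL_QRELS.get(dataset)
        if qrels is None:
            # DL-HARD: no portable qrels name is shown on the page.
            raise ValueError(f"pass qrels explicitly for {dataset}")
    return ["python", "-m", "pyserini.eval.trec_eval", "-c",
            "-m", "ndcg.cut.10", "-m", recall, qrels, run]


if __name__ == "__main__":
    print(" ".join(build_eval_command("beir-v1.0.0-fiqa")))
```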
genqr_ensemble Qwen2.5-72B-Instruct BM25 0.4080 n/a 0.3136 0.4161 0.2061 n/a 0.7089 n/a 0.6437 0.1451 0.4080 0.5923 0.2463 0.6975 0.4739 0.7999 0.4248 0.7820
method: genqr_ensemble · llm: Qwen2.5-72B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
genqr_ensemble Qwen2.5-72B-Instruct SPLADE++ 0.5193 0.9822 0.4271 0.5565 0.3062 0.6136 0.7135 0.9433 0.6162 0.1099 0.3963 0.5087 0.2849 0.7823 0.5979 0.9053 0.5447 0.8886
method: genqr_ensemble · llm: Qwen2.5-72B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
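Each result row lists the method, LLM and retriever, followed by an (nDCG@10, recall) pair for each of the nine datasets that have expanded commands, in the order the commands appear. The sketch below turns such a row into a dictionary. The `parse_row` helper and the short dataset labels are assumptions made for illustration. A row that does not contain exactly 18 score cells is rejected rather than silently misaligned, and any cell that is not a number is kept as `None`.

```python
# (dataset label, recall cutoff) in the column order used by the rows on this page.
COLUMNS = [
    ("ArguAna", "R@100"), ("DBPedia", "R@100"), ("FiQA", "R@100"),
    ("SciFact", "R@100"), ("COVID", "R@100"), ("News", "R@100"),
    ("DL-HARD", "R@1k"), ("DL 2019", "R@1k"), ("DL 2020", "R@1k"),
]


def _cell(token):
    """Convert one table cell to float; non-numeric cells become None."""
    try:
        return float(token)
    except ValueError:
        return None


def parse_row(line):
    """Split a leaderboard row into its config and per-dataset scores."""
    method, llm, retriever, *cells = line.split()
    if len(cells) != 2 * len(COLUMNS):
        raise ValueError(f"expected {2 * len(COLUMNS)} score cells, got {len(cells)}")
    scores = {}
    for i, (dataset, recall_name) in enumerate(COLUMNS):
        scores[dataset] = {
            "nDCG@10": _cell(cells[2 * i]),
            recall_name: _cell(cells[2 * i + 1]),
        }
    return {"method": method, "llm": llm, "retriever": retriever, "scores": scores}


if __name__ == "__main__":
    row = ("genqr_ensemble Qwen2.5-72B-Instruct SPLADE++ 0.5193 0.9822 0.4271 0.5565 "
           "0.3062 0.6136 0.7135 0.9433 0.6162 0.1099 0.3963 0.5087 0.2849 0.7823 "
           "0.5979 0.9053 0.5447 0.8886")
    print(parse_row(row)["scores"]["DL 2020"])
```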
genqr_ensemble Qwen2.5-7B-Instruct BGE-base-en-v1.5 0.6196 0.9900 0.3462 0.4644 0.3792 0.7180 0.7375 0.9667 0.7754 0.1379 0.4589 0.5172 0.3713 0.8356 0.6661 0.8520 0.6700 0.8582
method: genqr_ensemble · llm: Qwen2.5-7B-Instruct · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
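To compare a reproduced run with the numbers in a row, the summary printed by the evaluate step has to be read back. The sketch below assumes the usual trec_eval summary layout of three whitespace-separated fields per line (measure name, the literal `all`, value) and skips everything else, such as wrapper log lines. `parse_trec_eval` is a hypothetical helper, and the sample text is hand-written in that layout using the ArguAna values from the row above, not a captured log.

```python
def parse_trec_eval(text):
    """Extract {measure: value} from trec_eval summary lines of the form
    '<measure> all <value>'; lines in any other shape are ignored."""
    scores = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[1] != "all":
            continue
        try:
            scores[parts[0]] = float(parts[2])
        except ValueError:
            # Not a numeric summary line; skip it.
            continue
    return scores


if __name__ == "__main__":
    # Hand-written sample in the assumed layout (values from the leaderboard row).
    sample = "ndcg_cut_10           \tall\t0.6196\nrecall_100            \tall\t0.9900\n"
    print(parse_trec_eval(sample))
```

With the `-c` flag used in the commands above, averages are taken over all topics in the qrels, so a run that is missing topics scores lower rather than being averaged only over the topics it contains.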
genqr_ensemble Qwen2.5-7B-Instruct BM25 0.4187 0.9566 0.3464 0.4916 0.2075 0.5114 0.7035 0.9476 0.6780 0.1745 0.4367 0.6031 0.2429 0.7210 0.4512 0.7952 0.4896 0.8164
method: genqr_ensemble · llm: Qwen2.5-7B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
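Across the blocks on this page, the evaluate step changes only in its qrels identifier and recall cutoff: BEIR datasets use `<dataset>-test` with recall.100, while the MS MARCO DL sets use `dl19-passage` / `dl20-passage` with recall.1000. The sketch below captures that mapping. The helper name `eval_command` and its arguments are hypothetical and not part of QueryGym or Pyserini; DL-HARD is left to the caller because this page shows only a machine-local path for it.

```python
import shlex

# Qrels identifiers copied from the commands on this page.
MSMARCO_QRELS = {
    "msmarco-v1-passage.trecdl2019": "dl19-passage",
    "msmarco-v1-passage.trecdl2020": "dl20-passage",
}


def eval_command(dataset, run_file="run.txt", qrels=None):
    """Build the trec_eval argument list for one dataset (hypothetical helper)."""
    if dataset.startswith("beir-v1.0.0-"):
        # BEIR blocks evaluate against "<dataset>-test" and report R@100.
        qrels = qrels or f"{dataset}-test"
        recall = "recall.100"
    else:
        # MS MARCO blocks report R@1k; DL-HARD has no entry in the table above.
        qrels = qrels or MSMARCO_QRELS.get(dataset)
        recall = "recall.1000"
    if qrels is None:
        raise ValueError(f"no qrels known for {dataset}; pass qrels=...")
    return [
        "python", "-m", "pyserini.eval.trec_eval",
        "-c", "-m", "ndcg.cut.10", "-m", recall,
        qrels, run_file,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command line for the ArguAna block.
    print(shlex.join(eval_command("beir-v1.0.0-arguana")))
```

The function only builds the argument list; passing it to `subprocess.run` is left to the reader so that nothing is executed implicitly.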
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"variants","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
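# The qrels argument below is a machine-local path from the original run; substitute your own DL-HARD qrels file.
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt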
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"variants","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
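Each results row on this page packs 18 numbers after the method, LLM, and retriever names. A minimal sketch for unpacking such a row follows. It assumes the values are (nDCG@10, recall) pairs in the same order as the reproduction blocks: the six BEIR datasets with R@100, then DL-HARD, DL 2019, and DL 2020 with R@1k. That ordering is inferred from the page layout, and `parse_row` is a hypothetical helper.

```python
# Column order assumed from the order of the reproduction blocks on this page.
DATASETS = [
    "ArguAna", "DBPedia", "FiQA", "SciFact", "COVID", "News",
    "DL-HARD", "DL 2019", "DL 2020",
]
NUM_BEIR = 6  # the first six datasets report R@100, the rest R@1k


def parse_row(line):
    """Split one leaderboard row into labeled per-dataset scores."""
    method, llm, retriever, *values = line.split()
    if len(values) != 2 * len(DATASETS):
        raise ValueError(f"expected {2 * len(DATASETS)} values, got {len(values)}")
    scores = {}
    for i, name in enumerate(DATASETS):
        recall_key = "R@100" if i < NUM_BEIR else "R@1k"
        scores[name] = {
            "nDCG@10": float(values[2 * i]),
            recall_key: float(values[2 * i + 1]),
        }
    return {"method": method, "llm": llm, "retriever": retriever, "scores": scores}


if __name__ == "__main__":
    row = (
        "genqr_ensemble Qwen2.5-7B-Instruct BM25 0.4187 0.9566 0.3464 0.4916 "
        "0.2075 0.5114 0.7035 0.9476 0.6780 0.1745 0.4367 0.6031 0.2429 0.7210 "
        "0.4512 0.7952 0.4896 0.8164"
    )
    parsed = parse_row(row)
    print(parsed["retriever"], parsed["scores"]["DL 2020"])
```

Rows with a different number of values raise an error rather than being silently misaligned, which matters if the BRIGHT tabs use a different column set.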
genqr_ensemble Qwen2.5-7B-Instruct SPLADE++ 0.5180 0.9815 0.3589 0.5194 0.2882 0.6249 0.6964 0.9460 0.6420 0.1117 0.4049 0.4814 0.3292 0.8005 0.5948 0.8824 0.6307 0.9020
method: genqr_ensemble · llm: Qwen2.5-7B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
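The retrieve step varies by retriever in three places: the Pyserini module, the index suffix, and the scoring flags. The sketch below assembles the command from those pieces, using only flags and index names that appear in the blocks on this page. `retrieve_command` is a hypothetical helper, and the rule that all MS MARCO datasets share the `msmarco-v1-passage` corpus index is inferred from those blocks.

```python
import shlex

DEFAULT_TOPICS = "outputs/reproduce/queries/reformulated_queries.tsv"


def retrieve_command(dataset, retriever, topics=DEFAULT_TOPICS, output="run.txt"):
    """Build the Pyserini search argument list for one dataset and retriever."""
    is_msmarco = dataset.startswith("msmarco-v1-passage")
    # DL-HARD, DL 2019 and DL 2020 all search the same passage corpus.
    corpus = "msmarco-v1-passage" if is_msmarco else dataset

    if retriever == "BM25":
        module = "pyserini.search.lucene"
        # BEIR BM25 indexes carry a ".flat" suffix; the MS MARCO one does not.
        index = corpus if is_msmarco else corpus + ".flat"
        scoring = ["--bm25", "--k1", "0.9", "--b", "0.4"]
        tail = []
    elif retriever == "SPLADE++":
        module = "pyserini.search.lucene"
        index = corpus + ".splade-pp-ed"
        scoring = ["--encoder", "naver/splade-cocondenser-ensembledistil"]
        tail = ["--impact"]
    elif retriever == "BGE-base-en-v1.5":
        module = "pyserini.search.faiss"
        index = corpus + ".bge-base-en-v1.5"
        scoring = ["--encoder", "BAAI/bge-base-en-v1.5"]
        tail = []
    else:
        raise ValueError(f"unknown retriever: {retriever}")

    return [
        "python", "-m", module,
        "--threads", "16", "--batch-size", "128",
        "--index", index,
        "--topics", topics,
        *scoring,
        "--output", output,
        "--hits", "1000",
        *tail,
    ]


if __name__ == "__main__":
    print(shlex.join(retrieve_command("beir-v1.0.0-arguana", "SPLADE++")))
```

Keeping the per-retriever differences in one function makes it easier to spot when a block deviates from the pattern, for example a missing `--impact` flag on a learned-sparse run.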
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
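# The qrels argument below is a machine-local path from the original run; substitute your own DL-HARD qrels file.
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt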
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method genqr_ensemble \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
lamer gpt-4.1 BGE-base-en-v1.5 0.6204 0.9893 0.4018 0.4998 0.4080 0.7410 0.7572 0.9733 0.7796 0.1373 0.4367 0.4591 0.4120 0.8557 0.7032 0.8888 0.7148 0.9026
method: lamer · llm: gpt-4.1 · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
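The reformulate step passes its method parameters as a JSON string on the command line, which is easy to misquote by hand. A minimal sketch of generating that argument programmatically follows. `reformulate_command` is a hypothetical helper; the flags and default values are copied from the commands on this page, and no behavior of `pipeline.py` beyond accepting those flags is assumed.

```python
import json
import shlex


def reformulate_command(dataset, method, model, method_params=None,
                        output_dir="outputs/reproduce"):
    """Build the reformulate-step argument list for pipeline.py."""
    # Defaults shown in the blocks on this page; callers may add or override keys.
    params = {"num_examples": 4, "train_split": "train"}
    params.update(method_params or {})
    return [
        "python", "examples/querygym_pyserini/pipeline.py",
        "--dataset", dataset,
        "--method", method,
        "--model", model,
        "--steps", "reformulate",
        "--temperature", "1",
        "--max-tokens", "128",
        # Compact separators reproduce the spacing used in the page's commands.
        "--method-params", json.dumps(params, separators=(",", ":")),
        "--output-dir", output_dir,
    ]


if __name__ == "__main__":
    cmd = reformulate_command("beir-v1.0.0-arguana", "lamer", "openai/gpt-4.1")
    # shlex.join single-quotes the JSON argument so a POSIX shell passes it intact.
    print(shlex.join(cmd))
```

Some blocks on this page pass an extra `"mode":"variants"` key; with this helper that would be `method_params={"mode": "variants"}`. Key order in the resulting JSON may differ from the page, which does not change its meaning.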
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
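# The qrels argument below is a machine-local path from the original run; substitute your own DL-HARD qrels file.
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt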
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
lamer gpt-4.1 BM25 0.4119 0.9452 0.3989 0.5159 0.2616 0.5901 0.7253 0.9487 0.7020 0.1661 0.4799 0.5960 0.3555 0.8065 0.6368 0.8566 0.6530 0.9002
method: lamer · llm: gpt-4.1 · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
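# The qrels argument below is a machine-local path from the original run; substitute your own DL-HARD qrels file.
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt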
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
lamer gpt-4.1 SPLADE++ 0.3836 0.9829 0.3559 0.4904 0.3292 0.6724 0.7182 0.9577 0.6312 0.1081 0.4520 0.4770 0.3673 0.8246 0.6836 0.9065 0.6390 0.9378
method: lamer · llm: gpt-4.1 · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method lamer \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
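The evaluate step reports nDCG@10 and recall at a cutoff. As a rough per-query illustration of what those two numbers measure, here is a minimal sketch; it is not trec_eval itself. The function names and the toy `qrels` and `ranking` values are hypothetical, and the gain and discount choices (linear gain, log2 discount) are stated assumptions. trec_eval additionally averages the per-query values over all queries.

```python
import math


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query: linear gain (the relevance grade), log2(rank + 1) discount."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    dcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))
    # Ideal ranking: all judged-relevant grades sorted from highest to lowest.
    ideal = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_doc_ids, qrels, k=100):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = {doc_id for doc_id, grade in qrels.items() if grade > 0}
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant)
    return hits / len(relevant)


# Toy data (hypothetical): graded judgments and one ranked list for a single query.
qrels = {"d1": 2, "d2": 1, "d3": 0}
ranking = ["d3", "d1", "d4", "d2"]

print(round(ndcg_at_k(ranking, qrels, k=10), 4))
print(recall_at_k(ranking, qrels, k=2), recall_at_k(ranking, qrels, k=100))
```

In the toy ranking, both relevant documents are retrieved but neither is at rank 1, so recall at a deep cutoff is perfect while nDCG@10 is penalized for the ordering.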
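lamer · gpt-4.1-nano · BGE-base-en-v1.5
ArguAna: nDCG@10 0.6254 · R@100 0.9900
DBPedia: nDCG@10 0.3827 · R@100 0.4804
FiQA: nDCG@10 0.4009 · R@100 0.7310
SciFact: nDCG@10 0.7507 · R@100 0.9593
COVID: nDCG@10 0.8007 · R@100 0.1340
News: nDCG@10 0.4060 · R@100 0.4264
DL-HARD: nDCG@10 0.3759 · R@1k 0.8352
DL 2019: nDCG@10 0.7265 · R@1k 0.8894
DL 2020: nDCG@10 0.7135 · R@1k 0.8846
method: lamer · llm: gpt-4.1-nano · retriever: BGE-base-en-v1.5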
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
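The retrieve commands in these panels differ only in the search module, the index name, and a few retriever-specific flags. The sketch below regenerates them from that pattern. The helper names (`index_name`, `retrieve_command`) and the short retriever keys are hypothetical; every flag and index name is copied from the commands in this listing, and nothing here is part of QueryGym or Pyserini.

```python
import shlex

TOPICS = "outputs/reproduce/queries/reformulated_queries.tsv"

# Per-retriever settings, copied from the commands shown in this listing.
RETRIEVERS = {
    "bm25": {
        "module": "pyserini.search.lucene",
        "scoring": ["--bm25", "--k1", "0.9", "--b", "0.4"],
        "extra": [],
    },
    "splade": {
        "module": "pyserini.search.lucene",
        "scoring": ["--encoder", "naver/splade-cocondenser-ensembledistil"],
        "extra": ["--impact"],
    },
    "bge": {
        "module": "pyserini.search.faiss",
        "scoring": ["--encoder", "BAAI/bge-base-en-v1.5"],
        "extra": [],
    },
}
INDEX_SUFFIX = {"splade": "splade-pp-ed", "bge": "bge-base-en-v1.5"}


def index_name(corpus, retriever):
    """Build the index name used in the listing for a corpus and retriever."""
    if retriever == "bm25":
        # BEIR BM25 indexes end in ".flat"; the MS MARCO one has no suffix here.
        return corpus if corpus == "msmarco-v1-passage" else f"{corpus}.flat"
    return f"{corpus}.{INDEX_SUFFIX[retriever]}"


def retrieve_command(corpus, retriever, output="run.txt"):
    """Return the retrieve command as a single shell-quoted string."""
    cfg = RETRIEVERS[retriever]
    args = [
        "python", "-m", cfg["module"],
        "--threads", "16", "--batch-size", "128",
        "--index", index_name(corpus, retriever),
        "--topics", TOPICS,
        *cfg["scoring"],
        "--output", output,
        "--hits", "1000",
        *cfg["extra"],
    ]
    return shlex.join(args)


print(retrieve_command("beir-v1.0.0-fiqa", "bge"))
```

Building the command as an argument list and quoting it with `shlex.join` keeps paths and model names safe to paste into a shell.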
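lamer · gpt-4.1-nano · BM25
ArguAna: nDCG@10 0.4037 · R@100 0.9388
DBPedia: nDCG@10 0.3440 · R@100 0.4807
FiQA: nDCG@10 0.2360 · R@100 0.5449
SciFact: nDCG@10 0.7220 · R@100 0.9393
COVID: nDCG@10 0.6721 · R@100 0.1748
News: nDCG@10 0.4328 · R@100 0.5575
DL-HARD: nDCG@10 0.3398 · R@1k 0.7697
DL 2019: nDCG@10 0.6731 · R@1k 0.8548
DL 2020: nDCG@10 0.6560 · R@1k 0.8865
method: lamer · llm: gpt-4.1-nano · retriever: BM25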
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
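The BM25 commands above fix `--k1 0.9 --b 0.4`. To show what the two parameters control, here is a minimal sketch of a textbook BM25 term score. It assumes the common Okapi formulation with a smoothed IDF; the function name and the toy numbers are hypothetical, and Lucene's implementation differs in details, so these values will not match Pyserini's scores.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=0.9, b=0.4):
    """Score contribution of one query term to one document (textbook BM25)."""
    # Rarer terms (small doc_freq) get a larger IDF weight.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly long documents are penalized; b = 0 disables it.
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences of the term saturate.
    return idf * tf * (k1 + 1) / (tf + length_norm)


# Toy statistics (hypothetical): same term frequency, short vs. long document.
short_doc = bm25_term_score(tf=3, doc_len=50, avg_doc_len=100, n_docs=1000, doc_freq=10)
long_doc = bm25_term_score(tf=3, doc_len=200, avg_doc_len=100, n_docs=1000, doc_freq=10)
print(short_doc > long_doc)
```

With the same term frequency, the shorter document scores higher because `b` is greater than zero; setting `b=0.0` removes the length effect entirely.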
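lamer · gpt-4.1-nano · SPLADE++
ArguAna: nDCG@10 0.3800 · R@100 0.9780
DBPedia: nDCG@10 0.3316 · R@100 0.4680
FiQA: nDCG@10 0.3014 · R@100 0.6543
SciFact: nDCG@10 0.7207 · R@100 0.9443
COVID: nDCG@10 0.6285 · R@100 0.1143
News: nDCG@10 0.4012 · R@100 0.4661
DL-HARD: nDCG@10 0.3459 · R@1k 0.7969
DL 2019: nDCG@10 0.6916 · R@1k 0.8975
DL 2020: nDCG@10 0.6254 · R@1k 0.9244
method: lamer · llm: gpt-4.1-nano · retriever: SPLADE++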
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method lamer \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
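The SPLADE++ commands pass `--impact`, which selects impact scoring for the learned sparse index. Conceptually, a document's score is the inner product of the query's term weights and the document's term weights over the terms they share. The sketch below illustrates that idea with toy integer weights. The function name and all weights are hypothetical; real weights come from the encoder and the prebuilt index.

```python
def impact_score(query_weights, doc_weights):
    """Inner product over the terms that appear in both query and document."""
    return sum(
        weight * doc_weights[term]
        for term, weight in query_weights.items()
        if term in doc_weights
    )


# Toy weights (hypothetical): the encoder may expand a query with related terms.
query = {"covid": 3, "vaccine": 2, "immunization": 1}
doc_a = {"vaccine": 5, "trial": 1, "covid": 1}
doc_b = {"immunization": 4, "schedule": 2}

print(impact_score(query, doc_a), impact_score(query, doc_b))
```

Because the reformulated query is re-encoded into weighted terms, extra words added by the LLM can change both which terms match and how much each match contributes.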
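lamer · Qwen2.5-72B-Instruct · BGE-base-en-v1.5
ArguAna: nDCG@10 0.6210 · R@100 0.9893
DBPedia: nDCG@10 0.4139 · R@100 0.5001
FiQA: nDCG@10 0.4096 · R@100 0.7483
SciFact: nDCG@10 0.7524 · R@100 0.9800
COVID: nDCG@10 0.7941 · R@100 0.1401
News: nDCG@10 0.4512 · R@100 0.4936
DL-HARD: nDCG@10 0.4055 · R@1k 0.8453
DL 2019: nDCG@10 0.7219 · R@1k 0.8859
DL 2020: nDCG@10 0.7276 · R@1k 0.9045
method: lamer · llm: Qwen2.5-72B-Instruct · retriever: BGE-base-en-v1.5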
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
lamer Qwen2.5-72B-Instruct BM25 0.4111 n/a 0.4010 0.5217 0.2395 n/a 0.7251 n/a 0.7240 0.1667 0.4677 0.6105 0.3635 0.7820 0.6651 0.8666 0.6711 0.8920
method lamer · llm Qwen2.5-72B-Instruct · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
lamer Qwen2.5-72B-Instruct SPLADE++ 0.5161 0.9815 0.3697 0.4883 0.3041 0.6516 0.7046 0.9600 0.6543 0.1057 0.4161 0.4850 0.3648 0.8156 0.6651 0.8956 0.6483 0.9195
method lamer · llm Qwen2.5-72B-Instruct · retriever SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method lamer \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
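The retrieve steps listed so far follow a regular index-naming pattern: BEIR datasets append a retriever suffix to the dataset name, while DL-HARD, DL 2019 and DL 2020 all search the shared msmarco-v1-passage corpus. A minimal sketch of that mapping, assuming the pattern holds for every configuration on the page; `index_for` and its short retriever keys (`bm25`, `splade`, `bge`) are hypothetical names introduced here, not part of QueryGym or Pyserini:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the --index value used in the listings above
# for a given --dataset value and retriever key (bm25 | splade | bge).
index_for() {
  local dataset="$1" retriever="$2" base suffix
  # The DL-HARD / DL 2019 / DL 2020 topic sets share one passage corpus.
  case "$dataset" in
    msmarco-v1-passage.*) base="msmarco-v1-passage" ;;
    *)                    base="$dataset" ;;
  esac
  case "$retriever" in
    bm25)
      # BEIR lexical indexes carry a .flat suffix; the MS MARCO one has none.
      if [ "$base" = "msmarco-v1-passage" ]; then suffix=""; else suffix=".flat"; fi ;;
    splade) suffix=".splade-pp-ed" ;;
    bge)    suffix=".bge-base-en-v1.5" ;;
    *) echo "unknown retriever: $retriever" >&2; return 1 ;;
  esac
  printf '%s%s\n' "$base" "$suffix"
}

index_for beir-v1.0.0-scifact bge
index_for msmarco-v1-passage.trecdl2019 bm25
```

The two calls at the end print the index names that appear in the SciFact dense listing and the DL 2019 BM25 listing.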
lamer Qwen2.5-7B-Instruct BGE-base-en-v1.5 0.6195 0.9908 0.3900 0.4838 0.3981 0.7318 0.7466 0.9733 0.7843 0.1360 0.4517 0.4753 0.3788 0.8315 0.7113 0.8668 0.6825 0.8940
method lamer · llm Qwen2.5-7B-Instruct · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
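The evaluate steps differ only by dataset family: most BEIR listings report nDCG@10 with recall at 100 against the `<dataset>-test` qrels, while DL 2019 and DL 2020 report recall at 1000 against `dl19-passage` and `dl20-passage`. The sketch below captures that choice; `eval_args` is a hypothetical helper name. DL-HARD is deliberately left unmapped because its listing points at a machine-local file rather than a named qrels set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the trec_eval arguments that the listings
# above pair with a given --dataset value. Dry run only; nothing is evaluated.
eval_args() {
  local dataset="$1"
  case "$dataset" in
    beir-v1.0.0-*)
      # BEIR: recall cutoff 100, qrels named after the dataset's test split.
      printf '%s\n' "-c -m ndcg.cut.10 -m recall.100 ${dataset}-test" ;;
    msmarco-v1-passage.trecdl2019)
      printf '%s\n' "-c -m ndcg.cut.10 -m recall.1000 dl19-passage" ;;
    msmarco-v1-passage.trecdl2020)
      printf '%s\n' "-c -m ndcg.cut.10 -m recall.1000 dl20-passage" ;;
    *)
      # No portable qrels name is known for this dataset (e.g. DL-HARD).
      echo "no qrels mapping for: $dataset" >&2; return 1 ;;
  esac
}

# Example: assemble the full evaluation command for FiQA without running it.
echo "python -m pyserini.eval.trec_eval $(eval_args beir-v1.0.0-fiqa) run.txt"
```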
lamer Qwen2.5-7B-Instruct BM25 0.4063 0.9388 0.3896 0.5139 0.2337 0.5558 0.7140 0.9593 0.6955 0.1704 0.4424 0.5960 0.3570 0.7633 0.6602 0.8553 0.6322 0.8933
method lamer · llm Qwen2.5-7B-Instruct · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
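The three-step blocks above differ only in the dataset, index, qrels name and recall cutoff. The sketch below rebuilds them from those values for the BM25 (lexical) case. The function name `build_steps` and its parameters are hypothetical; the flags are copied from the commands on this page, and nothing here is claimed to be part of QueryGym itself.

```python
import shlex

def build_steps(dataset, method, model, index, qrels, recall_cutoff,
                output_dir="outputs/reproduce"):
    """Return the reformulate, retrieve and evaluate commands as argument lists."""
    topics = f"{output_dir}/queries/reformulated_queries.tsv"
    reformulate = [
        "python", "examples/querygym_pyserini/pipeline.py",
        "--dataset", dataset,
        "--method", method,
        "--model", model,
        "--steps", "reformulate",
        "--temperature", "1",
        "--max-tokens", "128",
        "--method-params", '{"num_examples":4,"train_split":"train"}',
        "--output-dir", output_dir,
    ]
    # BM25 settings as shown on this page (k1=0.9, b=0.4).
    retrieve = [
        "python", "-m", "pyserini.search.lucene",
        "--threads", "16", "--batch-size", "128",
        "--index", index,
        "--topics", topics,
        "--bm25", "--k1", "0.9", "--b", "0.4",
        "--output", "run.txt",
        "--hits", "1000",
    ]
    evaluate = [
        "python", "-m", "pyserini.eval.trec_eval", "-c",
        "-m", "ndcg.cut.10", "-m", f"recall.{recall_cutoff}",
        qrels, "run.txt",
    ]
    return [reformulate, retrieve, evaluate]

if __name__ == "__main__":
    steps = build_steps(
        dataset="msmarco-v1-passage.trecdl2020",
        method="lamer",
        model="Qwen/Qwen2.5-7B-Instruct",
        index="msmarco-v1-passage",
        qrels="dl20-passage",
        recall_cutoff=1000,
    )
    for cmd in steps:
        # shlex.join quotes the JSON argument so the line can be pasted into a shell.
        print(shlex.join(cmd))
```

Returning argument lists rather than strings keeps the JSON value of `--method-params` intact if the commands are later passed to `subprocess.run`.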
lamer Qwen2.5-7B-Instruct SPLADE++ 0.5148 0.9794 0.3499 0.4799 0.2944 0.6487 0.6651 0.9560 0.6339 0.1002 0.3967 0.4728 0.3280 0.7917 0.6465 0.8654 0.6076 0.9213
method: lamer · llm: Qwen2.5-7B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method lamer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
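Each collapsed results row, such as the lamer · Qwen2.5-7B-Instruct · SPLADE++ row above, packs 18 numbers after the method, LLM and retriever names. A minimal parsing sketch follows. It assumes the values come in (nDCG@10, recall) pairs for the nine datasets whose commands appear on this page, in the order ArguAna, DBPedia, FiQA, SciFact, COVID, News, DL-HARD, DL 2019, DL 2020, with R@100 for the first six and R@1k for the last three. That column mapping is inferred from the page header, and `parse_row` is a hypothetical helper.

```python
DATASETS = ["ArguAna", "DBPedia", "FiQA", "SciFact", "COVID", "News",
            "DL-HARD", "DL 2019", "DL 2020"]
BEIR_COUNT = 6  # assumed: the first six datasets report R@100, the rest R@1k

def parse_row(line):
    """Split a flattened leaderboard row into labelled per-dataset scores."""
    parts = line.split()
    method, llm, retriever = parts[:3]
    values = [float(v) for v in parts[3:]]
    if len(values) != 2 * len(DATASETS):
        raise ValueError(f"expected {2 * len(DATASETS)} values, got {len(values)}")
    scores = {}
    for i, name in enumerate(DATASETS):
        recall_label = "R@100" if i < BEIR_COUNT else "R@1k"
        scores[name] = {"nDCG@10": values[2 * i], recall_label: values[2 * i + 1]}
    return {"method": method, "llm": llm, "retriever": retriever, "scores": scores}

if __name__ == "__main__":
    row = ("lamer Qwen2.5-7B-Instruct SPLADE++ 0.5148 0.9794 0.3499 0.4799 "
           "0.2944 0.6487 0.6651 0.9560 0.6339 0.1002 0.3967 0.4728 "
           "0.3280 0.7917 0.6465 0.8654 0.6076 0.9213")
    parsed = parse_row(row)
    for dataset, metrics in parsed["scores"].items():
        print(dataset, metrics)
```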
mugi gpt-4.1 BGE-base-en-v1.5 0.6161 0.9900 0.4400 0.5286 0.4294 0.7584 0.7569 0.9767 0.8024 0.1427 0.4898 0.5212 0.4038 0.8415 0.7351 0.8869 0.7203 0.8950
method: mugi · llm: gpt-4.1 · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
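The evaluate step reports nDCG@10 together with R@100 or R@1k. For readers who want to see what those two numbers measure for a single query, here is a small self-contained sketch. It assumes a linear gain (the graded relevance label itself) and a log2 rank discount. It is an illustration, not a replacement for `trec_eval`, whose handling of edge cases may differ; the function names are hypothetical.

```python
import math

def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query; qrels maps doc id -> graded relevance (0 = not relevant)."""
    dcg = sum(
        qrels.get(doc_id, 0) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1)
    )
    # Ideal ranking: all judged-relevant documents sorted by decreasing grade.
    ideal_gains = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal_gains, start=1))
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(ranked_doc_ids, qrels, k=100):
    """Fraction of relevant documents that appear in the top k results."""
    relevant = {doc_id for doc_id, grade in qrels.items() if grade > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_doc_ids[:k])) / len(relevant)

if __name__ == "__main__":
    qrels = {"d1": 2, "d2": 1, "d3": 0}   # toy judgments, not from any benchmark
    ranking = ["d2", "d1", "d4"]          # toy system output
    print(round(ndcg_at_k(ranking, qrels), 4))
    print(recall_at_k(ranking, qrels, k=1))
```

The leaderboard values are averages of such per-query scores over all queries in a dataset.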
mugi gpt-4.1 BM25 0.3758 0.9331 0.4099 0.5309 0.2641 0.6000 0.7345 0.9660 0.7137 0.1739 0.5156 0.6075 0.3651 0.8216 0.6952 0.9005 0.6578 0.8996
method: mugi · llm: gpt-4.1 · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
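The retrieve step writes `run.txt`, which the evaluate step then reads. The sketch below loads such a file into memory, assuming the standard six-column TREC run layout (query id, the literal `Q0`, document id, rank, score, run tag). The helper name `read_trec_run` and the sample lines are illustrative only.

```python
from collections import defaultdict

def read_trec_run(lines):
    """Group (doc_id, score) pairs by query id, ordered by rank."""
    per_query = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _q0, doc_id, rank, score, _tag = parts
        per_query[qid].append((int(rank), doc_id, float(score)))
    return {
        qid: [(doc_id, score) for _rank, doc_id, score in sorted(rows)]
        for qid, rows in per_query.items()
    }

if __name__ == "__main__":
    # Made-up sample lines; with a real run, pass open("run.txt") instead.
    sample = [
        "q1 Q0 d7 2 11.5 Anserini",
        "q1 Q0 d3 1 12.0 Anserini",
        "q2 Q0 d9 1 4.25 Anserini",
    ]
    for qid, hits in read_trec_run(sample).items():
        print(qid, hits)
```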
mugi gpt-4.1 SPLADE++ 0.3703 0.9780 0.3843 0.5137 0.3352 0.6799 0.7059 0.9600 0.6458 0.1118 0.4422 0.5002 0.3625 0.8111 0.6859 0.9088 0.6508 0.9199
method: mugi · llm: gpt-4.1 · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method mugi \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
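With the three mugi · gpt-4.1 rows now listed (BGE-base-en-v1.5, BM25, SPLADE++), the page's "best in column" comparison can be reproduced for nDCG@10. The sketch below copies the row values from this page and assumes the same column order as the commands (ArguAna, DBPedia, FiQA, SciFact, COVID, News, DL-HARD, DL 2019, DL 2020), with nDCG@10 at every even position. `best_by_ndcg` is a hypothetical helper.

```python
DATASETS = ["ArguAna", "DBPedia", "FiQA", "SciFact", "COVID", "News",
            "DL-HARD", "DL 2019", "DL 2020"]

# Values copied from the three mugi / gpt-4.1 rows on this page.
MUGI_GPT41 = {
    "BGE-base-en-v1.5": [0.6161, 0.9900, 0.4400, 0.5286, 0.4294, 0.7584,
                         0.7569, 0.9767, 0.8024, 0.1427, 0.4898, 0.5212,
                         0.4038, 0.8415, 0.7351, 0.8869, 0.7203, 0.8950],
    "BM25": [0.3758, 0.9331, 0.4099, 0.5309, 0.2641, 0.6000,
             0.7345, 0.9660, 0.7137, 0.1739, 0.5156, 0.6075,
             0.3651, 0.8216, 0.6952, 0.9005, 0.6578, 0.8996],
    "SPLADE++": [0.3703, 0.9780, 0.3843, 0.5137, 0.3352, 0.6799,
                 0.7059, 0.9600, 0.6458, 0.1118, 0.4422, 0.5002,
                 0.3625, 0.8111, 0.6859, 0.9088, 0.6508, 0.9199],
}

def best_by_ndcg(rows):
    """For each dataset, return the retriever with the highest nDCG@10."""
    best = {}
    for i, name in enumerate(DATASETS):
        # nDCG@10 is assumed to sit at even positions: 0, 2, 4, ...
        best[name] = max(rows, key=lambda retriever: rows[retriever][2 * i])
    return best

if __name__ == "__main__":
    for dataset, retriever in best_by_ndcg(MUGI_GPT41).items():
        print(f"{dataset}: {retriever}")
```

The same function works for recall by switching the index to `2 * i + 1`, bearing in mind that the recall cutoff differs between the BEIR and MS MARCO columns.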
mugi gpt-4.1-nano BGE-base-en-v1.5 0.6184 0.9900 0.4280 0.5284 0.4228 0.7488 0.7457 0.9800 0.7980 0.1425 0.4696 0.5081 0.3903 0.8354 0.7169 0.8725 0.7187 0.8911
method: mugi · llm: gpt-4.1-nano · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
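Every retrieve step above writes to the same run.txt, so reproducing several datasets back to back overwrites the earlier runs. A minimal sketch of one workaround is to derive a distinct run-file name per configuration. The run_file_for helper, the runs/ directory, and the bge retriever label are hypothetical names chosen for this illustration; they are not part of QueryGym or Pyserini.

```shell
# Hypothetical helper: build a unique run-file path for one configuration.
# Arguments: method, model id, retriever label, dataset id.
run_file_for() {
  # Model ids such as openai/gpt-4.1-nano contain a slash; replace it so the
  # result stays a single file name inside runs/.
  model_safe=$(printf '%s' "$2" | tr '/' '_')
  printf 'runs/%s.%s.%s.%s.txt\n' "$1" "$model_safe" "$3" "$4"
}

run_file_for mugi openai/gpt-4.1-nano bge beir-v1.0.0-arguana
```

The printed path could then replace run.txt in both the --output flag of step 2 and the run argument of step 3.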
mugi gpt-4.1-nano BM25 0.3831 0.9317 0.4085 0.5161 0.2517 0.5802 0.7318 0.9627 0.7062 0.1713 0.4707 0.5873 0.3423 0.7924 0.6835 0.8915 0.6473 0.9017
method: mugi · llm: gpt-4.1-nano · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
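The evaluate steps above differ only in the qrels name and the recall cutoff: BEIR datasets use the "<dataset>-test" qrels with R@100, while DL 2019 and DL 2020 use dl19-passage and dl20-passage with R@1k. The sketch below prints the matching command for a dataset id. The eval_cmd helper is a hypothetical name, and DL-HARD is deliberately left unmapped because the page lists a machine-specific file path for it rather than a named qrels set.

```shell
# Hypothetical helper: print the step-3 command for a dataset id and run file.
eval_cmd() {
  dataset=$1
  run=$2
  case "$dataset" in
    beir-*)
      # BEIR rows report nDCG@10 and R@100 against the "<dataset>-test" qrels.
      qrels="$dataset-test"; recall=recall.100 ;;
    msmarco-v1-passage.trecdl2019)
      qrels=dl19-passage; recall=recall.1000 ;;
    msmarco-v1-passage.trecdl2020)
      qrels=dl20-passage; recall=recall.1000 ;;
    *)
      # DL-HARD is not mapped here: the page shows a local path for it.
      echo "no qrels mapping for: $dataset" >&2
      return 1 ;;
  esac
  echo "python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m $recall $qrels $run"
}

eval_cmd beir-v1.0.0-scifact run.txt
eval_cmd msmarco-v1-passage.trecdl2020 run.txt
```

The helper only prints the command, so it can be inspected before being run.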
mugi gpt-4.1-nano SPLADE++ 0.3718 0.9787 0.3843 0.5095 0.3171 0.6673 0.6900 0.9527 0.6317 0.1144 0.4072 0.4770 0.3254 0.8105 0.6611 0.8904 0.6432 0.9203
method: mugi · llm: gpt-4.1-nano · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method mugi \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
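Across the three retrievers above, the --index value follows a regular pattern: BEIR datasets append .bge-base-en-v1.5, .flat, or .splade-pp-ed to the dataset id, while the DL-HARD, DL 2019, and DL 2020 topic sets all search the shared msmarco-v1-passage index (with no suffix for BM25). A minimal sketch of that mapping follows; index_for and the retriever labels bge, bm25, and splade are hypothetical names used only for this illustration.

```shell
# Hypothetical helper: derive the index name from a dataset id and a
# retriever label, following the naming pattern of the commands on this page.
index_for() {
  dataset=$1
  retriever=$2   # one of: bge, bm25, splade (labels chosen for this sketch)

  # All msmarco-v1-passage.* topic sets share one passage index.
  case "$dataset" in
    msmarco-v1-passage.*) base=msmarco-v1-passage ;;
    *)                    base=$dataset ;;
  esac

  case "$retriever" in
    bge)    echo "$base.bge-base-en-v1.5" ;;
    splade) echo "$base.splade-pp-ed" ;;
    bm25)
      # BEIR BM25 indexes carry a .flat suffix; the MS MARCO one has none.
      case "$base" in
        beir-*) echo "$base.flat" ;;
        *)      echo "$base" ;;
      esac ;;
    *) echo "unknown retriever: $retriever" >&2; return 1 ;;
  esac
}

index_for beir-v1.0.0-fiqa bge
index_for msmarco-v1-passage.trecdl2019 bm25
```

Unknown retriever labels return a non-zero status instead of guessing a name.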
mugi Qwen2.5-72B-Instruct BGE-base-en-v1.5 0.6194 0.9900 0.4342 0.5318 0.4192 0.7526 0.7453 0.9700 0.7972 0.1425 0.4732 0.5298 0.3948 0.8548 0.7512 0.9071 0.7122 0.8894
method: mugi · llm: Qwen2.5-72B-Instruct · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
mugi Qwen2.5-72B-Instruct BM25 0.3868 n/a 0.4103 0.5296 0.2435 n/a 0.7203 n/a 0.6927 0.1694 0.5009 0.5921 0.3609 0.8122 0.6911 0.9055 0.6268 0.9015
method: mugi · llm: Qwen2.5-72B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
mugi Qwen2.5-72B-Instruct SPLADE++ 0.5031 0.9787 0.3735 0.5044 0.3023 0.6787 0.6951 0.9493 0.6639 0.1105 0.4394 0.4972 0.3260 0.8098 0.6746 0.9275 0.6419 0.9165
method mugi · llm Qwen2.5-72B-Instruct · retriever SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method mugi \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
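Each leaderboard row packs 18 scores into one line. The sketch below splits such a row into per-dataset metrics. It assumes the scores follow the header's column order (ArguAna, DBPedia, FiQA, SciFact, COVID, News with nDCG@10 and R@100, then DL-HARD, DL 2019, DL 2020 with nDCG@10 and R@1k). `parse_row` is a hypothetical helper name, not part of QueryGym.

```python
# Column order assumed from the leaderboard header; two metrics per dataset.
BEIR = ["ArguAna", "DBPedia", "FiQA", "SciFact", "COVID", "News"]  # nDCG@10, R@100
MSMARCO = ["DL-HARD", "DL 2019", "DL 2020"]                        # nDCG@10, R@1k


def parse_row(row: str) -> dict:
    """Split one flattened leaderboard row into method, LLM, retriever, and scores."""
    tokens = row.split()
    method, llm, retriever = tokens[:3]
    values = [float(v) for v in tokens[3:]]
    datasets = BEIR + MSMARCO
    if len(values) != 2 * len(datasets):
        raise ValueError(f"expected {2 * len(datasets)} scores, got {len(values)}")
    scores = {}
    for i, name in enumerate(datasets):
        recall_key = "R@100" if name in BEIR else "R@1k"
        scores[name] = {"nDCG@10": values[2 * i], recall_key: values[2 * i + 1]}
    return {"method": method, "llm": llm, "retriever": retriever, "scores": scores}


# Row copied from the leaderboard (mugi, Qwen2.5-72B-Instruct, SPLADE++).
row = (
    "mugi Qwen2.5-72B-Instruct SPLADE++ 0.5031 0.9787 0.3735 0.5044 0.3023 0.6787 "
    "0.6951 0.9493 0.6639 0.1105 0.4394 0.4972 0.3260 0.8098 0.6746 0.9275 0.6419 0.9165"
)
parsed = parse_row(row)
print(parsed["scores"]["SciFact"])
```

Keying the scores by dataset name makes it harder to misread which of the 18 numbers belongs to which benchmark when comparing rows.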
mugi Qwen2.5-7B-Instruct BGE-base-en-v1.5 0.6213 0.9922 0.4106 0.5195 0.4130 0.7456 0.7449 0.9767 0.8071 0.1406 0.4648 0.5142 0.3619 0.8495 0.6869 0.8781 0.6888 0.8823
method mugi · llm Qwen2.5-7B-Instruct · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
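Step 2 reads the reformulated queries from a TSV file. The sketch below shows one way to write and read such a file. It assumes a two-column layout, `qid<TAB>query`, one topic per line. The exact layout QueryGym writes is not shown on this page. `write_topics`, `read_topics`, and the example queries are hypothetical.

```python
import io


def write_topics(queries: dict, handle) -> None:
    """Write {qid: query} as one `qid<TAB>query` line per topic."""
    for qid, text in queries.items():
        # Collapse tabs and newlines so every query stays on a single line;
        # LLM output is often multi-line, which would break a line-based TSV.
        handle.write(f"{qid}\t{' '.join(text.split())}\n")


def read_topics(handle) -> dict:
    """Read `qid<TAB>query` lines back into a dict, skipping blank lines."""
    topics = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        qid, text = line.split("\t", 1)
        topics[qid] = text
    return topics


# Made-up queries for illustration only.
example = {
    "q1": "what is query\texpansion?\nIt adds related terms.",
    "q2": "bm25 parameters",
}
buffer = io.StringIO()  # stands in for reformulated_queries.tsv
write_topics(example, buffer)
buffer.seek(0)
round_trip = read_topics(buffer)
print(round_trip)
```

Checking that the number of topics read back matches the number of input queries is a cheap guard before launching a retrieval run.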
mugi Qwen2.5-7B-Instruct BM25 0.3926 0.9381 0.4006 0.5114 0.2368 0.5652 0.7063 0.9627 0.6771 0.1628 0.4436 0.5767 0.3173 0.7707 0.6394 0.8732 0.6069 0.8882
method mugi · llm Qwen2.5-7B-Instruct · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
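Across the listed configurations, the retrieve step changes only in the search module, the index suffix, and a few retriever-specific flags. The sketch below rebuilds that command from a small table. The flag values are copied from the commands on this page. `RETRIEVERS` and `retrieve_command` are hypothetical names, and the function only prints the command, it does not run Pyserini.

```python
import shlex

# Per-retriever settings, copied from the commands listed on this page.
RETRIEVERS = {
    "BM25": {
        "module": "pyserini.search.lucene",
        "beir_suffix": ".flat",
        "msmarco_suffix": "",
        "flags": ["--bm25", "--k1", "0.9", "--b", "0.4"],
        "tail": [],
    },
    "SPLADE++": {
        "module": "pyserini.search.lucene",
        "beir_suffix": ".splade-pp-ed",
        "msmarco_suffix": ".splade-pp-ed",
        "flags": ["--encoder", "naver/splade-cocondenser-ensembledistil"],
        "tail": ["--impact"],
    },
    "BGE-base-en-v1.5": {
        "module": "pyserini.search.faiss",
        "beir_suffix": ".bge-base-en-v1.5",
        "msmarco_suffix": ".bge-base-en-v1.5",
        "flags": ["--encoder", "BAAI/bge-base-en-v1.5"],
        "tail": [],
    },
}


def retrieve_command(retriever: str, dataset: str,
                     topics: str = "outputs/reproduce/queries/reformulated_queries.tsv",
                     output: str = "run.txt") -> str:
    """Return the step-2 command line for one retriever and dataset id."""
    spec = RETRIEVERS[retriever]
    if dataset.startswith("beir-"):
        index = dataset + spec["beir_suffix"]
    else:
        # MS MARCO ids such as msmarco-v1-passage.trecdl2019 share one corpus index.
        index = dataset.split(".", 1)[0] + spec["msmarco_suffix"]
    argv = ["python", "-m", spec["module"], "--threads", "16", "--batch-size", "128",
            "--index", index, "--topics", topics, *spec["flags"],
            "--output", output, "--hits", "1000", *spec["tail"]]
    return shlex.join(argv)


print(retrieve_command("BM25", "beir-v1.0.0-arguana"))
print(retrieve_command("SPLADE++", "msmarco-v1-passage.trecdl2019"))
```

Keeping the per-retriever differences in one table makes it easier to see that the three MS MARCO variants (DL-HARD, DL 2019, DL 2020) search the same index and differ only in topics and qrels.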
mugi Qwen2.5-7B-Instruct SPLADE++ 0.5101 0.9787 0.3600 0.4989 0.2953 0.6597 0.6665 0.9593 0.6547 0.1045 0.4001 0.4725 0.2642 0.8028 0.5773 0.8929 0.5527 0.9104
method mugi · llm Qwen2.5-7B-Instruct · retriever SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method mugi \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
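The retrieve commands on this page differ only in the search module, the index name, and a few retriever-specific flags. The sketch below rebuilds that pattern from the commands shown here; the helper names (`index_name`, `retrieve_command`) and the `RETRIEVERS` table are hypothetical and are not part of QueryGym or Pyserini. It assumes every MS MARCO variant (`.dlhard`, `.trecdl2019`, `.trecdl2020`) searches the shared `msmarco-v1-passage` index family, as the commands above do.

```python
import shlex

# Topics file written by the reformulate step, as used on this page.
TOPICS = "outputs/reproduce/queries/reformulated_queries.tsv"

# Hypothetical lookup table: (search module, retriever flags, trailing flags),
# copied from the commands listed on this page.
RETRIEVERS = {
    "BGE-base-en-v1.5": (
        "pyserini.search.faiss",
        ["--encoder", "BAAI/bge-base-en-v1.5"],
        [],
    ),
    "BM25": (
        "pyserini.search.lucene",
        ["--bm25", "--k1", "0.9", "--b", "0.4"],
        [],
    ),
    "SPLADE++": (
        "pyserini.search.lucene",
        ["--encoder", "naver/splade-cocondenser-ensembledistil"],
        ["--impact"],
    ),
}


def index_name(dataset: str, retriever: str) -> str:
    """Derive the prebuilt index name the way the commands on this page do."""
    is_msmarco = dataset.startswith("msmarco-v1-passage")
    base = "msmarco-v1-passage" if is_msmarco else dataset
    suffix = {
        "BGE-base-en-v1.5": ".bge-base-en-v1.5",
        "SPLADE++": ".splade-pp-ed",
        # BEIR BM25 indexes carry a ".flat" suffix; the MS MARCO one has none.
        "BM25": "" if is_msmarco else ".flat",
    }[retriever]
    return base + suffix


def retrieve_command(dataset: str, retriever: str, output: str = "run.txt") -> list:
    """Return the step-2 command as an argument list."""
    module, flags, tail = RETRIEVERS[retriever]
    return [
        "python", "-m", module,
        "--threads", "16", "--batch-size", "128",
        "--index", index_name(dataset, retriever),
        "--topics", TOPICS,
        *flags,
        "--output", output,
        "--hits", "1000",
        *tail,
    ]


if __name__ == "__main__":
    print(shlex.join(retrieve_command("beir-v1.0.0-trec-news", "SPLADE++")))
```

Returning an argument list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting; `shlex.join` is used only for display.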
qa_expand gpt-4.1 BGE-base-en-v1.5 0.6231 0.9900 0.4005 0.5087 0.4162 0.7452 0.7367 0.9600 0.7954 0.1419 0.4697 0.4852 0.3739 0.8543 0.7370 0.8936 0.7074 0.8754
method qa_expand · llm gpt-4.1 · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
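Each leaderboard row is flattened into one line: method, LLM, retriever, then pairs of scores. The sketch below labels those pairs. It assumes the 18 values map, in order, to the six BEIR columns (nDCG@10, R@100) followed by DL-HARD, DL 2019, and DL 2020 (nDCG@10, R@1k), which matches the order of the command blocks for each config; `COLUMNS` and `parse_row` are hypothetical names introduced for illustration.

```python
# Assumed column order: six BEIR datasets, then the three MS MARCO sets.
COLUMNS = [
    ("ArguAna", "R@100"),
    ("DBPedia", "R@100"),
    ("FiQA", "R@100"),
    ("SciFact", "R@100"),
    ("COVID", "R@100"),
    ("News", "R@100"),
    ("DL-HARD", "R@1k"),
    ("DL 2019", "R@1k"),
    ("DL 2020", "R@1k"),
]


def parse_row(line: str) -> dict:
    """Split a flattened leaderboard row into labelled per-dataset scores."""
    parts = line.split()
    method, llm, retriever = parts[:3]
    values = [float(x) for x in parts[3:]]
    if len(values) != 2 * len(COLUMNS):
        raise ValueError(f"expected {2 * len(COLUMNS)} scores, got {len(values)}")
    scores = {}
    for i, (dataset, recall_label) in enumerate(COLUMNS):
        scores[dataset] = {
            "nDCG@10": values[2 * i],
            recall_label: values[2 * i + 1],
        }
    return {"method": method, "llm": llm, "retriever": retriever, "scores": scores}


# Row for qa_expand / gpt-4.1 / BGE-base-en-v1.5, copied from this page.
ROW = (
    "qa_expand gpt-4.1 BGE-base-en-v1.5 0.6231 0.9900 0.4005 0.5087 0.4162 0.7452 "
    "0.7367 0.9600 0.7954 0.1419 0.4697 0.4852 0.3739 0.8543 0.7370 0.8936 0.7074 0.8754"
)

if __name__ == "__main__":
    for dataset, metrics in parse_row(ROW)["scores"].items():
        print(dataset, metrics)
```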
qa_expand gpt-4.1 BM25 0.3970 0.9324 0.3699 0.4890 0.2643 0.5814 0.7063 0.9403 0.7065 0.1620 0.4502 0.5608 0.3018 0.7570 0.6832 0.8495 0.6418 0.8787
method qa_expand · llm gpt-4.1 · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
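The evaluate step reports nDCG@10 and recall at a cutoff (100 for BEIR, 1000 for the MS MARCO sets). As a rough illustration of what those two numbers measure for a single query, here is a minimal sketch assuming linear gain and a log2 rank discount. It is not a replacement for `trec_eval`, which remains the reference and also averages over queries; the function names are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query; qrels maps doc id -> graded relevance."""
    gains = [max(qrels.get(doc_id, 0), 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def recall_at_k(ranked_doc_ids, qrels, k):
    """Fraction of relevant documents (relevance > 0) found in the top k."""
    relevant = {doc_id for doc_id, g in qrels.items() if g > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_doc_ids[:k])) / len(relevant)


if __name__ == "__main__":
    # Toy judgments and ranking, invented purely for illustration.
    qrels = {"d1": 2, "d2": 1, "d3": 0}
    ranking = ["d2", "d3", "d1"]
    print(ndcg_at_k(ranking, qrels, k=10), recall_at_k(ranking, qrels, k=2))
```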
qa_expand gpt-4.1 SPLADE++ 0.3823 0.9801 0.3873 0.5289 0.3399 0.6821 0.6964 0.9493 0.6941 0.1152 0.4266 0.4566 0.3552 0.8034 0.7335 0.9170 0.6739 0.9260
method qa_expand · llm gpt-4.1 · retriever SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method qa_expand \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
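With the three qa_expand / gpt-4.1 rows now listed (BGE-base-en-v1.5, BM25, SPLADE++), a per-column comparison shows which retriever leads on each metric. The sketch below does this for those three rows only; the page's own "best in column" marker is presumably computed over every visible config, so the two need not agree. `ROWS` holds the scores copied from this page and `best_per_column` is a hypothetical helper.

```python
# Score columns for qa_expand + gpt-4.1, copied from the rows on this page.
ROWS = {
    "BGE-base-en-v1.5": (
        "0.6231 0.9900 0.4005 0.5087 0.4162 0.7452 0.7367 0.9600 0.7954 "
        "0.1419 0.4697 0.4852 0.3739 0.8543 0.7370 0.8936 0.7074 0.8754"
    ),
    "BM25": (
        "0.3970 0.9324 0.3699 0.4890 0.2643 0.5814 0.7063 0.9403 0.7065 "
        "0.1620 0.4502 0.5608 0.3018 0.7570 0.6832 0.8495 0.6418 0.8787"
    ),
    "SPLADE++": (
        "0.3823 0.9801 0.3873 0.5289 0.3399 0.6821 0.6964 0.9493 0.6941 "
        "0.1152 0.4266 0.4566 0.3552 0.8034 0.7335 0.9170 0.6739 0.9260"
    ),
}


def best_per_column(rows: dict) -> list:
    """Return, for each score column, the name of the row with the highest value."""
    parsed = {name: [float(x) for x in text.split()] for name, text in rows.items()}
    widths = {len(values) for values in parsed.values()}
    if len(widths) != 1:
        raise ValueError("rows have different numbers of columns")
    n_columns = widths.pop()
    return [
        max(parsed, key=lambda name: parsed[name][i])
        for i in range(n_columns)
    ]


if __name__ == "__main__":
    for i, winner in enumerate(best_per_column(ROWS)):
        print(f"column {i}: {winner}")
```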
qa_expand gpt-4.1-nano BGE-base-en-v1.5 0.6213 0.9893 0.3718 0.4717 0.3940 0.7272 0.7486 0.9593 0.7489 0.1355 0.4271 0.4749 0.3688 0.8113 0.6523 0.8486 0.6612 0.8397
method qa_expand · llm gpt-4.1-nano · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
qa_expand gpt-4.1-nano BM25 0.4021 0.9367 0.3680 0.4808 0.2509 0.5744 0.7059 0.9430 0.6885 0.1583 0.4326 0.5487 0.3469 0.7480 0.5819 0.8385 0.6026 0.8649
method qa_expand · llm gpt-4.1-nano · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
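The retrieve commands listed on this page differ only in the Pyserini module, the index name, and a few retriever-specific flags. A minimal sketch of that pattern follows, assuming the index suffixes and flag values are exactly those shown in the commands here. The function name `retrieve_command` and the short retriever keys `bm25`, `splade`, and `bge` are hypothetical helpers for illustration, not part of QueryGym or Pyserini.

```python
import shlex

# Index suffixes copied from the commands on this page.
# BEIR BM25 indexes end in ".flat"; the MS MARCO BM25 index has no suffix.
BEIR_SUFFIX = {"bm25": ".flat", "splade": ".splade-pp-ed", "bge": ".bge-base-en-v1.5"}
MSMARCO_SUFFIX = {"bm25": "", "splade": ".splade-pp-ed", "bge": ".bge-base-en-v1.5"}

DEFAULT_TOPICS = "outputs/reproduce/queries/reformulated_queries.tsv"


def retrieve_command(dataset, retriever, topics=DEFAULT_TOPICS, output="run.txt"):
    """Build the step-2 argument list for one dataset and retriever (hypothetical helper)."""
    is_beir = dataset.startswith("beir-")
    # DL-HARD, DL 2019 and DL 2020 all search the MS MARCO v1 passage corpus.
    corpus = dataset if is_beir else "msmarco-v1-passage"
    suffix = (BEIR_SUFFIX if is_beir else MSMARCO_SUFFIX)[retriever]
    module = "pyserini.search.faiss" if retriever == "bge" else "pyserini.search.lucene"

    cmd = ["python", "-m", module, "--threads", "16", "--batch-size", "128",
           "--index", corpus + suffix, "--topics", topics]
    if retriever == "bm25":
        cmd += ["--bm25", "--k1", "0.9", "--b", "0.4"]
    elif retriever == "splade":
        cmd += ["--encoder", "naver/splade-cocondenser-ensembledistil"]
    else:
        cmd += ["--encoder", "BAAI/bge-base-en-v1.5"]
    cmd += ["--output", output, "--hits", "1000"]
    if retriever == "splade":
        cmd.append("--impact")  # learned-sparse runs use impact scoring
    return cmd


if __name__ == "__main__":
    print(shlex.join(retrieve_command("beir-v1.0.0-fiqa", "bm25")))
```

The list form can be passed to `subprocess.run` directly; `shlex.join` is used only to print a copy-pasteable line.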
qa_expand gpt-4.1-nano SPLADE++ 0.3811 0.9787 0.4019 0.5396 0.3360 0.6669 0.6939 0.9420 0.7079 0.1215 0.4227 0.4696 0.3702 0.8506 0.6883 0.9010 0.6628 0.9279
method: qa_expand · llm: gpt-4.1-nano · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method qa_expand \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
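The evaluate step follows one rule across the commands on this page: BEIR datasets report nDCG@10 with R@100 against `<dataset>-test` qrels, while the MS MARCO-based sets report nDCG@10 with R@1k. The sketch below encodes that rule, assuming the qrels names shown here. The function name `eval_command` is hypothetical, and DL-HARD takes an explicit `qrels` argument because this page lists only a machine-specific path for it.

```python
# Qrels names for the MS MARCO-based sets, copied from the commands on this page.
MSMARCO_QRELS = {
    "msmarco-v1-passage.trecdl2019": "dl19-passage",
    "msmarco-v1-passage.trecdl2020": "dl20-passage",
}


def eval_command(dataset, run="run.txt", qrels=None):
    """Build the step-3 trec_eval argument list for one dataset (hypothetical helper)."""
    if dataset.startswith("beir-"):
        recall = "recall.100"           # BEIR columns report R@100
        default_qrels = dataset + "-test"
    else:
        recall = "recall.1000"          # DL-HARD / DL 2019 / DL 2020 report R@1k
        default_qrels = MSMARCO_QRELS.get(dataset)

    qrels = qrels or default_qrels
    if qrels is None:
        # No portable qrels name is known for this dataset; the caller must pass one.
        raise ValueError(f"pass qrels= explicitly for {dataset}")

    return ["python", "-m", "pyserini.eval.trec_eval", "-c",
            "-m", "ndcg.cut.10", "-m", recall, qrels, run]


if __name__ == "__main__":
    print(" ".join(eval_command("beir-v1.0.0-scifact")))
```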
qa_expand Qwen2.5-72B-Instruct BGE-base-en-v1.5 0.6213 0.9900 0.4013 0.4955 0.3891 0.7274 0.7431 0.9667 0.7775 0.1370 0.4842 0.4983 0.3485 0.8498 0.6999 0.8733 0.6916 0.8785
method: qa_expand · llm: Qwen2.5-72B-Instruct · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
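Each result row on this page is a flat line: method, LLM, retriever, then one (nDCG@10, recall) pair per dataset. A small sketch for turning such a line into labelled scores follows. It assumes the column order matches the order of the reproduce blocks (ArguAna, DBPedia, FiQA, SciFact, COVID, News, DL-HARD, DL 2019, DL 2020). The names `COLUMNS` and `parse_row` are hypothetical.

```python
# Assumed column order, taken from the order of the reproduce blocks on this page.
# The first six report (nDCG@10, R@100); the last three report (nDCG@10, R@1k).
COLUMNS = ["ArguAna", "DBPedia", "FiQA", "SciFact", "COVID", "News",
           "DL-HARD", "DL 2019", "DL 2020"]


def parse_row(line):
    """Split one leaderboard row into its configuration and per-dataset score pairs."""
    method, llm, retriever, *cells = line.split()
    if len(cells) != 2 * len(COLUMNS):
        # Rows with missing cells cannot be aligned to columns safely, so reject them.
        raise ValueError(f"expected {2 * len(COLUMNS)} scores, got {len(cells)}")
    values = [float(c) for c in cells]
    scores = {name: (values[2 * i], values[2 * i + 1])
              for i, name in enumerate(COLUMNS)}
    return {"method": method, "llm": llm, "retriever": retriever, "scores": scores}


if __name__ == "__main__":
    row = ("qa_expand Qwen2.5-72B-Instruct BGE-base-en-v1.5 "
           "0.6213 0.9900 0.4013 0.4955 0.3891 0.7274 0.7431 0.9667 0.7775 0.1370 "
           "0.4842 0.4983 0.3485 0.8498 0.6999 0.8733 0.6916 0.8785")
    parsed = parse_row(row)
    print(parsed["retriever"], parsed["scores"]["SciFact"])
```

Rejecting short rows is deliberate: when a cell is absent, the remaining numbers shift left and would otherwise be attributed to the wrong dataset.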
qa_expand Qwen2.5-72B-Instruct BM25 0.3995 n/a 0.3744 0.4709 0.2484 n/a 0.7015 n/a 0.6809 0.1600 0.4474 0.5517 0.3215 0.7876 0.6109 0.8396 0.6152 0.8727
method: qa_expand · llm: Qwen2.5-72B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
qa_expand Qwen2.5-72B-Instruct SPLADE++ 0.5174 0.9794 0.3830 0.5213 0.3333 0.6464 0.6796 0.9393 0.6324 0.1079 0.4168 0.4803 0.3347 0.8285 0.6757 0.9005 0.6983 0.9284
method: qa_expand · llm: Qwen2.5-72B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method qa_expand \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
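Every reformulate step above writes to the same `outputs/reproduce` directory, so running the nine datasets back to back would presumably overwrite `reformulated_queries.tsv`. The sketch below is one way to avoid that: it only prints the reformulate command for each dataset, with a per-dataset output directory. The function names and the `outputs/reproduce/<dataset>` layout are assumptions made for illustration, not part of QueryGym; the `--method-params` value is the one shown for `qa_expand` above.

```shell
# Dataset identifiers exactly as they appear in the listings above.
DATASETS="
beir-v1.0.0-arguana
beir-v1.0.0-dbpedia-entity
beir-v1.0.0-fiqa
beir-v1.0.0-scifact
beir-v1.0.0-trec-covid
beir-v1.0.0-trec-news
msmarco-v1-passage.dlhard
msmarco-v1-passage.trecdl2019
msmarco-v1-passage.trecdl2020
"

# reformulate_cmd METHOD MODEL DATASET
# Prints (does not run) one reformulate command.
# Assumption: each dataset gets its own output directory.
reformulate_cmd() {
  printf '%s\n' "python examples/querygym_pyserini/pipeline.py --dataset $3 --method $1 --model $2 --steps reformulate --temperature 1 --max-tokens 128 --method-params '{\"num_examples\":4,\"train_split\":\"train\"}' --output-dir outputs/reproduce/$3"
}

# print_all METHOD MODEL
# Prints one reformulate command per dataset (a dry run).
print_all() {
  for d in $DATASETS; do
    reformulate_cmd "$1" "$2" "$d"
  done
}
```

Calling `print_all qa_expand Qwen/Qwen2.5-72B-Instruct` lists nine commands that can be reviewed before being piped to `sh`. The retrieve step would then need `--topics` pointed at the matching per-dataset directory.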
qa_expand Qwen2.5-7B-Instruct BGE-base-en-v1.5 0.6208 0.9900 0.3731 0.4872 0.3837 0.7309 0.7434 0.9583 0.7668 0.1378 0.4406 0.4862 0.3418 0.8267 0.6740 0.8469 0.6541 0.8606
method: qa_expand · llm: Qwen2.5-7B-Instruct · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
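The evaluate steps above follow a pattern: BEIR datasets use `<dataset>-test` as the qrels name with `recall.100`, while the MS MARCO sets use `recall.1000` with `dl19-passage` or `dl20-passage`. A minimal sketch of that mapping follows. The function names are hypothetical, and because the DL-HARD listings point at a local file path, the sketch reads that path from an assumed `DLHARD_QRELS` environment variable rather than guessing one.

```shell
# eval_args DATASET
# Prints "<recall metric> <qrels>" following the pattern in the listings.
eval_args() {
  case "$1" in
    beir-v1.0.0-*)                 echo "recall.100 $1-test" ;;
    msmarco-v1-passage.trecdl2019) echo "recall.1000 dl19-passage" ;;
    msmarco-v1-passage.trecdl2020) echo "recall.1000 dl20-passage" ;;
    msmarco-v1-passage.dlhard)
      # Assumption: the caller supplies the DL-HARD qrels location.
      [ -n "${DLHARD_QRELS:-}" ] || { echo "set DLHARD_QRELS first" >&2; return 1; }
      echo "recall.1000 $DLHARD_QRELS" ;;
    *) echo "unknown dataset: $1" >&2; return 1 ;;
  esac
}

# eval_cmd DATASET RUN_FILE
# Prints (does not run) the evaluate command for one dataset.
eval_cmd() {
  args=$(eval_args "$1") || return 1
  recall=${args%% *}   # text before the first space
  qrels=${args#* }     # text after the first space
  echo "python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m $recall $qrels $2"
}
```

For example, `eval_cmd beir-v1.0.0-fiqa run.txt` prints the same command shown in the FiQA evaluate step.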
qa_expand Qwen2.5-7B-Instruct BM25 0.3940 0.9324 0.3338 0.4669 0.2234 0.5488 0.6857 0.9347 0.6729 0.1569 0.4340 0.5419 0.2892 0.7746 0.5553 0.7976 0.5654 0.8454
method: qa_expand · llm: Qwen2.5-7B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
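Across the listings, the three retrievers differ only in the search module, the index suffix, and a few flags. The sketch below collects those differences in one place and prints the resulting retrieve command. The short retriever keys (`bm25`, `splade`, `bge`) and the function names are made up for this example. The index names, encoders, and flags are copied from the commands above, although the flag order differs slightly.

```shell
# index_for RETRIEVER DATASET
# Prints the prebuilt index name used in the listings.
index_for() {
  case "$2" in
    beir-v1.0.0-*)        base=$2 ;;
    msmarco-v1-passage.*) base=msmarco-v1-passage ;;  # DL-HARD, DL19, DL20 share one index
    *) echo "unknown dataset: $2" >&2; return 1 ;;
  esac
  case "$1" in
    bm25)
      # BEIR BM25 indexes carry a .flat suffix; the MS MARCO index has none.
      case "$base" in
        beir-*) echo "$base.flat" ;;
        *)      echo "$base" ;;
      esac ;;
    splade) echo "$base.splade-pp-ed" ;;
    bge)    echo "$base.bge-base-en-v1.5" ;;
    *) echo "unknown retriever: $1" >&2; return 1 ;;
  esac
}

# search_cmd RETRIEVER DATASET TOPICS_FILE RUN_FILE
# Prints (does not run) the retrieve command.
search_cmd() {
  index=$(index_for "$1" "$2") || return 1
  case "$1" in
    bm25)   module=pyserini.search.lucene
            extra="--bm25 --k1 0.9 --b 0.4 --hits 1000" ;;
    splade) module=pyserini.search.lucene
            extra="--encoder naver/splade-cocondenser-ensembledistil --hits 1000 --impact" ;;
    bge)    module=pyserini.search.faiss
            extra="--encoder BAAI/bge-base-en-v1.5 --hits 1000" ;;
  esac
  echo "python -m $module --threads 16 --batch-size 128 --index $index --topics $3 --output $4 $extra"
}
```

Keeping the mapping in a single function makes it easier to check that a given row of the table was produced with the intended index and encoder.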
qa_expand Qwen2.5-7B-Instruct SPLADE++ 0.5170 0.9829 0.3613 0.5111 0.2978 0.6387 0.6616 0.9547 0.6431 0.1103 0.3910 0.4548 0.3143 0.8305 0.6574 0.8890 0.6156 0.8945
method: qa_expand · llm: Qwen2.5-7B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method qa_expand \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
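To compare a reproduced run against a table row, the two numbers for a dataset have to be pulled out of the evaluate step's output. The helper below is a small sketch that assumes the usual `trec_eval` summary layout of three whitespace-separated columns (metric name, the literal `all`, value), with metric names such as `ndcg_cut_10` and `recall_100`. The function name is hypothetical.

```shell
# metric_value METRIC
# Reads trec_eval output on stdin and prints the summary ("all") value
# for METRIC. Lines that do not match (logs, other metrics) are ignored.
metric_value() {
  awk -v m="$1" '$1 == m && $2 == "all" { print $3 }'
}
```

A typical use would be to save the evaluate output to a file and run `metric_value ndcg_cut_10 < eval.txt`, then compare the printed value with the nDCG@10 column for that dataset.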
Q2D (COT) gpt-4.1 BGE-base-en-v1.5 0.6186 0.9886 0.3678 0.4556 0.4009 0.7483 0.7580 0.9633 0.7984 0.1380 0.4331 0.4763 0.3755 0.8505 0.7125 0.8877 0.6720 0.8756
method: Q2D (COT) · llm: gpt-4.1 · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
Q2D (COT) gpt-4.1 BM25 0.4028 0.9374 0.3934 0.4775 0.2578 0.5843 0.7135 0.9510 0.7277 0.1696 0.4656 0.5829 0.3291 0.7737 0.6528 0.8777 0.6239 0.8781
method: Q2D (COT) · llm: gpt-4.1 · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
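The six BEIR blocks for this configuration differ only in the dataset name, which also determines the index (`<dataset>.flat`) and the qrels name (`<dataset>-test`). A minimal sketch, assuming a POSIX shell, that prints the three steps for each BEIR dataset instead of running them. The function name `print_bm25_steps` and the per-dataset output directory are assumptions made for this example (the listing itself writes everything to `outputs/reproduce` and `run.txt`); the sketch also assumes the pipeline writes `queries/reformulated_queries.tsv` under whatever `--output-dir` it is given.

```shell
# Sketch only: prints the three steps for one BEIR dataset in the
# Q2D (COT) / gpt-4.1 / BM25 configuration. The function name and the
# per-dataset output directory are assumptions made for this example.
print_bm25_steps() {
  ds="beir-v1.0.0-$1"          # e.g. beir-v1.0.0-scifact
  out="outputs/reproduce/$1"   # assumed layout: one directory per dataset
  cat <<EOF
python examples/querygym_pyserini/pipeline.py --dataset $ds --method query2doc --model openai/gpt-4.1 --steps reformulate --temperature 1 --max-tokens 128 --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' --output-dir $out
python -m pyserini.search.lucene --threads 16 --batch-size 128 --index $ds.flat --topics $out/queries/reformulated_queries.tsv --bm25 --k1 0.9 --b 0.4 --output $out/run.txt --hits 1000
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 $ds-test $out/run.txt
EOF
}

# Print reformulate -> retrieve -> evaluate for every BEIR dataset in the row.
for name in arguana dbpedia-entity fiqa scifact trec-covid trec-news; do
  print_bm25_steps "$name"
done
```

Piping the printed lines to `sh` would execute them in order; keeping the sketch print-only makes it easy to inspect the commands before spending LLM calls on the reformulation step.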
Q2D (COT) gpt-4.1 SPLADE++ 0.3820 0.9801 0.3926 0.5319 0.3154 0.6513 0.7120 0.9460 0.6858 0.1056 0.4160 0.4741 0.3308 0.8456 0.6877 0.9153 0.6534 0.9089
method: Q2D (COT) · llm: gpt-4.1 · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
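Across the listing, the retrieve step changes with the retriever in three ways: the Pyserini module (`search.lucene` or `search.faiss`), the index suffix, and the scoring flags. A minimal sketch of that mapping, assuming a POSIX shell. `search_command` and the short retriever keys `bm25`, `splade`, and `bge` are hypothetical names for this example; the modules, index names, and flags are copied from the commands on this page.

```shell
# Sketch only: build the retrieve command for a retriever and a corpus prefix.
# search_command and the keys bm25/splade/bge are illustrative names.
search_command() {
  retriever="$1"   # bm25 | splade | bge
  corpus="$2"      # e.g. beir-v1.0.0-arguana or msmarco-v1-passage
  common="--threads 16 --batch-size 128 --topics outputs/reproduce/queries/reformulated_queries.tsv --output run.txt --hits 1000"
  case "$retriever" in
    bm25)
      # BEIR BM25 indexes carry a .flat suffix; the MS MARCO index is used bare.
      case "$corpus" in
        beir-*) index="$corpus.flat" ;;
        *)      index="$corpus" ;;
      esac
      echo "python -m pyserini.search.lucene --index $index --bm25 --k1 0.9 --b 0.4 $common" ;;
    splade)
      echo "python -m pyserini.search.lucene --index $corpus.splade-pp-ed --encoder naver/splade-cocondenser-ensembledistil --impact $common" ;;
    bge)
      echo "python -m pyserini.search.faiss --index $corpus.bge-base-en-v1.5 --encoder BAAI/bge-base-en-v1.5 $common" ;;
    *)
      echo "unknown retriever: $retriever" >&2
      return 1 ;;
  esac
}

search_command bm25 beir-v1.0.0-arguana
search_command splade msmarco-v1-passage
search_command bge beir-v1.0.0-fiqa
```

The flag order differs from the listing, which should not matter for these command-line tools. Note that the MS MARCO corpus prefix is `msmarco-v1-passage` for DL-HARD, DL 2019, and DL 2020 alike; only the `--dataset` passed to the reformulate step differs.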
Q2D (COT) gpt-4.1-nano BGE-base-en-v1.5 0.6194 0.9893 0.3843 0.4891 0.3967 0.7409 0.7499 0.9633 0.7995 0.1420 0.4312 0.4754 0.3722 0.8367 0.6710 0.8530 0.6744 0.8709
method: Q2D (COT) · llm: gpt-4.1-nano · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
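The evaluate step prints one summary row per requested measure, in trec_eval's `measure <tab> all <tab> value` layout. A minimal sketch, assuming a POSIX shell with `awk`, that keeps only the nDCG@10 and recall rows so they can be compared against a leaderboard row. `extract_metrics` is a hypothetical helper name, and the sample input below is hand-written in trec_eval's format using the DL 2020 values from the Q2D (COT) · gpt-4.1-nano · BGE-base-en-v1.5 row, not captured tool output.

```shell
# Sketch only: keep the summary rows that the leaderboard reports.
# trec_eval prints "measure <tab> all <tab> value" per metric;
# extract_metrics is a hypothetical helper name.
extract_metrics() {
  awk '$2 == "all" && ($1 == "ndcg_cut_10" || $1 ~ /^recall_/) { printf "%s=%s\n", $1, $3 }'
}

# Hand-written sample in trec_eval's output format (values from the row above).
printf 'ndcg_cut_10\tall\t0.6744\nrecall_1000\tall\t0.8709\n' | extract_metrics
```

In practice the helper would sit at the end of the evaluate command, for example `python -m pyserini.eval.trec_eval ... run.txt | extract_metrics`; any extra log lines from the wrapper are dropped because their second field is not `all`.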
Q2D (COT) gpt-4.1-nano BM25 0.4011 0.9360 0.3921 0.5132 0.2557 0.5758 0.7273 0.9560 0.7503 0.1744 0.4601 0.5728 0.3320 0.7655 0.6254 0.8621 0.6092 0.8846
method: Q2D (COT) · llm: gpt-4.1-nano · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
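The BM25 blocks above differ only in the index name, the qrels argument, and the recall cutoff. A minimal sketch that regenerates the retrieve and evaluate commands from a small table follows; the helper names (`BM25_RUNS`, `bm25_search_argv`, `eval_argv`, `commands_for`) are hypothetical and not part of QueryGym or Pyserini, and every flag is copied verbatim from the commands above.

```python
import shlex

# Topics file written by the reformulate step (path copied from the commands above).
TOPICS = "outputs/reproduce/queries/reformulated_queries.tsv"

# dataset -> (Lucene index, qrels argument, recall cutoff), copied from the BM25 blocks above.
BM25_RUNS = {
    "beir-v1.0.0-trec-news": ("beir-v1.0.0-trec-news.flat", "beir-v1.0.0-trec-news-test", 100),
    "msmarco-v1-passage.trecdl2019": ("msmarco-v1-passage", "dl19-passage", 1000),
    "msmarco-v1-passage.trecdl2020": ("msmarco-v1-passage", "dl20-passage", 1000),
}


def bm25_search_argv(index, topics=TOPICS, output="run.txt"):
    """Build the step-2 argument list with the k1/b values used on this page."""
    return [
        "python", "-m", "pyserini.search.lucene",
        "--threads", "16", "--batch-size", "128",
        "--index", index,
        "--topics", topics,
        "--bm25", "--k1", "0.9", "--b", "0.4",
        "--output", output,
        "--hits", "1000",
    ]


def eval_argv(qrels, recall_cutoff, run="run.txt"):
    """Build the step-3 argument list (nDCG@10 plus recall at the given cutoff)."""
    return [
        "python", "-m", "pyserini.eval.trec_eval", "-c",
        "-m", "ndcg.cut.10", "-m", f"recall.{recall_cutoff}",
        qrels, run,
    ]


def commands_for(dataset):
    """Return the retrieve and evaluate commands for one dataset as shell strings."""
    index, qrels, cutoff = BM25_RUNS[dataset]
    return [shlex.join(bm25_search_argv(index)), shlex.join(eval_argv(qrels, cutoff))]


if __name__ == "__main__":
    for dataset in BM25_RUNS:
        for command in commands_for(dataset):
            print(command)
```

The table is deliberately partial: DL-HARD is left out because its qrels argument above is a machine-local path, and the remaining BEIR datasets can be added in the same way.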
Q2D (COT) gpt-4.1-nano SPLADE++ 0.3820 0.9801 0.3962 0.5324 0.3131 0.6532 0.7065 0.9433 0.6809 0.1163 0.4053 0.4554 0.3426 0.8390 0.6544 0.8954 0.6271 0.9167
method: Q2D (COT) · LLM: gpt-4.1-nano · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
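Each evaluate step reports nDCG@10. As a rough illustration of what that number measures, here is a small pure-Python sketch for a single query. It assumes linear gains (the graded relevance value itself) and a log2(rank + 1) discount; `dcg` and `ndcg_at_k` are illustrative names, and the leaderboard figures should still be reproduced with the `trec_eval` commands above rather than with this sketch.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at rank r is divided by log2(r + 1)."""
    return sum(gain / math.log2(rank + 1) for rank, gain in enumerate(gains, start=1))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document ids in retrieval order, best first.
    qrels: mapping doc id -> graded relevance; missing ids count as 0.
    """
    gains = [max(qrels.get(doc_id, 0), 0) for doc_id in ranked_doc_ids[:k]]
    ideal_gains = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    judged = {"a": 2, "b": 1, "c": 0}
    print(round(ndcg_at_k(["b", "a", "c"], judged), 4))
```

The per-dataset figure in a leaderboard row is the mean of this per-query value over all judged queries.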
Q2D (COT) Qwen2.5-72B-Instruct BGE-base-en-v1.5 0.6188 0.9900 0.3528 0.4617 0.3941 0.7358 0.7387 0.9600 0.7710 0.1367 0.4070 0.4508 0.3498 0.8236 0.7121 0.8712 0.6411 0.8485
method: Q2D (COT) · LLM: Qwen2.5-72B-Instruct · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
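The retrieve step writes `run.txt` and the evaluate step reads R@100 or R@1k from it. The sketch below shows, under assumptions, how recall at a cutoff could be computed by hand. It assumes the run file is in the standard six-column TREC format (`qid Q0 docid rank score tag`), and the function names are hypothetical. Ties are kept in file order here, which may differ from how `trec_eval` breaks ties.

```python
from collections import defaultdict


def parse_trec_run(lines):
    """Group six-column TREC run lines into one ranked doc-id list per query."""
    scored = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _, doc_id, _, score, _ = parts
        scored[qid].append((float(score), doc_id))
    # Sort by descending score; Python's sort is stable, so ties keep file order.
    return {
        qid: [doc_id for _, doc_id in sorted(pairs, key=lambda pair: -pair[0])]
        for qid, pairs in scored.items()
    }


def recall_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(ranked_doc_ids[:k]) & relevant) / len(relevant)


if __name__ == "__main__":
    sample = [
        "q1 Q0 d3 1 12.5 demo",
        "q1 Q0 d7 2 11.0 demo",
        "q1 Q0 d1 3 9.2 demo",
    ]
    run = parse_trec_run(sample)
    print(recall_at_k(run["q1"], {"d7", "d9"}, k=2))
```

With `--hits 1000` in the retrieve step, both the 100 and the 1000 cutoff can be computed from the same run file.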
Q2D (COT) Qwen2.5-72B-Instruct BM25 0.4060 n/a 0.3787 0.4778 0.2453 n/a 0.7077 n/a 0.6785 0.1590 0.4172 0.5578 0.3075 0.7526 0.6378 0.8508 0.5651 0.8549
method: Q2D (COT) · LLM: Qwen2.5-72B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
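All BM25 runs above pass `--k1 0.9 --b 0.4`. To make the role of the two parameters concrete, here is a sketch of the textbook BM25 score for one query term in one document. It is an illustration only: Lucene's scorer, which Pyserini calls, may differ in constant factors and in how document length is stored, and `bm25_term_score` is a hypothetical name.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, num_docs, doc_freq, k1=0.9, b=0.4):
    """Textbook BM25 contribution of one query term to one document's score.

    k1 controls how quickly repeated occurrences of the term saturate.
    b controls how strongly long documents are penalised (0 disables it).
    """
    if tf <= 0:
        return 0.0
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)


if __name__ == "__main__":
    # Same term statistics, average-length document versus one four times longer.
    print(bm25_term_score(tf=3, doc_len=100, avg_doc_len=100, num_docs=1000, doc_freq=10))
    print(bm25_term_score(tf=3, doc_len=400, avg_doc_len=100, num_docs=1000, doc_freq=10))
```

A document's score for a query is the sum of this value over the query terms, which is why long expanded queries such as Query2Doc output interact with the saturation set by k1.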
Q2D (COT) Qwen2.5-72B-Instruct SPLADE++ 0.5199 0.9808 0.3897 0.5470 0.3157 0.6411 0.6834 0.9533 0.6425 0.1159 0.4054 0.4627 0.3016 0.8393 0.6941 0.9148 0.6099 0.8857
method: Q2D (COT) · LLM: Qwen2.5-72B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
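Step 1 hands `reformulated_queries.tsv` to the retrievers through `--topics`. The sketch below assumes a two-column layout, `qid<TAB>query`, with one query per line; this is inferred from the `.tsv` extension and is not confirmed by the page. LLM output can contain tabs and newlines that would break such a file, so the hypothetical helpers here flatten whitespace before writing. How QueryGym itself serialises queries is not shown above.

```python
import io


def to_topic_line(qid, text):
    """Render one topic as 'qid<TAB>query', collapsing tabs, newlines and repeated spaces."""
    flat = " ".join(text.split())
    return f"{qid}\t{flat}"


def write_topics(handle, queries):
    """Write a mapping qid -> reformulated query to an open text handle, one topic per line."""
    for qid, text in queries.items():
        handle.write(to_topic_line(qid, text) + "\n")


def read_topics(handle):
    """Read 'qid<TAB>query' lines back into a dict, skipping lines with an empty qid."""
    topics = {}
    for line in handle:
        qid, _, text = line.rstrip("\n").partition("\t")
        if qid:
            topics[qid] = text
    return topics


if __name__ == "__main__":
    buffer = io.StringIO()
    write_topics(buffer, {"q1": "what is bm25\nBM25 is a ranking function."})
    print(buffer.getvalue(), end="")
```

Checking that the number of lines equals the number of queries before running step 2 is a cheap way to catch a malformed topics file.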
Q2D (COT) Qwen2.5-7B-Instruct BGE-base-en-v1.5 0.6195 0.9893 0.3498 0.4463 0.3896 0.7244 0.7336 0.9667 0.7769 0.1386 0.4295 0.4584 0.3391 0.8300 0.6561 0.8397 0.6302 0.8573
method Q2D (COT) · llm Qwen2.5-7B-Instruct · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 \
  dl20-passage run.txt
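To collect scores from step 3 programmatically, the summary lines that trec_eval prints can be parsed. The sketch below assumes trec_eval's usual three-column layout (measure name, the literal `all` for the aggregate, value) and that the cutoff measures are reported as `ndcg_cut_10` and `recall_100`. `SAMPLE` is a hand-written illustration that reuses the first two numbers of the Q2D (COT) · Qwen2.5-7B-Instruct · BGE-base-en-v1.5 row; it is not captured tool output, and `parse_trec_eval` is a hypothetical helper.

```python
# Hand-written sample in trec_eval's "measure <tab> all <tab> value" layout.
SAMPLE = "ndcg_cut_10           \tall\t0.6195\nrecall_100            \tall\t0.9893\n"


def parse_trec_eval(text):
    """Return {measure: value} for the aggregate ('all') lines only."""
    scores = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip per-query lines and anything that is not a 3-column record.
        if len(parts) != 3 or parts[1] != "all":
            continue
        try:
            scores[parts[0]] = float(parts[2])
        except ValueError:
            # Non-numeric summary fields (for example a run id) are ignored.
            continue
    return scores


print(parse_trec_eval(SAMPLE))
```

In practice the text would come from the captured standard output of the evaluate command, which may also contain log lines; the three-column filter is there to skip those.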
Q2D (COT) Qwen2.5-7B-Instruct BM25 0.4011 0.9360 0.3669 0.4809 0.2405 0.5544 0.7096 0.9427 0.6997 0.1620 0.4349 0.5616 0.3044 0.7815 0.6074 0.8585 0.5802 0.8684
method Q2D (COT) · llm Qwen2.5-7B-Instruct · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 \
  dl20-passage run.txt
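Each result row packs nine (nDCG@10, recall) pairs after the method, LLM, and retriever labels. A minimal sketch for splitting a row into per-dataset scores is shown below. The column order in `DATASETS` is an assumption read off the table header (six BEIR datasets with R@100, then DL-HARD, DL 2019, and DL 2020 with R@1k), and `parse_row` is a hypothetical helper, not part of QueryGym. The row string is the Q2D (COT) · Qwen2.5-7B-Instruct · BM25 row copied verbatim.

```python
# Assumed column order, taken from the table header on this page.
DATASETS = ["ArguAna", "DBPedia", "FiQA", "SciFact", "COVID", "News",
            "DL-HARD", "DL 2019", "DL 2020"]

ROW = ("Q2D (COT) Qwen2.5-7B-Instruct BM25 0.4011 0.9360 0.3669 0.4809 "
       "0.2405 0.5544 0.7096 0.9427 0.6997 0.1620 0.4349 0.5616 "
       "0.3044 0.7815 0.6074 0.8585 0.5802 0.8684")


def parse_row(row):
    """Split a leaderboard row into its label and {dataset: (ndcg10, recall)}."""
    tokens = row.split()
    n_values = 2 * len(DATASETS)
    if len(tokens) <= n_values:
        raise ValueError("row has too few columns")
    # The trailing 18 tokens are scores; everything before them is the label.
    values = [float(t) for t in tokens[-n_values:]]
    label = " ".join(tokens[:-n_values])
    scores = {name: (values[2 * i], values[2 * i + 1])
              for i, name in enumerate(DATASETS)}
    return label, scores


label, scores = parse_row(ROW)
print(label)
for name, (ndcg10, recall) in scores.items():
    print(f"{name:8s} nDCG@10={ndcg10:.4f} recall={recall:.4f}")
```

Taking the scores from the end of the row avoids having to guess where the method label stops, since labels such as "Q2D (COT)" contain spaces.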
Q2D (COT) Qwen2.5-7B-Instruct SPLADE++ 0.5200 0.9808 0.3697 0.5223 0.3206 0.6505 0.6825 0.9467 0.6567 0.1147 0.3831 0.4524 0.2731 0.8239 0.6513 0.9037 0.5948 0.9019
method Q2D (COT) · llm Qwen2.5-7B-Instruct · retriever SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 \
  dl20-passage run.txt
Q2D (FS) gpt-4.1 BGE-base-en-v1.5 0.6179 0.9893 0.4302 0.5303 0.4205 0.7542 0.7519 0.9667 0.8039 0.1411 0.4715 0.5157 0.4074 0.8726 0.7272 0.8890 0.7141 0.8948
method Q2D (FS) · llm gpt-4.1 · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 \
  dl20-passage run.txt
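Step 2 reads `reformulated_queries.tsv` as its topics file, so a malformed line there affects every later number. The sketch below is a sanity check under the assumption that the file holds one `query_id<TAB>query text` pair per line; the page does not show QueryGym's actual output, so treat the layout as a guess. `check_topics` is a hypothetical helper, and the demo queries are made up.

```python
import os
import tempfile


def check_topics(path):
    """Return a list of (line_number, problem) for a 2-column TSV topics file."""
    problems = []
    seen = set()
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, 1):
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 2:
                # Typical cause: a tab or newline inside generated query text.
                problems.append((n, "expected 2 tab-separated fields"))
                continue
            qid, text = fields
            if not qid.strip() or not text.strip():
                problems.append((n, "empty id or query"))
            elif qid in seen:
                problems.append((n, "duplicate query id"))
            seen.add(qid)
    return problems


# Demo on a throwaway file: line 2 has a stray tab, line 3 repeats an id.
demo = "q1\tfirst reformulated query\nq2\tbroken\tquery\nq1\tsecond copy\n"
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False,
                                 encoding="utf-8") as f:
    f.write(demo)
    demo_path = f.name
problems = check_topics(demo_path)
os.remove(demo_path)
print(problems)
```

Running a check like this on `outputs/reproduce/queries/reformulated_queries.tsv` before retrieval can catch cases where generated text introduced stray tabs or line breaks.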
Q2D (FS) gpt-4.1 BM25 0.4012 0.9410 0.4010 0.5083 0.2684 0.5993 0.7123 0.9493 0.7081 0.1639 0.4801 0.5842 0.3562 0.8042 0.6904 0.8861 0.6746 0.8984
method Q2D (FS) · llm gpt-4.1 · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
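Each result row is flattened into one line of 18 scores. The sketch below splits such a row back into per-dataset metrics. It assumes the columns follow the order of the command blocks (ArguAna, DBPedia, FiQA, SciFact, COVID, News, DL-HARD, DL 2019, DL 2020), and that the LLM and retriever names are each a single whitespace-free token. The function name `parse_row` is illustrative only.

```python
# Column layout assumed from the header: nine datasets, two metrics each.
DATASETS = [
    ("ArguAna", "R@100"), ("DBPedia", "R@100"), ("FiQA", "R@100"),
    ("SciFact", "R@100"), ("COVID", "R@100"), ("News", "R@100"),
    ("DL-HARD", "R@1k"), ("DL 2019", "R@1k"), ("DL 2020", "R@1k"),
]


def parse_row(row):
    """Split a flattened leaderboard row into config labels and scores."""
    tokens = row.split()
    n_scores = 2 * len(DATASETS)
    labels = tokens[:-n_scores]
    scores = [float(t) for t in tokens[-n_scores:]]
    if len(scores) != n_scores or len(labels) < 3:
        raise ValueError("unexpected row shape")
    # Last two label tokens are the LLM and retriever; the rest is the method.
    method, llm, retriever = " ".join(labels[:-2]), labels[-2], labels[-1]
    metrics = {}
    for i, (name, recall_name) in enumerate(DATASETS):
        metrics[name] = {
            "nDCG@10": scores[2 * i],
            recall_name: scores[2 * i + 1],
        }
    return {"method": method, "llm": llm, "retriever": retriever, "metrics": metrics}


row = (
    "Q2D (FS) gpt-4.1 BM25 0.4012 0.9410 0.4010 0.5083 0.2684 0.5993 "
    "0.7123 0.9493 0.7081 0.1639 0.4801 0.5842 0.3562 0.8042 0.6904 "
    "0.8861 0.6746 0.8984"
)
parsed = parse_row(row)
print(parsed["method"], parsed["llm"], parsed["retriever"])
print(parsed["metrics"]["DL 2020"])
```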
Q2D (FS) gpt-4.1 SPLADE++ 0.3826 0.9808 0.3910 0.5192 0.3446 0.6890 0.7093 0.9567 0.6591 0.1099 0.4302 0.5009 0.3771 0.8396 0.6932 0.9068 0.6749 0.9389
method: Q2D (FS) · llm: gpt-4.1 · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
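The step-2 commands for the three retrievers share most of their arguments and differ only in the search module and a few retriever-specific flags. The sketch below captures that difference, using only the flag values shown in the listings. The function name and the short keys `bm25`, `splade`, and `bge` are hypothetical labels chosen for illustration.

```python
import shlex

TOPICS = "outputs/reproduce/queries/reformulated_queries.tsv"


def retrieve_command(index, retriever, topics=TOPICS):
    """Build the step-2 pyserini call for one of the three listed retrievers."""
    common = ["--threads", "16", "--batch-size", "128",
              "--index", index, "--topics", topics]
    tail = ["--output", "run.txt", "--hits", "1000"]
    if retriever == "bm25":
        # Lexical search with explicit BM25 parameters.
        module = "pyserini.search.lucene"
        extra = ["--bm25", "--k1", "0.9", "--b", "0.4"]
    elif retriever == "splade":
        # Learned sparse search: a query encoder plus impact scoring.
        module = "pyserini.search.lucene"
        extra = ["--encoder", "naver/splade-cocondenser-ensembledistil"]
        tail = tail + ["--impact"]
    elif retriever == "bge":
        # Dense search over a FAISS index.
        module = "pyserini.search.faiss"
        extra = ["--encoder", "BAAI/bge-base-en-v1.5"]
    else:
        raise ValueError(f"unknown retriever: {retriever}")
    return shlex.join(["python", "-m", module] + common + extra + tail)


bm25_cmd = retrieve_command("beir-v1.0.0-arguana.flat", "bm25")
splade_cmd = retrieve_command("msmarco-v1-passage.splade-pp-ed", "splade")
bge_cmd = retrieve_command("msmarco-v1-passage.bge-base-en-v1.5", "bge")
print(bm25_cmd)
print(splade_cmd)
print(bge_cmd)
```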
Q2D (FS) gpt-4.1-nano BGE-base-en-v1.5 0.6188 0.9900 0.4026 0.5104 0.4039 0.7311 0.7417 0.9567 0.7793 0.1402 0.4539 0.4763 0.3480 0.8374 0.7157 0.8601 0.6988 0.8742
method: Q2D (FS) · llm: gpt-4.1-nano · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
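The step-3 commands follow a simple rule: BEIR datasets are scored against `<dataset>-test` with `recall.100`, while DL 2019 and DL 2020 use `dl19-passage` and `dl20-passage` with `recall.1000`. A minimal sketch of that mapping is shown below; the function name `eval_command` is hypothetical. DL-HARD is left out on purpose, because the listing for it points at a machine-specific file path rather than a named qrels set.

```python
# Qrels names for the two TREC DL sets, as they appear in the listings.
DL_QRELS = {
    "msmarco-v1-passage.trecdl2019": "dl19-passage",
    "msmarco-v1-passage.trecdl2020": "dl20-passage",
}


def eval_command(dataset, run_file="run.txt"):
    """Build the step-3 trec_eval call for a BEIR or TREC DL dataset."""
    if dataset.startswith("beir-v1.0.0-"):
        qrels, recall = f"{dataset}-test", "recall.100"
    elif dataset in DL_QRELS:
        qrels, recall = DL_QRELS[dataset], "recall.1000"
    else:
        # DL-HARD and anything else must be supplied explicitly by the user.
        raise ValueError(f"no qrels mapping for {dataset}")
    return (
        "python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 "
        f"-m {recall} {qrels} {run_file}"
    )


beir_eval = eval_command("beir-v1.0.0-fiqa")
dl20_eval = eval_command("msmarco-v1-passage.trecdl2020")
print(beir_eval)
print(dl20_eval)
```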
Q2D (FS) gpt-4.1-nano BM25 0.3965 0.9324 0.3720 0.4873 0.2531 0.5833 0.7053 0.9410 0.6827 0.1634 0.4442 0.5398 0.3358 0.7627 0.6643 0.8527 0.6227 0.8848
method: Q2D (FS) · llm: gpt-4.1-nano · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
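With both BM25 rows for Q2D (FS) available, the effect of the LLM can be read off by subtracting nDCG@10 column by column. The sketch below does this for gpt-4.1 versus gpt-4.1-nano, with the scores copied from the two rows. The dataset order is an assumption taken from the order of the command blocks.

```python
DATASETS = ["ArguAna", "DBPedia", "FiQA", "SciFact", "COVID", "News",
            "DL-HARD", "DL 2019", "DL 2020"]

# Scores copied from the two Q2D (FS) BM25 rows (nDCG@10 and recall interleaved).
GPT41 = [0.4012, 0.9410, 0.4010, 0.5083, 0.2684, 0.5993, 0.7123, 0.9493,
         0.7081, 0.1639, 0.4801, 0.5842, 0.3562, 0.8042, 0.6904, 0.8861,
         0.6746, 0.8984]
NANO = [0.3965, 0.9324, 0.3720, 0.4873, 0.2531, 0.5833, 0.7053, 0.9410,
        0.6827, 0.1634, 0.4442, 0.5398, 0.3358, 0.7627, 0.6643, 0.8527,
        0.6227, 0.8848]


def ndcg_deltas(scores_a, scores_b):
    """Return nDCG@10 differences (a minus b), keyed by dataset."""
    # Even positions hold nDCG@10; odd positions hold the recall metric.
    ndcg_a, ndcg_b = scores_a[::2], scores_b[::2]
    return {name: round(a - b, 4)
            for name, a, b in zip(DATASETS, ndcg_a, ndcg_b)}


deltas = ndcg_deltas(GPT41, NANO)
largest = max(deltas, key=deltas.get)
for name, delta in deltas.items():
    print(f"{name:8s} {delta:+.4f}")
print("largest gap:", largest)
```

On these two rows gpt-4.1 scores higher on nDCG@10 in every column, with the widest gap on DL 2020.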
Q2D (FS) gpt-4.1-nano SPLADE++ 0.3823 0.9801 0.3790 0.5204 0.3390 0.6636 0.7121 0.9400 0.6715 0.1182 0.4292 0.4573 0.3533 0.8005 0.6318 0.8839 0.6471 0.9232
method: Q2D (FS) · llm: gpt-4.1-nano · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
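Each result row on this page is a flat run of numbers that follows the column order of the table header: an nDCG@10 and a recall value for each dataset. The sketch below shows one way to attach dataset and metric labels to such a row. The helper name `label_row` and the short column names are illustrative assumptions, not part of QueryGym, and the sketch assumes the 18-value layout used by rows in this tab.

```python
# Column order assumed from the table header: six BEIR datasets reported with
# nDCG@10 + R@100, then DL-HARD, DL 2019 and DL 2020 with nDCG@10 + R@1k.
DATASETS = [
    ("ArguAna", "R@100"), ("DBPedia", "R@100"), ("FiQA", "R@100"),
    ("SciFact", "R@100"), ("COVID", "R@100"), ("News", "R@100"),
    ("DL-HARD", "R@1k"), ("DL 2019", "R@1k"), ("DL 2020", "R@1k"),
]
COLUMNS = [(name, metric) for name, recall in DATASETS
           for metric in ("nDCG@10", recall)]


def label_row(row_text):
    """Map the trailing numbers of a flattened row to (dataset, metric) keys."""
    tokens = row_text.split()
    if len(tokens) < len(COLUMNS):
        raise ValueError("row is shorter than the expected column count")
    try:
        values = [float(tok) for tok in tokens[-len(COLUMNS):]]
    except ValueError:
        # Fewer numeric cells than columns: positions cannot be trusted.
        raise ValueError("row does not end with 18 numeric cells") from None
    return dict(zip(COLUMNS, values))


row = ("Q2D (FS) Qwen2.5-72B-Instruct BGE-base-en-v1.5 "
       "0.6190 0.9900 0.4113 0.5101 0.4098 0.7431 0.7540 0.9633 0.7891 "
       "0.1401 0.4857 0.5135 0.3845 0.8568 0.7419 0.9027 0.6792 0.8913")
cells = label_row(row)
print(cells[("FiQA", "nDCG@10")], cells[("DL 2020", "R@1k")])
```

A row with fewer than 18 numbers, which happens when some cells were not evaluated, cannot be aligned by position alone, so the helper raises instead of guessing.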
Q2D (FS) Qwen2.5-72B-Instruct BGE-base-en-v1.5 0.6190 0.9900 0.4113 0.5101 0.4098 0.7431 0.7540 0.9633 0.7891 0.1401 0.4857 0.5135 0.3845 0.8568 0.7419 0.9027 0.6792 0.8913
method Q2D (FS) · llm Qwen2.5-72B-Instruct · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
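The BEIR command triples for this configuration differ only in the dataset name, so they can be generated rather than copied. The following is a minimal sketch, assuming the flags shown in the commands above; the helper `bge_commands` is hypothetical and only builds command strings, it does not run anything.

```python
import shlex

TOPICS = "outputs/reproduce/queries/reformulated_queries.tsv"
METHOD_PARAMS = '{"mode":"fs","num_examples":4,"train_split":"train"}'


def bge_commands(beir_name, model="Qwen/Qwen2.5-72B-Instruct"):
    """Return the reformulate, retrieve and evaluate commands for one BEIR set."""
    dataset = f"beir-v1.0.0-{beir_name}"
    reformulate = [
        "python", "examples/querygym_pyserini/pipeline.py",
        "--dataset", dataset,
        "--method", "query2doc",
        "--model", model,
        "--steps", "reformulate",
        "--temperature", "1",
        "--max-tokens", "128",
        "--method-params", METHOD_PARAMS,
        "--output-dir", "outputs/reproduce",
    ]
    retrieve = [
        "python", "-m", "pyserini.search.faiss",
        "--threads", "16", "--batch-size", "128",
        "--index", f"{dataset}.bge-base-en-v1.5",
        "--topics", TOPICS,
        "--encoder", "BAAI/bge-base-en-v1.5",
        "--output", "run.txt",
        "--hits", "1000",
    ]
    evaluate = [
        "python", "-m", "pyserini.eval.trec_eval",
        "-c", "-m", "ndcg.cut.10", "-m", "recall.100",
        f"{dataset}-test", "run.txt",
    ]
    # shlex.join quotes the JSON argument so it survives a POSIX shell.
    return [shlex.join(cmd) for cmd in (reformulate, retrieve, evaluate)]


for name in ("arguana", "fiqa", "scifact"):
    for command in bge_commands(name):
        print(command)
```

The MS MARCO based sets (DL-HARD, DL 2019, DL 2020) are left out of the sketch because they share one index and use different qrels identifiers and a different recall cutoff.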
Q2D (FS) Qwen2.5-72B-Instruct BM25 0.3991 - 0.3904 0.4993 0.2509 - 0.7163 - 0.7078 0.1673 0.4807 0.6048 0.3467 0.8020 0.6875 0.8959 0.6264 0.8907
method Q2D (FS) · llm Qwen2.5-72B-Instruct · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
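The BM25 runs above pass `--k1 0.9 --b 0.4`. To show what these two parameters control, here is a toy sketch of the textbook Okapi BM25 formula over tokenized documents. It is an illustration only: Lucene's scorer, which Pyserini calls, differs in details such as document-length encoding, and the function name `bm25_scores` is an assumption made for this example.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=0.9, b=0.4):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 caps the effect of repeated terms; b scales length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["query", "expansion", "helps"],
    ["dense", "retrieval"],
    ["query", "query", "logs"],
]
print(bm25_scores(["query"], docs))
```

With a smaller `k1`, additional occurrences of a term add less to the score; with a smaller `b`, long documents are penalized less.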
Q2D (FS) Qwen2.5-72B-Instruct SPLADE++ 0.5200 0.9801 0.3662 0.5023 0.3261 0.6552 0.7035 0.9567 0.6689 0.1142 0.4238 0.4846 0.3333 0.8206 0.7151 0.9124 0.6499 0.9234
method Q2D (FS) · llm Qwen2.5-72B-Instruct · retriever SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
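The evaluate steps report `ndcg.cut.10` together with `recall.100` or `recall.1000`. As a rough per-query illustration of what those numbers measure, the sketch below computes both from a ranked list and a qrels mapping. It assumes linear gains and a log2 rank discount; the function names are hypothetical, and the leaderboard numbers themselves come from trec_eval, averaged over queries.

```python
import math


def ndcg_at_k(ranked_ids, qrels, k=10):
    """nDCG@k for one query; qrels maps doc id -> graded relevance."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    dcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))
    # Ideal ranking: all judged-relevant documents sorted by grade.
    ideal = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_ids, qrels, k=100):
    """Fraction of the relevant documents that appear in the top k."""
    relevant = {doc_id for doc_id, g in qrels.items() if g > 0}
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant)
    return hits / len(relevant)


qrels = {"d1": 2, "d2": 1, "d3": 0}
run = ["d2", "d9", "d1"]
print(ndcg_at_k(run, qrels), recall_at_k(run, qrels))
```

The `-c` flag in the commands above tells trec_eval to average over every query in the qrels, so queries with no retrieved results count as zero instead of being skipped.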
Q2D (FS) Qwen2.5-7B-Instruct BGE-base-en-v1.5 0.6207 0.9886 0.3922 0.4865 0.3866 0.7308 0.7454 0.9567 0.7922 0.1388 0.4627 0.5133 0.3628 0.8348 0.6776 0.8535 0.6402 0.8578
method Q2D (FS) · llm Qwen2.5-7B-Instruct · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
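Step 2 reads the file written by step 1, so a quick format check between the two can catch a malformed reformulation before retrieval starts. The sketch below assumes `reformulated_queries.tsv` holds one query per line as `qid<TAB>query`; that layout is an assumption, not something this page states, and `check_topics` is a hypothetical helper that belongs to neither QueryGym nor Pyserini.

```shell
# Hypothetical helper: succeed only if every line has exactly two
# non-empty tab-separated fields (assumed layout: qid<TAB>query).
check_topics() {
  awk -F '\t' 'NF != 2 || $1 == "" || $2 == "" { bad = 1 } END { exit bad }' "$1"
}

# Demo on a throwaway file so the sketch runs without any pipeline output.
demo_topics="$(mktemp)"
printf '%s\t%s\n' 'q1' 'what is query expansion' > "$demo_topics"
printf '%s\t%s\n' 'q2' 'dense retrieval with bge' >> "$demo_topics"

if check_topics "$demo_topics"; then
  echo "topics file looks well-formed"
fi
rm -f "$demo_topics"
```

A generated passage that contains a raw newline or tab would fail this check, which is the case most worth catching when `--max-tokens 128` output is written straight to TSV.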
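Q2D (FS) | Qwen2.5-7B-Instruct | BM25 | ArguAna 0.3984 / 0.9353 | DBPedia 0.3859 / 0.4831 | FiQA 0.2430 / 0.5533 | SciFact 0.7149 / 0.9443 | COVID 0.7423 / 0.1668 | News 0.4778 / 0.5842 | DL-HARD 0.3141 / 0.7724 | DL 2019 0.5884 / 0.8605 | DL 2020 0.5428 / 0.8691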
method: Q2D (FS) · llm: Qwen2.5-7B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
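Across the evaluate steps above, the recall cutoff and the qrels name both follow from the dataset: BEIR sets use `recall.100` with `<dataset>-test`, while DL 2019 and DL 2020 use `recall.1000` with `dl19-passage` and `dl20-passage`. A minimal sketch of that mapping follows; `eval_args` is a hypothetical helper, and DL-HARD is deliberately left unmapped because its qrels argument on this page is a local file path rather than a named qrels set.

```shell
# Hypothetical helper: print the trec_eval arguments used on this page
# for a given --dataset value.
eval_args() {
  case "$1" in
    beir-v1.0.0-*)
      echo "-c -m ndcg.cut.10 -m recall.100 $1-test" ;;
    msmarco-v1-passage.trecdl2019)
      echo "-c -m ndcg.cut.10 -m recall.1000 dl19-passage" ;;
    msmarco-v1-passage.trecdl2020)
      echo "-c -m ndcg.cut.10 -m recall.1000 dl20-passage" ;;
    *)
      echo "no qrels mapping for: $1" >&2
      return 1 ;;
  esac
}

# Dry run: print the evaluate commands instead of executing them.
for ds in beir-v1.0.0-arguana beir-v1.0.0-fiqa msmarco-v1-passage.trecdl2019; do
  echo "python -m pyserini.eval.trec_eval $(eval_args "$ds") run.txt"
done
```

Printing rather than executing keeps the sketch runnable without Pyserini installed; dropping the `echo` would turn it into a real evaluation loop.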
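Q2D (FS) | Qwen2.5-7B-Instruct | SPLADE++ | ArguAna 0.5199 / 0.9808 | DBPedia 0.3575 / 0.4927 | FiQA 0.3079 / 0.6483 | SciFact 0.7120 / 0.9500 | COVID 0.6793 / 0.1095 | News 0.4146 / 0.4917 | DL-HARD 0.2672 / 0.8116 | DL 2019 0.6095 / 0.8612 | DL 2020 0.5492 / 0.9062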
method: Q2D (FS) · llm: Qwen2.5-7B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
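The retrieve commands for the three retrievers differ only in the search module, the index suffix, and a few flags. The sketch below collects the naming pattern visible in the commands on this page; `index_name` and `search_flags` are hypothetical helpers, not part of Pyserini, and the suffix rules are inferred from these examples only.

```shell
# Hypothetical helper: build the prebuilt index name from a corpus base name
# (e.g. beir-v1.0.0-fiqa or msmarco-v1-passage) and a retriever label.
index_name() {
  base="$1"
  retriever="$2"
  case "$retriever" in
    BM25)
      # On this page BEIR BM25 indexes end in .flat; MS MARCO has no suffix.
      case "$base" in
        beir-*) echo "$base.flat" ;;
        *)      echo "$base" ;;
      esac ;;
    SPLADE++)         echo "$base.splade-pp-ed" ;;
    BGE-base-en-v1.5) echo "$base.bge-base-en-v1.5" ;;
    *) echo "unknown retriever: $retriever" >&2; return 1 ;;
  esac
}

# Hypothetical helper: search module plus retriever-specific flags.
search_flags() {
  case "$1" in
    BM25)             echo "pyserini.search.lucene --bm25 --k1 0.9 --b 0.4" ;;
    SPLADE++)         echo "pyserini.search.lucene --encoder naver/splade-cocondenser-ensembledistil --impact" ;;
    BGE-base-en-v1.5) echo "pyserini.search.faiss --encoder BAAI/bge-base-en-v1.5" ;;
    *) echo "unknown retriever: $1" >&2; return 1 ;;
  esac
}

# Dry run for one corpus across all three retrievers.
for r in BM25 SPLADE++ BGE-base-en-v1.5; do
  echo "python -m $(search_flags "$r") --index $(index_name beir-v1.0.0-scifact "$r")"
done
```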
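Q2D (ZS) | gpt-4.1 | BGE-base-en-v1.5 | ArguAna 0.6187 / 0.9900 | DBPedia 0.4311 / 0.5221 | FiQA 0.4151 / 0.7489 | SciFact 0.7609 / 0.9633 | COVID 0.8061 / 0.1454 | News 0.4761 / 0.5108 | DL-HARD 0.3786 / 0.8591 | DL 2019 0.7281 / 0.8995 | DL 2020 0.7393 / 0.9056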
method: Q2D (ZS) · llm: gpt-4.1 · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
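Step 3 expects `run.txt` in the standard TREC run layout: six whitespace-separated columns (`qid Q0 docid rank score tag`). A minimal sanity check before evaluation might look like the sketch below; `check_run` is a hypothetical helper, and the demo lines are made-up placeholders rather than real retrieval output.

```shell
# Hypothetical helper: succeed only if every line has six fields and
# the second field is the literal Q0 used by the TREC run format.
check_run() {
  awk 'NF != 6 || $2 != "Q0" { bad = 1 } END { exit bad }' "$1"
}

# Demo on a throwaway file with placeholder values.
demo_run="$(mktemp)"
printf '%s\n' 'q1 Q0 doc7 1 12.5 demo' 'q1 Q0 doc3 2 11.9 demo' > "$demo_run"

if check_run "$demo_run"; then
  echo "run file looks well-formed"
fi
rm -f "$demo_run"
```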
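Q2D (ZS) | gpt-4.1 | BM25 | ArguAna 0.3970 / 0.9324 | DBPedia 0.4062 / 0.5051 | FiQA 0.2599 / 0.6002 | SciFact 0.7203 / 0.9477 | COVID 0.7430 / 0.1704 | News 0.4980 / 0.5858 | DL-HARD 0.3502 / 0.7811 | DL 2019 0.6873 / 0.8924 | DL 2020 0.6625 / 0.8942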
method: Q2D (ZS) · llm: gpt-4.1 · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
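Step 3 prints one summary line per requested metric. The following is a minimal sketch for collecting those numbers, assuming trec_eval's usual three-column `metric all value` summary layout. `parse_trec_eval` is a hypothetical helper, not part of QueryGym or Pyserini, and the sample numbers are placeholders rather than results from this page.

```python
def parse_trec_eval(output: str) -> dict:
    """Collect `metric all value` summary lines from trec_eval output."""
    scores = {}
    for line in output.splitlines():
        parts = line.split()
        # Summary lines have exactly three fields with "all" in the middle;
        # anything else (download messages, echoed commands) is skipped.
        if len(parts) == 3 and parts[1] == "all":
            try:
                scores[parts[0]] = float(parts[2])
            except ValueError:
                continue  # non-numeric summary value, ignore it
    return scores


# Placeholder output for illustration only.
sample = "ndcg_cut_10           \tall\t0.5000\nrecall_1000           \tall\t0.7500\n"
print(parse_trec_eval(sample))
```

Keying on the middle `all` field rather than on fixed metric names lets the same helper serve both the R@100 and the R@1k variants of step 3.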
Q2D (ZS) gpt-4.1 SPLADE++ 0.3819 0.9808 0.3947 0.5209 0.3301 0.6766 0.7035 0.9553 0.6340 0.1089 0.4517 0.4786 0.3377 0.8389 0.7000 0.9142 0.6875 0.9372
method Q2D (ZS) · llm gpt-4.1 · retriever SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
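Step 2 writes its ranking to `run.txt`. The sketch below reads the top-ranked documents per query back out, assuming the file uses the standard six-column TREC run layout (`qid Q0 docid rank score tag`). `top_k` is a hypothetical helper and the ids in the example are made up.

```python
from collections import defaultdict


def top_k(run_lines, k=10):
    """Return {qid: [docid, ...]} with the k best-ranked documents per query."""
    ranked = defaultdict(list)
    for line in run_lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _q0, docid, rank, _score, _tag = parts
        ranked[qid].append((int(rank), docid))
    # Sort by rank so the result does not depend on the order of lines in the file.
    return {qid: [doc for _, doc in sorted(pairs)[:k]] for qid, pairs in ranked.items()}


# Made-up run lines for illustration only.
lines = [
    "q1 Q0 d7 2 11.2 Anserini",
    "q1 Q0 d3 1 12.9 Anserini",
    "q2 Q0 d5 1 9.4 Anserini",
]
print(top_k(lines, k=1))
```

With a real run, `top_k(open("run.txt"), k=10)` would give the same documents that nDCG@10 scores in step 3.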
Q2D (ZS) gpt-4.1-nano BGE-base-en-v1.5 0.6190 0.9900 0.4268 0.5239 0.4155 0.7412 0.7541 0.9633 0.8019 0.1417 0.4467 0.4931 0.3683 0.8395 0.7202 0.8701 0.7029 0.8743
method Q2D (ZS) · llm gpt-4.1-nano · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
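Step 2 reads the reformulated queries from a TSV file, so a generated passage that contains a raw tab or newline would corrupt the topics before retrieval starts. Below is a sketch of a pre-flight check, assuming one `query_id<TAB>query_text` pair per line. That layout is an assumption about `reformulated_queries.tsv`, and `check_queries_tsv` is a hypothetical helper.

```python
import io


def check_queries_tsv(handle):
    """Return (line_number, problem) pairs for rows that are not `id<TAB>text`."""
    problems = []
    seen = set()
    for lineno, line in enumerate(handle, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            problems.append((lineno, "expected 2 tab-separated fields"))
            continue
        qid, text = fields
        if not text.strip():
            problems.append((lineno, "empty query text"))
        if qid in seen:
            problems.append((lineno, "duplicate query id"))
        seen.add(qid)
    return problems


# Made-up content for illustration: line 2 lacks a query, line 3 repeats an id.
demo = io.StringIO("q1\tfirst query\nq2\nq1\tagain\n")
print(check_queries_tsv(demo))
```

An empty result means every line parsed as exactly one id and one non-empty query.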
Q2D (ZS) gpt-4.1-nano BM25 0.3980 0.9374 0.3968 0.4980 0.2548 0.5899 0.7170 0.9403 0.6967 0.1656 0.4685 0.5564 0.3368 0.7832 0.6779 0.8862 0.6268 0.8869
method Q2D (ZS) · llm gpt-4.1-nano · retriever BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
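Both nDCG@10 and recall are bounded in [0, 1], so a range check over a leaderboard row is a cheap way to catch transcription slips. The sketch below applies one to the Q2D (ZS) · gpt-4.1-nano · BM25 row above. `out_of_range` is a hypothetical helper.

```python
def out_of_range(values, low=0.0, high=1.0):
    """Return (position, value) for every score outside [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]


# Scores copied from the Q2D (ZS) / gpt-4.1-nano / BM25 row.
row = (
    "0.3980 0.9374 0.3968 0.4980 0.2548 0.5899 0.7170 0.9403 0.6967 "
    "0.1656 0.4685 0.5564 0.3368 0.7832 0.6779 0.8862 0.6268 0.8869"
)
scores = [float(x) for x in row.split()]
print(out_of_range(scores))  # → []
```

The positions in the result are zero-based indices into the row, alternating nDCG@10 and recall per dataset column.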
Q2D (ZS) gpt-4.1-nano SPLADE++ 0.3819 0.9808 0.3849 0.5064 0.3335 0.6640 23.0000 0.9493 0.6645 0.1146 0.4055 0.4651 0.3479 0.8092 0.6877 0.8916 0.6242 0.9219
method Q2D (ZS) · llm gpt-4.1-nano · retriever SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
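Across the panels above, the retrieve step follows one template: only the search module, the index name, and a few retriever-specific flags change. Here is a minimal sketch of that template, assuming the index-naming pattern visible in the commands on this page. `RETRIEVERS` and `retrieve_command` are hypothetical helpers, not part of QueryGym or Pyserini.

```python
import shlex

# (search module, index suffix, retriever-specific flags), as seen in the commands above.
RETRIEVERS = {
    "bm25": ("pyserini.search.lucene", ".flat", ["--bm25", "--k1", "0.9", "--b", "0.4"]),
    "splade": (
        "pyserini.search.lucene",
        ".splade-pp-ed",
        ["--encoder", "naver/splade-cocondenser-ensembledistil", "--impact"],
    ),
    "bge": ("pyserini.search.faiss", ".bge-base-en-v1.5", ["--encoder", "BAAI/bge-base-en-v1.5"]),
}


def retrieve_command(dataset, retriever,
                     topics="outputs/reproduce/queries/reformulated_queries.tsv"):
    """Build the step-2 argument list for one dataset/retriever pair."""
    module, suffix, extra = RETRIEVERS[retriever]
    # The MS MARCO test sets (dlhard, trecdl2019, trecdl2020) share one passage index.
    base = "msmarco-v1-passage" if dataset.startswith("msmarco-v1-passage") else dataset
    # On this page the MS MARCO BM25 index is named without a suffix.
    if retriever == "bm25" and base == "msmarco-v1-passage":
        suffix = ""
    return ["python", "-m", module, "--threads", "16", "--batch-size", "128",
            "--index", base + suffix, "--topics", topics, *extra,
            "--output", "run.txt", "--hits", "1000"]


print(shlex.join(retrieve_command("beir-v1.0.0-fiqa", "splade")))
```

The flag order differs slightly from the panels (`--impact` comes before `--output`), which should not matter for an argparse-style CLI. Returning a list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting.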
Q2D (ZS) Qwen2.5-72B-Instruct BGE-base-en-v1.5 0.6187 0.9900 0.4217 0.5121 0.4060 0.7383 0.7494 0.9667 0.7712 0.1382 0.4681 0.5148 0.3954 0.8508 0.7269 0.9092 0.6982 0.8945
method Q2D (ZS) · llm Qwen2.5-72B-Instruct · retriever BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
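The evaluate steps above differ only in the qrels name and the recall cutoff (R@100 for BEIR, R@1k for the MS MARCO DL sets). A minimal sketch of that mapping follows, assuming a hypothetical helper named `eval_args` that is not part of QueryGym or Pyserini. It only prints the commands (a dry run) and deliberately leaves out DL-HARD, whose qrels argument in the listing above is a local path.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QueryGym or Pyserini): map a --dataset value
# from the commands above to the trec_eval metric flags and qrels name.
eval_args() {
  case "$1" in
    beir-v1.0.0-*)                 echo "-m ndcg.cut.10 -m recall.100 $1-test" ;;
    msmarco-v1-passage.trecdl2019) echo "-m ndcg.cut.10 -m recall.1000 dl19-passage" ;;
    msmarco-v1-passage.trecdl2020) echo "-m ndcg.cut.10 -m recall.1000 dl20-passage" ;;
    *) echo "unknown dataset: $1" >&2; return 1 ;;
  esac
}

# Dry run: print the evaluate command for a few datasets instead of executing it.
for ds in beir-v1.0.0-arguana beir-v1.0.0-scifact msmarco-v1-passage.trecdl2019; do
  echo "python -m pyserini.eval.trec_eval -c $(eval_args "$ds") run.txt"
done
```

Dropping the `echo` wrapper would run the evaluation directly, provided `run.txt` exists from the retrieve step.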
Q2D (ZS) Qwen2.5-72B-Instruct BM25 0.3995 - 0.4034 0.5107 0.2540 - 0.7172 - 0.6973 0.1672 0.4675 0.5557 0.3506 0.8002 0.6557 0.8807 0.6207 0.8801
method: Q2D (ZS) · llm: Qwen2.5-72B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
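Across the retrieve steps listed so far, the `--index` value follows a regular pattern: BEIR datasets use the dataset name plus a retriever suffix (`.flat`, `.bge-base-en-v1.5`, `.splade-pp-ed`), while all three MS MARCO query sets share the `msmarco-v1-passage` index family. The sketch below encodes that pattern. The function name `index_name` and the short retriever labels `bm25`, `bge`, and `splade` are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the --index value used in the retrieve steps above
# from a retriever label (bm25 | bge | splade) and a --dataset value.
index_name() {
  retriever="$1"; dataset="$2"
  case "$dataset" in
    # DL-HARD, DL 2019 and DL 2020 all search the MS MARCO v1 passage index.
    msmarco-v1-passage.*) base="msmarco-v1-passage"; flat_suffix="" ;;
    # BEIR BM25 indexes carry a ".flat" suffix in the commands above.
    beir-v1.0.0-*)        base="$dataset";           flat_suffix=".flat" ;;
    *) echo "unknown dataset: $dataset" >&2; return 1 ;;
  esac
  case "$retriever" in
    bm25)   echo "${base}${flat_suffix}" ;;
    bge)    echo "${base}.bge-base-en-v1.5" ;;
    splade) echo "${base}.splade-pp-ed" ;;
    *) echo "unknown retriever: $retriever" >&2; return 1 ;;
  esac
}

# Print the index for one BEIR and one MS MARCO dataset under each retriever.
for r in bm25 bge splade; do
  for ds in beir-v1.0.0-fiqa msmarco-v1-passage.dlhard; do
    echo "$r $ds -> $(index_name "$r" "$ds")"
  done
done
```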
Q2D (ZS) Qwen2.5-72B-Instruct SPLADE++ 0.5194 0.9808 0.3707 0.5051 0.3213 0.6469 0.6965 0.9560 0.6272 0.1095 0.4068 0.4700 0.3200 0.8248 0.6682 0.9161 0.6144 0.9161
method: Q2D (ZS) · llm: Qwen2.5-72B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
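The three retrievers differ only in the Pyserini module and a handful of retriever-specific flags; `--threads`, `--batch-size`, `--topics`, `--output`, and `--hits` are the same in every retrieve step above. A minimal dry-run sketch of that split follows, assuming a hypothetical helper `search_flags` and hypothetical short labels `bge`, `bm25`, and `splade`. The flag values themselves are copied from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return the Pyserini module followed by the
# retriever-specific flags that appear in the retrieve steps above.
search_flags() {
  case "$1" in
    bge)    echo "pyserini.search.faiss --encoder BAAI/bge-base-en-v1.5" ;;
    bm25)   echo "pyserini.search.lucene --bm25 --k1 0.9 --b 0.4" ;;
    splade) echo "pyserini.search.lucene --encoder naver/splade-cocondenser-ensembledistil --impact" ;;
    *) echo "unknown retriever: $1" >&2; return 1 ;;
  esac
}

# Flags shared by every retrieve step in the listing.
COMMON="--threads 16 --batch-size 128 --topics outputs/reproduce/queries/reformulated_queries.tsv --output run.txt --hits 1000"

# Dry run: print (do not execute) the SPLADE++ command for the MS MARCO index.
echo "python -m $(search_flags splade) --index msmarco-v1-passage.splade-pp-ed $COMMON"
```

The printed command places the flags in a different order from the listing; that should not matter for an argparse-style CLI, but it is worth checking against the original before relying on it.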
Q2D (ZS) Qwen2.5-7B-Instruct BGE-base-en-v1.5 0.6183 0.9893 0.3932 0.4932 0.4011 0.7311 0.7520 0.9633 0.8220 0.1440 0.4537 0.5067 0.3675 0.8255 0.6907 0.8584 0.6617 0.8566
method: Q2D (ZS) · llm: Qwen2.5-7B-Instruct · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
Q2D (ZS) Qwen2.5-7B-Instruct BM25 0.4007 0.9353 0.3836 0.5047 0.2460 0.5597 0.7042 0.9443 0.7071 0.1628 0.4507 0.5561 0.3352 0.7763 0.6014 0.8467 0.5685 0.8647
method: Q2D (ZS) · llm: Qwen2.5-7B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
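For readers unfamiliar with the two reported metrics, here is a minimal per-query sketch of recall@k and nDCG@10. It is an illustration only, not a replacement for `trec_eval`: it assumes non-negative graded judgments, a linear gain with a log2 rank discount, and it skips the averaging over queries that the `-c` flag controls. The function names are hypothetical.

```python
import math


def recall_at_k(ranked_docids, qrels, k):
    """Fraction of the relevant documents (judgment > 0) found in the top k."""
    relevant = {doc for doc, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_docids[:k])) / len(relevant)


def ndcg_at_k(ranked_docids, qrels, k=10):
    """DCG of the top k divided by the DCG of an ideal ordering of the judged documents."""
    dcg = sum(
        qrels.get(doc, 0) / math.log2(rank + 2)   # rank is 0-based, so the discount is log2(rank + 2)
        for rank, doc in enumerate(ranked_docids[:k])
    )
    ideal_gains = sorted((rel for rel in qrels.values() if rel > 0), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Toy data: two relevant documents, one of them retrieved at rank 2.
    run = ["d1", "d2", "d3"]
    judgments = {"d2": 1, "d4": 1}
    print(recall_at_k(run, judgments, k=1000))
    print(ndcg_at_k(run, judgments, k=10))
```

With `--hits 1000` in the retrieve step, the run file holds enough results per query for both the R@100 and R@1k cutoffs used on this page.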
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
Q2D (ZS) Qwen2.5-7B-Instruct SPLADE++ 0.5196 0.9815 0.3531 0.4926 0.3117 0.6509 0.6803 0.9567 0.6673 0.1124 0.4027 0.4812 0.2904 0.8006 0.6091 0.8665 0.6096 0.9045
method: Q2D (ZS) · llm: Qwen2.5-7B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
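The `--method-params` strings for the BM25 and SPLADE++ runs of this method list the same keys in a different order. Assuming the pipeline reads the value as a JSON object (an assumption, not confirmed by this page), key order does not change the parsed parameters, which the following sketch checks. It also shows that the DL 2019 BM25 block, which passes only `mode`, is a genuinely different parameter set.

```python
import json

# Strings copied from the reformulate commands on this page.
bm25_arguana = '{"num_examples":4,"train_split":"train","mode":"zs"}'
splade_arguana = '{"mode":"zs","num_examples":4,"train_split":"train"}'
bm25_dl2019 = '{"mode":"zs"}'


def canonical(params_json):
    """Re-serialize with sorted keys so equivalent objects compare equal as strings."""
    return json.dumps(json.loads(params_json), sort_keys=True, separators=(",", ":"))


if __name__ == "__main__":
    print(canonical(bm25_arguana) == canonical(splade_arguana))   # same parameters
    print(canonical(bm25_arguana) == canonical(bm25_dl2019))      # different parameters
```

Canonicalizing the string this way is one option for deduplicating or caching reformulation outputs across retrievers.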
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2doc \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train","mode":"zs"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
query2e gpt-4.1 BGE-base-en-v1.5 0.6192 0.9900 0.3249 0.4268 0.3920 0.7411 0.7417 0.9633 0.7741 0.1404 0.4448 0.4848 0.3779 0.8306 0.6970 0.8701 0.6422 0.8184
method: query2e · llm: gpt-4.1 · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
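Each results row on this page packs the method, LLM, retriever and 18 scores into a single line. The sketch below unpacks such a row into per-dataset values. It assumes the scores come in (nDCG@10, recall) pairs in the order ArguAna, DBPedia, FiQA, SciFact, COVID, News, DL-HARD, DL 2019, DL 2020, which matches the order of the command blocks, and that the LLM and retriever names contain no spaces. `parse_row` is a hypothetical helper, not part of QueryGym.

```python
DATASETS = [
    "ArguAna", "DBPedia", "FiQA", "SciFact", "COVID", "News",   # recall column is R@100
    "DL-HARD", "DL 2019", "DL 2020",                            # recall column is R@1k
]


def parse_row(line):
    """Split a flattened leaderboard row into labels and per-dataset scores."""
    tokens = line.split()
    n_scores = 2 * len(DATASETS)
    if len(tokens) < n_scores + 3:
        raise ValueError("row has too few fields")
    scores = [float(tok) for tok in tokens[-n_scores:]]
    retriever = tokens[-n_scores - 1]
    llm = tokens[-n_scores - 2]
    method = " ".join(tokens[:-n_scores - 2])   # method names may contain spaces, e.g. "Q2D (ZS)"
    results = {}
    for i, name in enumerate(DATASETS):
        cutoff = "R@100" if i < 6 else "R@1k"
        results[name] = {"nDCG@10": scores[2 * i], cutoff: scores[2 * i + 1]}
    return {"method": method, "llm": llm, "retriever": retriever, "results": results}


if __name__ == "__main__":
    row = ("query2e gpt-4.1 BGE-base-en-v1.5 0.6192 0.9900 0.3249 0.4268 0.3920 0.7411 "
           "0.7417 0.9633 0.7741 0.1404 0.4448 0.4848 0.3779 0.8306 0.6970 0.8701 0.6422 0.8184")
    parsed = parse_row(row)
    print(parsed["method"], parsed["llm"], parsed["retriever"])
    print(parsed["results"]["DL 2020"])
```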
query2e gpt-4.1 BM25 0.4062 0.9381 0.3778 0.4772 0.2690 0.5930 0.7089 0.9403 0.7150 0.1772 0.4633 0.5807 0.3446 0.7639 0.5935 0.8698 0.5759 0.8594
method: query2e · llm: gpt-4.1 · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
query2e gpt-4.1 SPLADE++ 0.3818 0.9808 0.3936 0.5477 0.3282 0.6670 0.7187 0.9393 0.6869 0.1222 0.4206 0.4992 0.3518 0.8380 0.6812 0.9302 0.6522 0.9252
method: query2e · llm: gpt-4.1 · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model openai/gpt-4.1 \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
query2e gpt-4.1-nano BGE-base-en-v1.5 0.6198 0.9900 0.3558 0.4657 0.3816 0.7261 0.7477 0.9633 0.7803 0.1407 0.4504 0.5018 0.3609 0.8321 0.6802 0.8662 0.6706 0.8514
method: query2e · llm: gpt-4.1-nano · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
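The six BEIR blocks in the configuration above differ only in the dataset name, so they can be generated from one template. The following is a minimal dry-run sketch, assuming a POSIX shell: it prints the three commands for each dataset, with flags copied from the blocks above, and executes nothing. The names `BEIR_DATASETS` and `print_config_commands` are illustrative and are not part of QueryGym or Pyserini.

```shell
#!/bin/sh
# Dry run: print reformulate -> retrieve -> evaluate for each BEIR dataset
# of the query2e / gpt-4.1-nano / BGE-base-en-v1.5 configuration.
# BEIR_DATASETS and print_config_commands are hypothetical helper names.
BEIR_DATASETS="arguana dbpedia-entity fiqa scifact trec-covid trec-news"

print_config_commands() {
  # $1: BEIR dataset suffix, e.g. "fiqa"
  ds="beir-v1.0.0-$1"
  # Step 1: reformulate the queries with the LLM.
  printf '%s\n' "python examples/querygym_pyserini/pipeline.py --dataset ${ds} --method query2e --model openai/gpt-4.1-nano --steps reformulate --temperature 1 --max-tokens 128 --method-params '{\"num_examples\":4,\"train_split\":\"train\"}' --output-dir outputs/reproduce"
  # Step 2: dense retrieval over the prebuilt index for this dataset.
  printf '%s\n' "python -m pyserini.search.faiss --threads 16 --batch-size 128 --index ${ds}.bge-base-en-v1.5 --topics outputs/reproduce/queries/reformulated_queries.tsv --encoder BAAI/bge-base-en-v1.5 --output run.txt --hits 1000"
  # Step 3: score the run against the test qrels.
  printf '%s\n' "python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 ${ds}-test run.txt"
}

for name in $BEIR_DATASETS; do
  print_config_commands "$name"
done
```

The commands are printed in per-dataset order because every block above uses the same `outputs/reproduce` directory and the same `run.txt`, so the three steps for one dataset presumably need to finish before the next dataset starts. The MS MARCO blocks (DL-HARD, DL 2019, DL 2020) use different index and qrels names and are left out of this sketch.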
query2e gpt-4.1-nano BM25 0.4060 0.9417 0.3597 0.4696 0.2524 0.5779 0.7016 0.9480 0.7373 0.1765 0.4557 0.5827 0.3101 0.7665 0.5891 0.8474 0.5475 0.8392
method: query2e · llm: gpt-4.1-nano · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
query2e gpt-4.1-nano SPLADE++ 0.3819 0.9808 0.3716 0.5295 0.3113 0.6493 0.7206 0.9387 0.6747 0.1214 0.4086 0.4906 0.3297 0.8143 0.6320 0.9104 0.6605 0.9142
method: query2e · llm: gpt-4.1-nano · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model openai/gpt-4.1-nano \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
query2e Qwen2.5-72B-Instruct BGE-base-en-v1.5 0.6196 0.9900 0.3610 0.4706 0.3793 0.7222 0.7382 0.9567 0.7857 0.1412 0.4509 0.5067 0.3744 0.8503 0.7069 0.8760 0.6606 0.8528
method: query2e · llm: Qwen2.5-72B-Instruct · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
query2e Qwen2.5-72B-Instruct BM25 0.4066 n/a 0.3578 0.4641 0.2518 n/a 0.6969 n/a 0.6942 0.1611 0.4484 0.5647 0.3148 0.7605 0.5845 0.8501 0.5546 0.8609
method: query2e · llm: Qwen2.5-72B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
query2e Qwen2.5-72B-Instruct SPLADE++ 0.5188 0.9808 0.3755 0.5195 0.3036 0.6438 0.7049 0.9427 0.6196 0.1201 0.4076 0.4799 0.3442 0.8328 0.6686 0.9104 0.6353 0.9286
method: query2e · llm: Qwen2.5-72B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model Qwen/Qwen2.5-72B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
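The retrieve step differs across the three retrievers only in the Pyserini module, the index name, and a few flags. The sketch below collects those differences in one place and rebuilds the argument list for any dataset and retriever pair. It is an illustration, not part of QueryGym: the names `RETRIEVERS`, `index_name`, and `build_retrieve_command` are hypothetical, and every flag and index suffix is copied from the commands shown on this page.

```python
import shlex
from typing import Dict, List

# Topics file written by the reformulate step (path as shown on this page).
TOPICS = "outputs/reproduce/queries/reformulated_queries.tsv"

# Per-retriever settings, copied from the commands on this page.
# The dictionary layout itself is a hypothetical convenience.
RETRIEVERS: Dict[str, dict] = {
    "BM25": {
        "module": "pyserini.search.lucene",
        "beir_suffix": ".flat",
        "msmarco_suffix": "",
        "scoring": ["--bm25", "--k1", "0.9", "--b", "0.4"],
        "trailing": [],
    },
    "SPLADE++": {
        "module": "pyserini.search.lucene",
        "beir_suffix": ".splade-pp-ed",
        "msmarco_suffix": ".splade-pp-ed",
        "scoring": ["--encoder", "naver/splade-cocondenser-ensembledistil"],
        "trailing": ["--impact"],
    },
    "BGE-base-en-v1.5": {
        "module": "pyserini.search.faiss",
        "beir_suffix": ".bge-base-en-v1.5",
        "msmarco_suffix": ".bge-base-en-v1.5",
        "scoring": ["--encoder", "BAAI/bge-base-en-v1.5"],
        "trailing": [],
    },
}


def index_name(dataset: str, retriever: str) -> str:
    """Map a --dataset value to the index name used in the retrieve step."""
    cfg = RETRIEVERS[retriever]
    # DL-HARD, DL 2019 and DL 2020 all search the MS MARCO v1 passage index.
    if dataset.startswith("msmarco-v1-passage"):
        return "msmarco-v1-passage" + cfg["msmarco_suffix"]
    # BEIR datasets use one index per dataset and retriever.
    return dataset + cfg["beir_suffix"]


def build_retrieve_command(dataset: str, retriever: str,
                           output: str = "run.txt") -> List[str]:
    """Return the retrieve command as an argument list (nothing is executed)."""
    cfg = RETRIEVERS[retriever]
    return [
        "python", "-m", cfg["module"],
        "--threads", "16", "--batch-size", "128",
        "--index", index_name(dataset, retriever),
        "--topics", TOPICS,
        *cfg["scoring"],
        "--output", output,
        "--hits", "1000",
        *cfg["trailing"],
    ]


if __name__ == "__main__":
    # Print one copy-pastable command per retriever for a single dataset.
    for name in RETRIEVERS:
        print(shlex.join(build_retrieve_command("beir-v1.0.0-scifact", name)))
```

Returning a list rather than a string means the result can be passed directly to `subprocess.run` without shell quoting concerns, while `shlex.join` still gives a printable form.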
query2e Qwen2.5-7B-Instruct BGE-base-en-v1.5 0.6205 0.9900 0.3415 0.4534 0.3795 0.7132 0.7378 0.9633 0.7618 0.1379 0.4454 0.4967 0.3521 0.8171 0.6646 0.8422 0.6425 0.8443
method: query2e · llm: Qwen2.5-7B-Instruct · retriever: BGE-base-en-v1.5
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.bge-base-en-v1.5 \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder BAAI/bge-base-en-v1.5 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
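The evaluate step follows a simple pattern: BEIR datasets are scored against `<dataset>-test` with nDCG@10 and R@100, while the MS MARCO DL sets use `dl19-passage` or `dl20-passage` with nDCG@10 and R@1k. A minimal sketch of that mapping is given below. The function name `build_eval_command` and the `MSMARCO_QRELS` table are hypothetical. Because the DL-HARD commands on this page point at a machine-specific file, the sketch requires the caller to supply a qrels argument for that dataset instead of guessing one.

```python
from typing import Dict, List, Optional

# Qrels names for the MS MARCO DL sets, as used in the commands on this page.
MSMARCO_QRELS: Dict[str, str] = {
    "msmarco-v1-passage.trecdl2019": "dl19-passage",
    "msmarco-v1-passage.trecdl2020": "dl20-passage",
}


def build_eval_command(dataset: str, run: str = "run.txt",
                       qrels: Optional[str] = None) -> List[str]:
    """Return the trec_eval command as an argument list (nothing is executed)."""
    if dataset.startswith("beir-"):
        # BEIR: recall is reported at depth 100, qrels are "<dataset>-test".
        recall = "recall.100"
        qrels = qrels or f"{dataset}-test"
    else:
        # MS MARCO DL sets: recall is reported at depth 1000.
        recall = "recall.1000"
        qrels = qrels or MSMARCO_QRELS.get(dataset)
    if qrels is None:
        # DL-HARD (or any unknown dataset): the caller must name the qrels.
        raise ValueError(f"no known qrels for {dataset!r}; pass qrels= explicitly")
    return [
        "python", "-m", "pyserini.eval.trec_eval",
        "-c", "-m", "ndcg.cut.10", "-m", recall,
        qrels, run,
    ]
```

For example, `build_eval_command("beir-v1.0.0-trec-news")` reproduces the argument list of the TREC-NEWS evaluate command shown above.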
query2e Qwen2.5-7B-Instruct BM25 0.4052 0.9403 0.3477 0.4691 0.2453 0.5494 0.6967 0.9520 0.6945 0.1653 0.4503 0.5824 0.3101 0.7432 0.5721 0.8431 0.5404 0.8548
method: query2e · llm: Qwen2.5-7B-Instruct · retriever: BM25
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.flat \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"mode":"zs","num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --bm25 --k1 0.9 --b 0.4 \
  --output run.txt \
  --hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
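Each leaderboard row lists its scores in column order, so a row can be paired back with the dataset and metric headers. The sketch below assumes a fully populated row of 18 values (nDCG@10 plus recall for nine datasets) and takes the dataset names from the table header. The names `DATASETS` and `parse_row` are hypothetical, and the BRIGHT columns are left out on the assumption that they carry no values in these rows.

```python
from typing import Dict, List, Tuple

# (dataset, recall metric) in header order; nDCG@10 precedes recall in each pair.
DATASETS: List[Tuple[str, str]] = [
    ("ArguAna", "R@100"), ("DBPedia", "R@100"), ("FiQA", "R@100"),
    ("SciFact", "R@100"), ("COVID", "R@100"), ("News", "R@100"),
    ("DL-HARD", "R@1k"), ("DL 2019", "R@1k"), ("DL 2020", "R@1k"),
]


def parse_row(line: str) -> dict:
    """Split a flattened row into method, LLM, retriever, and per-dataset scores."""
    tokens = line.split()
    method, llm, retriever = tokens[:3]
    values = [float(t) for t in tokens[3:]]
    expected = 2 * len(DATASETS)
    if len(values) != expected:
        # Rows with missing cells cannot be aligned from the values alone.
        raise ValueError(f"expected {expected} scores, got {len(values)}")
    scores: Dict[str, Dict[str, float]] = {}
    for i, (name, recall_label) in enumerate(DATASETS):
        scores[name] = {
            "nDCG@10": values[2 * i],
            recall_label: values[2 * i + 1],
        }
    return {"method": method, "llm": llm, "retriever": retriever, "scores": scores}
```

Rows with fewer than 18 values are rejected rather than aligned by guesswork, because the position of a missing cell can only be recovered from the metrics listed in that row's evaluate steps.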
query2e Qwen2.5-7B-Instruct SPLADE++ 0.5193 0.9815 0.3386 0.5134 0.2912 0.6256 0.7115 0.9493 0.6080 0.1093 0.4073 0.4888 0.3056 0.7882 0.5474 0.8734 0.5312 0.9001
method: query2e · llm: Qwen2.5-7B-Instruct · retriever: SPLADE++
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-arguana \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-arguana.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-dbpedia-entity \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-fiqa \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-fiqa.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-scifact \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-scifact.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-covid \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-covid.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset beir-v1.0.0-trec-news \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index beir-v1.0.0-trec-news.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
  beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.dlhard \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2020 \
    --method query2e \
    --model Qwen/Qwen2.5-7B-Instruct \
    --steps reformulate \
    --temperature 1 \
    --max-tokens 128 \
    --method-params '{"num_examples":4,"train_split":"train"}' \
    --output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 \
  --index msmarco-v1-passage.splade-pp-ed \
  --topics outputs/reproduce/queries/reformulated_queries.tsv \
  --encoder naver/splade-cocondenser-ensembledistil \
  --output run.txt \
  --hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
  dl20-passage run.txt
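The per-dataset blocks for this row differ only in the dataset name, the index, the qrels key, and the recall cutoff (100 for BEIR, 1000 for DL 2019 and DL 2020). The dry-run sketch below prints the three commands for each dataset rather than executing them. Two things in it are assumptions and not taken from the page: the per-dataset directory `outputs/reproduce/<dataset>` (the commands above all share `outputs/reproduce` and `run.txt`, which would overwrite each other in a loop), and the helper name `print_commands`. DL-HARD is left out because its evaluation command above points at a machine-local path.

```bash
#!/usr/bin/env bash
# Dry run: print (do not execute) the reformulate / retrieve / evaluate
# commands for the query2e + Qwen2.5-7B-Instruct + SPLADE++ row.
set -euo pipefail

# print_commands DATASET INDEX QRELS RECALL_CUTOFF  (hypothetical helper)
print_commands() {
  local dataset="$1" index="$2" qrels="$3" cutoff="$4"
  # Assumption: one output directory per dataset, so runs do not overwrite.
  local out="outputs/reproduce/${dataset}"
  cat <<EOF
python examples/querygym_pyserini/pipeline.py --dataset ${dataset} --method query2e --model Qwen/Qwen2.5-7B-Instruct --steps reformulate --temperature 1 --max-tokens 128 --method-params '{"num_examples":4,"train_split":"train"}' --output-dir ${out}
python -m pyserini.search.lucene --threads 16 --batch-size 128 --index ${index} --topics ${out}/queries/reformulated_queries.tsv --encoder naver/splade-cocondenser-ensembledistil --output ${out}/run.txt --hits 1000 --impact
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.${cutoff} ${qrels} ${out}/run.txt
EOF
}

# BEIR datasets: per-dataset index and test qrels, recall cutoff 100.
for name in arguana dbpedia-entity fiqa scifact trec-covid trec-news; do
  ds="beir-v1.0.0-${name}"
  print_commands "${ds}" "${ds}.splade-pp-ed" "${ds}-test" 100
done

# MS MARCO DL 2019 / DL 2020: shared passage index, recall cutoff 1000.
print_commands msmarco-v1-passage.trecdl2019 msmarco-v1-passage.splade-pp-ed dl19-passage 1000
print_commands msmarco-v1-passage.trecdl2020 msmarco-v1-passage.splade-pp-ed dl20-passage 1000
```

Piping the output to `bash` would run the commands in order; reviewing the printed lines first keeps the LLM calls in step 1 from being issued by accident.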