12 model × retriever configurations for this method across BEIR, MS MARCO DL, and DL-HARD.
Each configuration row is followed by its reproduction commands: one block of three steps (reformulate → retrieve → evaluate) per dataset.
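The three steps follow the same pattern for every configuration, so they can be generated rather than copied. The sketch below is a minimal illustration, not part of the original tooling: it assembles the reformulate, retrieve, and evaluate commands for a BEIR dataset from the flags shown in the listings on this page. The function name `build_commands` and the short retriever keys (`bge`, `bm25`, `splade`) are hypothetical, and the naming scheme (`<dataset><suffix>` for the index, `<dataset>-test` for the qrels) is assumed to hold only for the BEIR entries; the MS MARCO entries use different index and qrels names.

```python
import json
import shlex

# Retriever settings copied from the commands listed on this page.
# The short keys ("bge", "bm25", "splade") are hypothetical names for this sketch.
RETRIEVERS = {
    "bge": ("pyserini.search.faiss", ".bge-base-en-v1.5",
            ["--encoder", "BAAI/bge-base-en-v1.5"]),
    "bm25": ("pyserini.search.lucene", ".flat",
             ["--bm25", "--k1", "0.9", "--b", "0.4"]),
    "splade": ("pyserini.search.lucene", ".splade-pp-ed",
               ["--encoder", "naver/splade-cocondenser-ensembledistil", "--impact"]),
}

# Method parameters as they appear in the reformulate step.
METHOD_PARAMS = {"mode": "cot", "num_examples": 4, "train_split": "train"}


def build_commands(dataset, model, retriever, out_dir="outputs/reproduce"):
    """Return the reformulate, retrieve, and evaluate commands as argument lists.

    Assumes a BEIR dataset name such as "beir-v1.0.0-arguana".
    """
    module, index_suffix, extra = RETRIEVERS[retriever]
    topics = f"{out_dir}/queries/reformulated_queries.tsv"

    reformulate = [
        "python", "examples/querygym_pyserini/pipeline.py",
        "--dataset", dataset,
        "--method", "query2doc",
        "--model", model,
        "--steps", "reformulate",
        "--temperature", "1",
        "--max-tokens", "128",
        # Compact separators reproduce the JSON string used in the listings.
        "--method-params", json.dumps(METHOD_PARAMS, separators=(",", ":")),
        "--output-dir", out_dir,
    ]
    retrieve = [
        "python", "-m", module,
        "--threads", "16", "--batch-size", "128",
        "--index", dataset + index_suffix,
        "--topics", topics,
        *extra,
        "--output", "run.txt",
        "--hits", "1000",
    ]
    evaluate = [
        "python", "-m", "pyserini.eval.trec_eval",
        "-c", "-m", "ndcg.cut.10", "-m", "recall.100",
        f"{dataset}-test", "run.txt",
    ]
    return [reformulate, retrieve, evaluate]


if __name__ == "__main__":
    commands = build_commands(
        "beir-v1.0.0-arguana", "Qwen/Qwen2.5-72B-Instruct", "bm25"
    )
    for cmd in commands:
        # shlex.join quotes the JSON argument so the line can be pasted into a shell.
        print(shlex.join(cmd))
```

Building the commands as argument lists keeps the JSON value of `--method-params` intact; the flag order may differ slightly from the listings.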
| Model | Retriever | ArguAna nDCG@10 | ArguAna R@100 | DBPedia nDCG@10 | DBPedia R@100 | FiQA nDCG@10 | FiQA R@100 | SciFact nDCG@10 | SciFact R@100 | COVID nDCG@10 | COVID R@100 | News nDCG@10 | News R@100 | BRIGHT — AOPS | | BRIGHT — Biology | | BRIGHT — Earth Science | | BRIGHT — Economics | | BRIGHT — LeetCode | | BRIGHT — Pony | | BRIGHT — Psychology | | BRIGHT — Robotics | | BRIGHT — Stack Overflow | | BRIGHT — Sustainable Living | | BRIGHT — TheoremQA Questions | | BRIGHT — TheoremQA Theorems | | DL-HARD nDCG@10 | DL-HARD R@1k | DL 2019 nDCG@10 | DL 2019 R@1k | DL 2020 nDCG@10 | DL 2020 R@1k | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen2.5-72B-Instruct | BGE-base-en-v1.5 | 0.6188 | 0.9900 | 0.3528 | 0.4617 | 0.3941 | 0.7358 | 0.7387 | 0.9600 | 0.7710 | 0.1367 | 0.4070 | 0.4508 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3498 | 0.8236 | 0.7121 | 0.8712 | 0.6411 | 0.8485 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| Qwen2.5-72B-Instruct | BM25 | 0.4060 | — | 0.3787 | 0.4778 | 0.2453 | — | 0.7077 | — | 0.6785 | 0.1590 | 0.4172 | 0.5578 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3075 | 0.7526 | 0.6378 | 0.8508 | 0.5651 | 0.8549 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| Qwen2.5-72B-Instruct | SPLADE++ | 0.5199 | 0.9808 | 0.3897 | 0.5470 | 0.3157 | 0.6411 | 0.6834 | 0.9533 | 0.6425 | 0.1159 | 0.4054 | 0.4627 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3016 | 0.8393 | 0.6941 | 0.9148 | 0.6099 | 0.8857 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| Qwen2.5-7B-Instruct | BGE-base-en-v1.5 | 0.6195 | 0.9893 | 0.3498 | 0.4463 | 0.3896 | 0.7244 | 0.7336 | 0.9667 | 0.7769 | 0.1386 | 0.4295 | 0.4584 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3391 | 0.8300 | 0.6561 | 0.8397 | 0.6302 | 0.8573 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| Qwen2.5-7B-Instruct | BM25 | 0.4011 | 0.9360 | 0.3669 | 0.4809 | 0.2405 | 0.5544 | 0.7096 | 0.9427 | 0.6997 | 0.1620 | 0.4349 | 0.5616 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3044 | 0.7815 | 0.6074 | 0.8585 | 0.5802 | 0.8684 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| Qwen2.5-7B-Instruct | SPLADE++ | 0.5200 | 0.9808 | 0.3697 | 0.5223 | 0.3206 | 0.6505 | 0.6825 | 0.9467 | 0.6567 | 0.1147 | 0.3831 | 0.4524 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.2731 | 0.8239 | 0.6513 | 0.9037 | 0.5948 | 0.9019 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1 | BGE-base-en-v1.5 | 0.6186 | 0.9886 | 0.3678 | 0.4556 | 0.4009 | 0.7483 | 0.7580 | 0.9633 | 0.7984 | 0.1380 | 0.4331 | 0.4763 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3755 | 0.8505 | 0.7125 | 0.8877 | 0.6720 | 0.8756 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1 | BM25 | 0.4028 | 0.9374 | 0.3934 | 0.4775 | 0.2578 | 0.5843 | 0.7135 | 0.9510 | 0.7277 | 0.1696 | 0.4656 | 0.5829 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3291 | 0.7737 | 0.6528 | 0.8777 | 0.6239 | 0.8781 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1 | SPLADE++ | 0.3820 | 0.9801 | 0.3926 | 0.5319 | 0.3154 | 0.6513 | 0.7120 | 0.9460 | 0.6858 | 0.1056 | 0.4160 | 0.4741 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3308 | 0.8456 | 0.6877 | 0.9153 | 0.6534 | 0.9089 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1-nano | BGE-base-en-v1.5 | 0.6194 | 0.9893 | 0.3843 | 0.4891 | 0.3967 | 0.7409 | 0.7499 | 0.9633 | 0.7995 | 0.1420 | 0.4312 | 0.4754 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3722 | 0.8367 | 0.6710 | 0.8530 | 0.6744 | 0.8709 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1-nano | BM25 | 0.4011 | 0.9360 | 0.3921 | 0.5132 | 0.2557 | 0.5758 | 0.7273 | 0.9560 | 0.7503 | 0.1744 | 0.4601 | 0.5728 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3320 | 0.7655 | 0.6254 | 0.8621 | 0.6092 | 0.8846 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1-nano | SPLADE++ | 0.3820 | 0.9801 | 0.3962 | 0.5324 | 0.3131 | 0.6532 | 0.7065 | 0.9433 | 0.6809 | 0.1163 | 0.4053 | 0.4554 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3426 | 0.8390 | 0.6544 | 0.8954 | 0.6271 | 0.9167 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"cot","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||