12 model × retriever configurations for this method across BEIR, MS MARCO DL, and DL-HARD.
Each row is followed by its three reproduction steps (reformulate → retrieve → evaluate), listed once per dataset.
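The three steps can also be assembled programmatically. The following is a minimal sketch, not part of querygym or Pyserini: `RETRIEVERS` and `build_commands` are hypothetical names introduced here, the flag values are copied from the commands on this page, and the index and qrels naming (`<dataset><suffix>` and `<dataset>-test`) is assumed to hold only for the BEIR datasets shown. It builds the command lines without running them, and it always requests both nDCG@10 and R@100.

```python
import shlex

# Retriever-specific pieces, copied from the commands listed on this page.
# The dictionary keys ("bge", "bm25", "splade") are hypothetical shorthand.
RETRIEVERS = {
    "bge": {
        "module": "pyserini.search.faiss",
        "index_suffix": ".bge-base-en-v1.5",
        "flags": ["--encoder", "BAAI/bge-base-en-v1.5"],
        "tail": [],
    },
    "bm25": {
        "module": "pyserini.search.lucene",
        "index_suffix": ".flat",
        "flags": ["--bm25", "--k1", "0.9", "--b", "0.4"],
        "tail": [],
    },
    "splade": {
        "module": "pyserini.search.lucene",
        "index_suffix": ".splade-pp-ed",
        "flags": ["--encoder", "naver/splade-cocondenser-ensembledistil"],
        "tail": ["--impact"],
    },
}


def build_commands(dataset, model, retriever, out_dir="outputs/reproduce"):
    """Return [reformulate, retrieve, evaluate] argv lists for one BEIR dataset."""
    r = RETRIEVERS[retriever]
    topics = f"{out_dir}/queries/reformulated_queries.tsv"
    reformulate = [
        "python", "examples/querygym_pyserini/pipeline.py",
        "--dataset", dataset,
        "--method", "qa_expand",
        "--model", model,
        "--steps", "reformulate",
        "--temperature", "1",
        "--max-tokens", "128",
        "--method-params", '{"num_examples":4,"train_split":"train"}',
        "--output-dir", out_dir,
    ]
    retrieve = [
        "python", "-m", r["module"],
        "--threads", "16", "--batch-size", "128",
        "--index", dataset + r["index_suffix"],  # assumed BEIR naming pattern
        "--topics", topics,
        *r["flags"],
        "--output", "run.txt",
        "--hits", "1000",
        *r["tail"],
    ]
    evaluate = [
        "python", "-m", "pyserini.eval.trec_eval", "-c",
        "-m", "ndcg.cut.10", "-m", "recall.100",
        f"{dataset}-test", "run.txt",  # assumed BEIR qrels naming pattern
    ]
    return [reformulate, retrieve, evaluate]


if __name__ == "__main__":
    # Print the three commands for one configuration; nothing is executed.
    for argv in build_commands("beir-v1.0.0-scifact", "Qwen/Qwen2.5-7B-Instruct", "bm25"):
        print(shlex.join(argv))
```

Printing the commands instead of executing them keeps the sketch free of side effects. The MS MARCO and DL-HARD runs use different index and qrels names, so they are outside what this sketch covers.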
| Model | Retriever | ArguAna nDCG@10 | ArguAna R@100 | DBPedia nDCG@10 | DBPedia R@100 | FiQA nDCG@10 | FiQA R@100 | SciFact nDCG@10 | SciFact R@100 | COVID nDCG@10 | COVID R@100 | News nDCG@10 | News R@100 | BRIGHT — AOPS | | BRIGHT — Biology | | BRIGHT — Earth Science | | BRIGHT — Economics | | BRIGHT — LeetCode | | BRIGHT — Pony | | BRIGHT — Psychology | | BRIGHT — Robotics | | BRIGHT — Stack Overflow | | BRIGHT — Sustainable Living | | BRIGHT — TheoremQA Questions | | BRIGHT — TheoremQA Theorems | | DL-HARD nDCG@10 | DL-HARD R@1k | DL 2019 nDCG@10 | DL 2019 R@1k | DL 2020 nDCG@10 | DL 2020 R@1k | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen2.5-72B-Instruct | BGE-base-en-v1.5 | 0.6213 | 0.9900 | 0.4013 | 0.4955 | 0.3891 | 0.7274 | 0.7431 | 0.9667 | 0.7775 | 0.1370 | 0.4842 | 0.4983 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3485 | 0.8498 | 0.6999 | 0.8733 | 0.6916 | 0.8785 | |

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt

| Qwen2.5-72B-Instruct | BM25 | 0.3995 | — | 0.3744 | 0.4709 | 0.2484 | — | 0.7015 | — | 0.6809 | 0.1600 | 0.4474 | 0.5517 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3215 | 0.7876 | 0.6109 | 0.8396 | 0.6152 | 0.8727 | |

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
beir-v1.0.0-arguana-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
beir-v1.0.0-fiqa-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \
beir-v1.0.0-scifact-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt

| Qwen2.5-72B-Instruct | SPLADE++ | 0.5174 | 0.9794 | 0.3830 | 0.5213 | 0.3333 | 0.6464 | 0.6796 | 0.9393 | 0.6324 | 0.1079 | 0.4168 | 0.4803 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3347 | 0.8285 | 0.6757 | 0.9005 | 0.6983 | 0.9284 | |

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact

Step 3 · evaluate · trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact

Step 3 · evaluate · trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method qa_expand \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact

Step 3 · evaluate · trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt

| Qwen2.5-7B-Instruct | BGE-base-en-v1.5 | 0.6208 | 0.9900 | 0.3731 | 0.4872 | 0.3837 | 0.7309 | 0.7434 | 0.9583 | 0.7668 | 0.1378 | 0.4406 | 0.4862 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3418 | 0.8267 | 0.6740 | 0.8469 | 0.6541 | 0.8606 | |

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt

| Qwen2.5-7B-Instruct | BM25 | 0.3940 | 0.9324 | 0.3338 | 0.4669 | 0.2234 | 0.5488 | 0.6857 | 0.9347 | 0.6729 | 0.1569 | 0.4340 | 0.5419 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.2892 | 0.7746 | 0.5553 | 0.7976 | 0.5654 | 0.8454 | |

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce

Step 2 · retrieve · pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000

Step 3 · evaluate · trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt

Step 1 · reformulate · querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| Qwen2.5-7B-Instruct | SPLADE++ | 0.5170 | 0.9829 | 0.3613 | 0.5111 | 0.2978 | 0.6387 | 0.6616 | 0.9547 | 0.6431 | 0.1103 | 0.3910 | 0.4548 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3143 | 0.8305 | 0.6574 | 0.8890 | 0.6156 | 0.8945 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method qa_expand \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1 | BGE-base-en-v1.5 | 0.6231 | 0.9900 | 0.4005 | 0.5087 | 0.4162 | 0.7452 | 0.7367 | 0.9600 | 0.7954 | 0.1419 | 0.4697 | 0.4852 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3739 | 0.8543 | 0.7370 | 0.8936 | 0.7074 | 0.8754 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1 | BM25 | 0.3970 | 0.9324 | 0.3699 | 0.4890 | 0.2643 | 0.5814 | 0.7063 | 0.9403 | 0.7065 | 0.1620 | 0.4502 | 0.5608 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3018 | 0.7570 | 0.6832 | 0.8495 | 0.6418 | 0.8787 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1 | SPLADE++ | 0.3823 | 0.9801 | 0.3873 | 0.5289 | 0.3399 | 0.6821 | 0.6964 | 0.9493 | 0.6941 | 0.1152 | 0.4266 | 0.4566 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3552 | 0.8034 | 0.7335 | 0.9170 | 0.6739 | 0.9260 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method qa_expand \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1-nano | BGE-base-en-v1.5 | 0.6213 | 0.9893 | 0.3718 | 0.4717 | 0.3940 | 0.7272 | 0.7486 | 0.9593 | 0.7489 | 0.1355 | 0.4271 | 0.4749 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3688 | 0.8113 | 0.6523 | 0.8486 | 0.6612 | 0.8397 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BGE-base-en-v1.5 (dense)
python -m pyserini.search.faiss \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.bge-base-en-v1.5 \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder BAAI/bge-base-en-v1.5 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1-nano | BM25 | 0.4021 | 0.9367 | 0.3680 | 0.4808 | 0.2509 | 0.5744 | 0.7059 | 0.9430 | 0.6885 | 0.1583 | 0.4326 | 0.5487 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3469 | 0.7480 | 0.5819 | 0.8385 | 0.6026 | 0.8649 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.flat \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · BM25 (lexical)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1-nano | SPLADE++ | 0.3811 | 0.9787 | 0.4019 | 0.5396 | 0.3360 | 0.6669 | 0.6939 | 0.9420 | 0.7079 | 0.1215 | 0.4227 | 0.4696 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3702 | 0.8506 | 0.6883 | 0.9010 | 0.6628 | 0.9279 | |
| 1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-arguana.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-arguana-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-dbpedia-entity.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-dbpedia-entity-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-fiqa.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-fiqa-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-scifact.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-scifact-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-covid.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-covid-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index beir-v1.0.0-trec-news.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@100
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \
beir-v1.0.0-trec-news-test run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl19-passage run.txt
1 reformulate querygym → reformulated_queries.tsv
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method qa_expand \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2 retrieve pyserini · SPLADE++ (learned_sparse)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage.splade-pp-ed \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--encoder naver/splade-cocondenser-ensembledistil \
--output run.txt \
--hits 1000 --impact
3 evaluate trec_eval · nDCG@10 + R@1k
python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \
dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||