12 model × retriever configurations for this method across BEIR, MS MARCO DL, and DL-HARD.
Click any row or the + button to expand. Tabs switch dataset
context. The three steps (reformulate → retrieve → evaluate) update accordingly.
Retriever
Model
Datasets
BEIR ·
MS MARCO DL ·
Metric
| Model | Retriever | ArguAna | DBPedia | FiQA | SciFact | COVID | News | BRIGHT — AOPS | BRIGHT — Biology | BRIGHT — Earth Science | BRIGHT — Economics | BRIGHT — LeetCode | BRIGHT — Pony | BRIGHT — Psychology | BRIGHT — Robotics | BRIGHT — Stack Overflow | BRIGHT — Sustainable Living | BRIGHT — TheoremQA Questions | BRIGHT — TheoremQA Theorems | DL-HARD | DL 2019 | DL 2020 | ||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| nDCG@10 | R@100 | nDCG@10 | R@100 | nDCG@10 | R@100 | nDCG@10 | R@100 | nDCG@10 | R@100 | nDCG@10 | R@100 | nDCG@10 | R@1k | nDCG@10 | R@1k | nDCG@10 | R@1k | |||||||||||||||||||||||||||
| Qwen2.5-72B-Instruct | BGE-base-en-v1.5 | 0.6190 | 0.9900 | 0.4113 | 0.5101 | 0.4098 | 0.7431 | 0.7540 | 0.9633 | 0.7891 | 0.1401 | 0.4857 | 0.5135 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3845 | 0.8568 | 0.7419 | 0.9027 | 0.6792 | 0.8913 | |
| 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-arguana.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-arguana-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-dbpedia-entity-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-fiqa-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-scifact.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-scifact-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-covid-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-news-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl19-passage run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| Qwen2.5-72B-Instruct | BM25 | 0.3991 | — | 0.3904 | 0.4993 | 0.2509 | — | 0.7163 | — | 0.7078 | 0.1673 | 0.4807 | 0.6048 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3467 | 0.8020 | 0.6875 | 0.8959 | 0.6264 | 0.8907 | |
| 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-arguana.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \ beir-v1.0.0-arguana-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-dbpedia-entity.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-dbpedia-entity-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-fiqa.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \ beir-v1.0.0-fiqa-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-scifact.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 \ beir-v1.0.0-scifact-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-covid.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-covid-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-news.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-news-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl19-passage run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| Qwen2.5-72B-Instruct | SPLADE++ | 0.5200 | 0.9801 | 0.3662 | 0.5023 | 0.3261 | 0.6552 | 0.7035 | 0.9567 | 0.6689 | 0.1142 | 0.4238 | 0.4846 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3333 | 0.8206 | 0.7151 | 0.9124 | 0.6499 | 0.9234 | |
| 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-arguana.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-arguana-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-dbpedia-entity-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-fiqa.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-fiqa-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-scifact.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-scifact-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-covid.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-covid-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-news.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-news-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl19-passage run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model Qwen/Qwen2.5-72B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| Qwen2.5-7B-Instruct | BGE-base-en-v1.5 | 0.6207 | 0.9886 | 0.3922 | 0.4865 | 0.3866 | 0.7308 | 0.7454 | 0.9567 | 0.7922 | 0.1388 | 0.4627 | 0.5133 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3628 | 0.8348 | 0.6776 | 0.8535 | 0.6402 | 0.8578 | |
| 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-arguana.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-arguana-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-dbpedia-entity-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-fiqa-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-scifact.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-scifact-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-covid-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-news-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl19-passage run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| Qwen2.5-7B-Instruct | BM25 | 0.3984 | 0.9353 | 0.3859 | 0.4831 | 0.2430 | 0.5533 | 0.7149 | 0.9443 | 0.7423 | 0.1668 | 0.4778 | 0.5842 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3141 | 0.7724 | 0.5884 | 0.8605 | 0.5428 | 0.8691 | |
| 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-arguana.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-arguana-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-dbpedia-entity.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-dbpedia-entity-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-fiqa.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-fiqa-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-scifact.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-scifact-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-covid.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-covid-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-news.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-news-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl19-passage run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| Qwen2.5-7B-Instruct | SPLADE++ | 0.5199 | 0.9808 | 0.3575 | 0.4927 | 0.3079 | 0.6483 | 0.7120 | 0.9500 | 0.6793 | 0.1095 | 0.4146 | 0.4917 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.2672 | 0.8116 | 0.6095 | 0.8612 | 0.5492 | 0.9062 | |
| 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-arguana.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-arguana-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-dbpedia-entity-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-fiqa.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-fiqa-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-scifact.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-scifact-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-covid.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-covid-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-news.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-news-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl19-passage run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1 | BGE-base-en-v1.5 | 0.6179 | 0.9893 | 0.4302 | 0.5303 | 0.4205 | 0.7542 | 0.7519 | 0.9667 | 0.8039 | 0.1411 | 0.4715 | 0.5157 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.4074 | 0.8726 | 0.7272 | 0.8890 | 0.7141 | 0.8948 | |
| 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-arguana.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-arguana-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-dbpedia-entity-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-fiqa-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-scifact.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-scifact-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-covid-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-news-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl19-passage run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1 | BM25 | 0.4012 | 0.9410 | 0.4010 | 0.5083 | 0.2684 | 0.5993 | 0.7123 | 0.9493 | 0.7081 | 0.1639 | 0.4801 | 0.5842 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3562 | 0.8042 | 0.6904 | 0.8861 | 0.6746 | 0.8984 | |
| 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-arguana.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-arguana-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-dbpedia-entity.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-dbpedia-entity-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-fiqa.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-fiqa-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-scifact.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-scifact-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-covid.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-covid-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-news.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-news-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl19-passage run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1 | SPLADE++ | 0.3826 | 0.9808 | 0.3910 | 0.5192 | 0.3446 | 0.6890 | 0.7093 | 0.9567 | 0.6591 | 0.1099 | 0.4302 | 0.5009 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3771 | 0.8396 | 0.6932 | 0.9068 | 0.6749 | 0.9389 | |
| 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-arguana.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-arguana-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-dbpedia-entity-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-fiqa.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-fiqa-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-scifact.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-scifact-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-covid.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-covid-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-news.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-news-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl19-passage run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model openai/gpt-4.1 \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1-nano | BGE-base-en-v1.5 | 0.6188 | 0.9900 | 0.4026 | 0.5104 | 0.4039 | 0.7311 | 0.7417 | 0.9567 | 0.7793 | 0.1402 | 0.4539 | 0.4763 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3480 | 0.8374 | 0.7157 | 0.8601 | 0.6988 | 0.8742 | |
| 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-arguana.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-arguana-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-dbpedia-entity.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-dbpedia-entity-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-fiqa.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-fiqa-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-scifact.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-scifact-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-covid.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-covid-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-news.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-news-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl19-passage run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BGE-base-en-v1.5 (dense) python -m pyserini.search.faiss \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.bge-base-en-v1.5 \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder BAAI/bge-base-en-v1.5 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1-nano | BM25 | 0.3965 | 0.9324 | 0.3720 | 0.4873 | 0.2531 | 0.5833 | 0.7053 | 0.9410 | 0.6827 | 0.1634 | 0.4442 | 0.5398 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3358 | 0.7627 | 0.6643 | 0.8527 | 0.6227 | 0.8848 | |
| 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-arguana.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-arguana-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-dbpedia-entity.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-dbpedia-entity-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-fiqa.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-fiqa-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-scifact.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-scifact-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-covid.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-covid-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-news.flat \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-news-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl19-passage run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · BM25 (lexical) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --bm25 --k1 0.9 --b 0.4 \ --output run.txt \ --hits 1000 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||
| gpt-4.1-nano | SPLADE++ | 0.3823 | 0.9801 | 0.3790 | 0.5204 | 0.3390 | 0.6636 | 0.7121 | 0.9400 | 0.6715 | 0.1182 | 0.4292 | 0.4573 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | 0.3533 | 0.8005 | 0.6318 | 0.8839 | 0.6471 | 0.9232 | |
| 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-arguana \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-arguana.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-arguana-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-dbpedia-entity \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-dbpedia-entity.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-dbpedia-entity-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-fiqa \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-fiqa.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-fiqa-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-scifact \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-scifact.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-scifact-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-covid \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-covid.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-covid-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset beir-v1.0.0-trec-news \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index beir-v1.0.0-trec-news.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@100 python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.100 \ beir-v1.0.0-trec-news-test run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ /mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2019 \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl19-passage run.txt 1 reformulate querygym → reformulated_queries.tsv python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.trecdl2020 \
--method query2doc \
--model openai/gpt-4.1-nano \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce 2 retrieve pyserini · SPLADE++ (learned_sparse) python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --index msmarco-v1-passage.splade-pp-ed \ --topics outputs/reproduce/queries/reformulated_queries.tsv \ --encoder naver/splade-cocondenser-ensembledistil \ --output run.txt \ --hits 1000 --impact 3 evaluate trec_eval · nDCG@10 + R@1k python -m pyserini.eval.trec_eval -c -m ndcg.cut.10 -m recall.1000 \ dl20-passage run.txt | ||||||||||||||||||||||||||||||||||||||||||||