Run detail
4549e09aba8d1d38
Dataset
msmarco-v1-passage.dlhard
Method
Q2D (FS)
Model
Qwen2.5-7B-Instruct
Retriever
BM25 (lexical)
params_hash
534732e6
Queries
50
Metrics
| Metric | Value |
| --- | --- |
| ndcg_cut_10 | 0.3141 |
| recall_1000 | 0.7724 |
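The metric names above match the labels trec_eval prints in its summary rows (measure, the literal `all`, value). As a minimal sketch of how such rows could be turned back into this table, the snippet below parses that layout. The helper name `parse_trec_eval` is hypothetical, and the `sample` string is assembled from the values reported on this page. It is not captured tool output.

```python
def parse_trec_eval(text):
    """Collect aggregate rows of the form '<measure> all <value>' into a dict."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        # Aggregate rows use 'all' as the query id; anything else is skipped.
        if len(parts) == 3 and parts[1] == "all":
            try:
                metrics[parts[0]] = float(parts[2])
            except ValueError:
                continue  # non-numeric rows such as 'runid all <name>'
    return metrics


# Illustrative input built from the values on this page (assumption, not real output).
sample = "ndcg_cut_10           \tall\t0.3141\nrecall_1000           \tall\t0.7724\n"

metrics = parse_trec_eval(sample)
for name, value in metrics.items():
    print(f"| {name} | {value:.4f} |")
```

Any extra lines a wrapper prints around the summary are ignored, because they do not have exactly three fields with `all` in the middle.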
Reproduce this run
Three steps: (1) reformulate the queries with QueryGym's example pipeline, (2) run retrieval with Pyserini, (3) evaluate with trec_eval.
1. reformulate
python examples/querygym_pyserini/pipeline.py \
--dataset msmarco-v1-passage.dlhard \
--method query2doc \
--model Qwen/Qwen2.5-7B-Instruct \
--steps reformulate \
--temperature 1 \
--max-tokens 128 \
--method-params '{"mode":"fs","num_examples":4,"train_split":"train"}' \
--output-dir outputs/reproduce
2. retrieve (BM25)
python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v1-passage \
--topics outputs/reproduce/queries/reformulated_queries.tsv \
--bm25 --k1 0.9 --b 0.4 \
--output run.txt \
--hits 1000
3. evaluate
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 -m recall.1000 \
/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv run.txt
Artifacts
Config
config.json
{
  "method_params": {
    "mode": "fs",
    "num_examples": 4,
    "dataset_type": "msmarco",
    "collection_path": "/mnt/data/son/data/msmarco/collection.tsv",
    "train_queries_path": "/mnt/data/son/data/msmarco/queries.train.tsv",
    "train_qrels_path": "/mnt/data/son/data/msmarco/qrels.train.tsv",
    "train_split": "train"
  },
  "llm_config": {
    "temperature": 1,
    "max_tokens": 128
  },
  "dataset_config": {
    "topics": "/mnt/data/son/Thesis/t5/data/dlhard/neutral_queries.tsv",
    "index": "msmarco-v1-passage",
    "num_queries": 50
  },
  "retrieval": {
    "retriever_id": "bm25",
    "paradigm": "lexical",
    "params": {
      "k1": 0.9,
      "b": 0.4
    },
    "implementation": "pyserini:LuceneSearcher"
  }
}
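The `retrieval` and `dataset_config` blocks carry the same values that appear in the retrieve step (index name, `k1`, `b`). As a rough sketch of how the config could be mapped back to that command line, the snippet below builds the argument list from a trimmed copy of the JSON. The helper `build_retrieve_command` is hypothetical and is not part of QueryGym or Pyserini. The topics path, output file, and hit count are taken from the reproduce steps rather than from the config.

```python
import json
import shlex

# Trimmed copy of config.json: only the fields this sketch reads.
CONFIG_JSON = """
{
  "dataset_config": {"index": "msmarco-v1-passage", "num_queries": 50},
  "retrieval": {
    "retriever_id": "bm25",
    "params": {"k1": 0.9, "b": 0.4},
    "implementation": "pyserini:LuceneSearcher"
  }
}
"""


def build_retrieve_command(config, topics, output, hits=1000):
    """Hypothetical helper: turn the retrieval block into CLI arguments."""
    retrieval = config["retrieval"]
    if retrieval["retriever_id"] != "bm25":
        raise ValueError("this sketch only handles the bm25 retriever")
    params = retrieval["params"]
    return [
        "python", "-m", "pyserini.search.lucene",
        "--index", config["dataset_config"]["index"],
        "--topics", topics,
        "--bm25", "--k1", str(params["k1"]), "--b", str(params["b"]),
        "--output", output,
        "--hits", str(hits),
    ]


config = json.loads(CONFIG_JSON)
command = build_retrieve_command(
    config,
    topics="outputs/reproduce/queries/reformulated_queries.tsv",
    output="run.txt",
)
print(shlex.join(command))
```

Building the arguments as a list rather than a formatted string keeps quoting out of the picture until the final `shlex.join`, and the same list can be passed directly to `subprocess.run`.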