RAR-b: Reasoning as Retrieval Benchmark

In recent years, semantic textual similarity (STS) and information retrieval (IR) tasks have been the main yardsticks for measuring the progress of embedding models. The emerging retrieval-augmented generation (RAG) paradigm prompts a closer look at the language understanding and reasoning abilities of these models, and raises a direct question: can retrievers solve reasoning problems? Current state-of-the-art retriever models may struggle, although decoder-based embedding models show promise in narrowing the gap. Instruction-aware IR models also face challenges on reasoning tasks, which points to room for improvement. Fine-tuning proves more effective at improving the reasoning abilities of re-ranker models than those of bi-encoders. The Reasoning as Retrieval Benchmark (RAR-b) provides a comprehensive evaluation of retriever models’ reasoning abilities.
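To make the "reasoning as retrieval" framing concrete, here is a minimal sketch of how such an evaluation could be set up: each question is embedded, every candidate answer is embedded, and the candidates are ranked by cosine similarity. This is an illustration under stated assumptions, not the benchmark's actual code. The names `toy_embed`, `rank_candidates`, and `top1_accuracy` are hypothetical, and the bag-of-words embedding is only a stand-in for a real retriever model.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(question, candidates, embed=toy_embed):
    """Return candidate indices sorted from most to least similar."""
    q = embed(question)
    scores = [cosine(q, embed(c)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def top1_accuracy(examples, embed=toy_embed):
    """Fraction of questions whose gold answer is ranked first.

    Each example is (question, candidates, gold_index).
    """
    hits = 0
    for question, candidates, gold_index in examples:
        if rank_candidates(question, candidates, embed)[0] == gold_index:
            hits += 1
    return hits / len(examples)


# Toy data, invented purely for illustration.
examples = [
    ("what color is the sky",
     ["the sky is blue", "grass grows in spring"],
     0),
]
accuracy = top1_accuracy(examples)
```

Swapping `toy_embed` for a real embedding function with the same input and output contract would leave the ranking and scoring logic unchanged. A lexical embedding like this one rewards word overlap rather than reasoning, which is the kind of shortcut a reasoning-oriented retrieval benchmark is meant to expose.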

https://arxiv.org/abs/2404.06347
