A retrieval model with contextual correlation analysis for verbose queries.
| Title: | A retrieval model with contextual correlation analysis for verbose queries. |
|---|---|
| Authors: | Podder, Dipannita (dipannita@iitkgp.ac.in); Paik, Jiaul H. (jiaul@ai.iitkgp.ac.in); Mitra, Pabitra (pabitra@cse.iitkgp.ac.in) |
| Source: | Journal of Intelligent Information Systems. Apr 2026, Vol. 64, Issue 2, pp. 469-496 (28 pages). |
| Subjects: | Information retrieval, Markov random fields, Machine learning, Language models, Dependence (Statistics) |
| Abstract: | Retrieving relevant documents using verbose queries is a key challenge in information retrieval, as such queries often include extraneous terms. Traditional retrieval models treat all query terms equally, which limits their effectiveness. Existing methods for verbose queries are typically supervised or rely on costly two-stage ranking pipelines. We propose a fully unsupervised, single-phase retrieval model that estimates the centrality of each query term by analyzing its contextual correlation with the entire query. A fully connected term graph is constructed, where edge weights capture the relative correlation of each term with the query context compared to others. Centrality scores are computed via power iteration over this graph. Dense representations of query terms and context are obtained using a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model. To further reduce the influence of non-informative document terms, an additional weight based on term information content is introduced. These two weights are combined and integrated into a modified Markov Random Field Sequential Dependence Model (SDM) for ranking. Experiments show that our model outperforms unsupervised baselines, performs comparably to supervised baselines, and surpasses several neural rankers in zero-shot settings. Comparable results with both GloVe and BERT embeddings highlight its embedding-independent nature. The model shows larger gains on longer queries and modest improvements on shorter ones. Therefore, the model's independence from relevance judgments and top-ranked documents, along with its consistent, embedding-agnostic performance across query lengths, makes it well-suited for low-resource scenarios. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Journal of Intelligent Information Systems is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Engineering Source |
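
The centrality step summarized in the abstract (a fully connected term graph, edge weights reflecting each term's correlation with the query context relative to the others, and power iteration) can be illustrated with a rough sketch. This record does not give the paper's formulas, so everything specific here is an assumption made for illustration: the function names `cosine` and `term_centrality`, the use of cosine similarity as the correlation measure, the pairwise-share edge weight `corr[j] / (corr[i] + corr[j])`, and the row normalization. It is not the authors' implementation, and the toy vectors in the usage note stand in for BERT or GloVe embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def term_centrality(term_vecs, context_vec, iters=100, tol=1e-10):
    """Score query terms by power iteration over a fully connected term graph.

    Assumed (hypothetical) edge weight: the edge i -> j carries term j's share
    of the pair's correlation with the query context.
    """
    n = len(term_vecs)
    if n == 0:
        return []
    if n == 1:
        return [1.0]

    # Assumption: shift cosine from [-1, 1] into (0, 1] so all weights are positive.
    corr = [(cosine(v, context_vec) + 1.0) / 2.0 + 1e-9 for v in term_vecs]

    # Fully connected graph without self-loops.
    weights = [
        [0.0 if i == j else corr[j] / (corr[i] + corr[j]) for j in range(n)]
        for i in range(n)
    ]

    # Row-normalize so each row is a probability distribution over neighbors.
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total for w in row])

    # Power iteration from the uniform distribution.
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(scores[i] * trans[i][j] for i in range(n)) for j in range(n)]
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores
```

As a usage illustration, `term_centrality([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]], [1.0, 0.0])` ranks the first toy term highest and the third lowest, since the first vector is most aligned with the assumed context vector. How the paper combines these scores with the information-content weight inside the SDM is not specified in this record and is not sketched here.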