MonoQwen2-VL-v0.1

MonoQwen2-VL-v0.1 is a multimodal reranker finetuned with LoRA from Qwen2-VL-2B and optimized for assessing pointwise image-query relevance using the MonoT5 objective. That is, given an image-query pair fed into the prompt of the VLM, the model is tasked with generating “True” if the image is relevant to the query and “False” otherwise. During inference, a relevance score can then be obtained by comparing the logits of the two tokens. This score can be used to rerank the candidates returned by a first-stage retriever (such as DSE or ColPali) or to filter them using a threshold.
This example demonstrates how to use the model to assess the relevance of an image with respect to a query. It outputs the probability that the image is relevant (“True”) or not relevant (“False”).
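The scoring step on its own can be sketched without loading the model. The snippet below is a minimal illustration, not the model card's inference code: it assumes the logits of the “True” and “False” tokens at the final position have already been extracted, and it turns them into probabilities with a softmax restricted to those two tokens. The helper names (`true_false_probs`, `rerank`), the candidate ids, and the logit values are all hypothetical and chosen only for illustration.

```python
import math


def true_false_probs(true_logit, false_logit):
    """Softmax restricted to the two logits: returns (P(True), P(False))."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(true_logit, false_logit)
    e_true = math.exp(true_logit - m)
    e_false = math.exp(false_logit - m)
    total = e_true + e_false
    return e_true / total, e_false / total


def rerank(candidates, threshold=None):
    """Score (doc_id, true_logit, false_logit) triples and sort by P(True).

    If a threshold is given, candidates scoring below it are dropped.
    """
    scored = [(doc_id, true_false_probs(t, f)[0]) for doc_id, t, f in candidates]
    if threshold is not None:
        scored = [(doc_id, s) for doc_id, s in scored if s >= threshold]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical logits for three first-stage candidates (not real model outputs).
candidates = [
    ("page_1", 2.0, 0.5),
    ("page_2", -1.0, 1.0),
    ("page_3", 0.3, 0.3),
]

ranked = rerank(candidates)                   # all candidates, best first
filtered = rerank(candidates, threshold=0.5)  # keep only P(True) >= 0.5

for doc_id, score in ranked:
    print(f"{doc_id}: P(True) = {score:.4f}")
```

Because only two logits are compared, P(True) equals the sigmoid of their difference, so ranking by this score is the same as ranking by `true_logit - false_logit`. The threshold of 0.5 is an arbitrary choice here and would normally be tuned on held-out data.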