Abstract:
With the explosive growth of digital image data, image retrieval faces the core challenge of balancing accuracy and efficiency. Traditional methods rely heavily on local features and tend to neglect global semantic information, which limits retrieval accuracy. To address this issue, this paper proposes an image retrieval system based on multi-path recall and dynamic weighted fusion. By jointly optimizing global and local features, the system achieves efficient and precise retrieval; it is mainly applied in practical scenarios at a financial company, such as identifying merchants' ID photos and store photos. Deep semantic features are extracted with the DINOv2 model and combined with local detail features encoded by SIFT/VLAD to construct a dual-path recall strategy that accounts for both semantic consistency and fine-grained differences. An adaptive weighting function dynamically adjusts the fusion weights of the global and local features according to the degree of local feature overlap, addressing the poor generalization of traditional fixed-weight strategies. Experiments show that, at a false acceptance rate of 1 in 100,000 on negative samples, the proposed algorithm significantly outperforms the DINOv2 model alone and reduces reliance on a single model.
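
As a rough illustration of the dual-path recall and overlap-dependent fusion described above, the following minimal sketch merges candidates from a global path and a local path and ranks them by a fused score. The abstract does not give the weighting function, so the logistic form, its `midpoint` and `steepness` parameters, the function names, and the dictionary-based inputs are all assumptions made for illustration only.

```python
import math


def local_weight(overlap, midpoint=0.5, steepness=10.0):
    """Hypothetical adaptive weight for the local path.

    Assumed form: a logistic curve in the local-feature overlap ratio
    (0..1), so that high overlap shifts trust toward the local score.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (overlap - midpoint)))


def fused_score(global_sim, local_sim, overlap):
    """Convex combination of global and local similarity scores."""
    w = local_weight(overlap)
    return (1.0 - w) * global_sim + w * local_sim


def dual_path_recall(global_hits, local_hits, overlaps, top_k=5):
    """Merge candidates recalled by either path and rank by fused score.

    global_hits, local_hits: dict mapping candidate id -> similarity.
    overlaps: dict mapping candidate id -> local overlap ratio.
    A candidate missing from one path gets similarity 0.0 on that path.
    """
    candidates = set(global_hits) | set(local_hits)
    scored = [
        (
            cid,
            fused_score(
                global_hits.get(cid, 0.0),
                local_hits.get(cid, 0.0),
                overlaps.get(cid, 0.0),
            ),
        )
        for cid in candidates
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

In this sketch, a candidate with low local overlap is scored almost entirely by its global (semantic) similarity, while a candidate with high overlap is scored mostly by its local similarity; a fixed-weight strategy would correspond to replacing `local_weight` with a constant.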