Jina CLIP v1
Jina AI

Categories

AI + Machine Learning, Compute

Embedding model for cross-modal and multimodal retrieval for text and image data

With jina-clip-v1, users have a single embedding model that delivers state-of-the-art performance in both text-only and text-image cross-modal retrieval.
Jina AI has improved on OpenAI CLIP’s performance by 165% in text-only retrieval and by 12% in image-to-image retrieval, with equal or slightly better performance on text-to-image and image-to-text tasks.

Highlights:

Superior performance on all combinations of modalities, and especially large improvements in text-only embedding performance.
Support for much longer text inputs. Jina Embeddings’ 8k token input support makes it possible to process detailed textual information and correlate it with images.
Large net savings in storage space, compute, code maintenance, and complexity, because this multimodal model also performs well in non-multimodal scenarios.
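
To illustrate what a single embedding space for text and images makes possible, here is a minimal sketch of cross-modal retrieval by cosine similarity. It does not call jina-clip-v1 or any Jina AI API. The three-dimensional vectors, the candidate labels, and the helper names `cosine_similarity` and `rank_by_similarity` are all invented for illustration. In practice the vectors would be embeddings produced by the model for each text or image.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate names sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored]


# Hypothetical toy vectors standing in for real embeddings (assumption:
# text and image embeddings share one vector space, as the listing describes).
text_query = [0.9, 0.1, 0.0]
candidates = {
    "image: dog on a beach": [0.8, 0.2, 0.1],
    "image: city skyline": [0.1, 0.9, 0.2],
    "text: a puppy playing in the sand": [0.85, 0.15, 0.05],
}

ranking = rank_by_similarity(text_query, candidates)
for name in ranking:
    print(name)
```

Because text and image vectors live in the same space, one index and one similarity function can serve text-to-text, text-to-image, and image-to-image queries alike, which is the source of the savings described in the highlights.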