Reader-LM 0.5b

Jina AI

Small Language Models for Cleaning and Converting HTML to Markdown

Jina Reader-LM 0.5b is a small language model that converts HTML to Markdown, which makes it useful for content conversion tasks. The model is trained on a curated collection of HTML documents paired with their corresponding Markdown.

Highlights:
  • Jina Reader-LM 0.5b is designed to convert noisy HTML into clean Markdown efficiently, offering a cost-effective and scalable approach to web content extraction.
  • Jina Reader-LM 0.5b is optimized for long-context support and handles up to 256K tokens, which is crucial for dealing with the intricacies of modern HTML, including inline CSS and scripts.
  • Jina Reader-LM 0.5b outperforms larger language models on the HTML-to-Markdown conversion task despite being significantly smaller, a result of its specialized training and design for this specific task.
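
Because raw HTML with inline CSS and scripts can be very large, it can help to check whether a page is likely to fit in the 256K-token context before sending it to the model. The snippet below is a minimal sketch of such a check, not part of the model's API. The names `estimate_tokens` and `fits_context` are hypothetical, the four-characters-per-token ratio is a rough assumption rather than the model's real tokenizer, and reading "256K" as 256,000 tokens is a conservative assumption.

```python
# Assumption: "256K" is read conservatively as 256,000 tokens.
MAX_CONTEXT_TOKENS = 256_000

# Assumption: a rough average of 4 characters per token.
# The model's actual tokenizer will give different counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate (character count / 4, rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_context(html: str, reserve_for_output: int = 0) -> bool:
    """Check whether the HTML, plus an optional token reserve for the
    generated Markdown, stays within the assumed context limit."""
    return estimate_tokens(html) + reserve_for_output <= MAX_CONTEXT_TOKENS


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><p>Hello</p></body></html>"
    print(estimate_tokens(page), fits_context(page, reserve_for_output=1_000))
```

For an exact count, the estimate would need to be replaced with the token count produced by the model's own tokenizer.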