https://store-images.s-microsoft.com/image/apps.10812.f404cba0-1a74-4010-b4f5-353c38277924.5f6ab57b-afe7-4855-9be1-03c22d8c0618.b2f9a6fa-e48d-4d97-9fd2-5052065f23fa

Reader-LM 0.5b

Jina AI

Reader-LM 0.5b

Jina AI

Small Language Models for Cleaning and Converting HTML to Markdown

Jina Reader-LM 0.5 b is a small language model that converts HTML content to Markdown content, which is useful for content conversion tasks. The model is trained on a curated collection of HTML content and its corresponding Markdown content.

Highlights:
  • Jina Reader-LM 0.5b is designed to efficiently convert noisy HTML into clean markdown, showcasing a novel approach to web content extraction that is both cost-effective and scalable.
  • Jina Reader-LM 0.5 has been optimized for long context support, handling up to 256K tokens, which is crucial for dealing with the intricacies of modern HTML, including inline CSS and scripts.
  • Jina Reader-LM 0.5b outperforms larger language models in the HTML-to-markdown conversion task, despite being significantly smaller in size, which is a testament to their specialized training and design for this specific task.