https://store-images.s-microsoft.com/image/apps.10812.8366e99b-7093-4fd1-85ab-db1a2e471c37.3da35ae5-f751-43a6-a729-d08c92ecc19b.b3e23202-bfe7-4266-b51d-881de1dfd587

Reader-LM 1.5b

Jina AI

Reader-LM 1.5b

Jina AI

Small Language Models for Cleaning and Converting HTML to Markdown

Jina Reader-LM 1.5 b is a small language model that converts HTML content to Markdown content, which is useful for content conversion tasks. The model is trained on a curated collection of HTML content and its corresponding Markdown content.

Highlights:
  • Jina Reader-LM 1.5b is designed to efficiently convert noisy HTML into clean markdown, showcasing a novel approach to web content extraction that is both cost-effective and scalable.
  • Jina Reader-LM 1.5b has been optimized for long context support, handling up to 256K tokens, which is crucial for dealing with the intricacies of modern HTML, including inline CSS and scripts.
  • Jina Reader-LM 1.5b outperforms larger language models in the HTML-to-markdown conversion task, despite being significantly smaller in size, which is a testament to their specialized training and design for this specific task.