New Python HTML Libraries 2026

html5lib/html5lib-python 1K

added 1 year ago

Standards-compliant library for parsing and serializing HTML documents and fragments in Python.

alir3z4/html2text 2K +2

added 1 year ago

Convert HTML to Markdown-formatted text.

gawel/pyquery 2K +2

added 1 year ago

A jQuery-like library for Python.

mozilla/bleach 2K +2

added 1 year ago

Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes.

buriy/python-readability 2K -1

added 1 year ago

Given an HTML document, extract and clean up the main body text and title.

lxml/lxml 3K -3

added 1 year ago

lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.

scrapy/parsel 1K +3

added 1 year ago

Parsel lets you extract data from XML/HTML/JSON documents using XPath or CSS selectors.

psf/requests-html 13K -7

added 1 year ago

This library intends to make parsing HTML as simple and intuitive as possible.
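
The Bleach entry above describes allowed-list-based sanitizing: markup on the list is kept, everything else is escaped or stripped. As a rough illustration of that idea only, here is a minimal sketch built on the standard library's `html.parser`. It is not Bleach's API or implementation; the names `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `AllowListSanitizer`, and `sanitize` are hypothetical and chosen for this example. It does not validate URL schemes in `href`, so it should not be treated as production-safe.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-lists, chosen only for this illustration.
ALLOWED_TAGS = {"a", "b", "em", "i", "p", "strong"}
ALLOWED_ATTRS = {"a": {"href"}}


class AllowListSanitizer(HTMLParser):
    """Keep allowed tags and attributes; emit everything else as escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            allowed = ALLOWED_ATTRS.get(tag, set())
            # Drop every attribute that is not explicitly allowed for this tag.
            kept = "".join(
                f' {name}="{escape(value or "", quote=True)}"'
                for name, value in attrs
                if name in allowed
            )
            self.parts.append(f"<{tag}{kept}>")
        else:
            # Disallowed tag: show it as inert text instead of markup.
            self.parts.append(escape(self.get_starttag_text()))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")
        else:
            self.parts.append(escape(f"</{tag}>"))

    def handle_data(self, data):
        # Text content is always escaped on output.
        self.parts.append(escape(data))


def sanitize(html_text):
    """Return html_text with only allow-listed tags and attributes kept."""
    parser = AllowListSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    print(sanitize('<b onclick="x()">hi</b><script>alert(1)</script>'))
```

The sketch escapes disallowed tags rather than deleting them, which keeps the original input visible as text; stripping them instead would be the other behavior the Bleach description mentions.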

libs.tech

Discover the best Python libraries and hidden gems. Coded at night under caffeine, ad-free, curated by the Python community.