HTML Parser — Developer Tools

Published Jul 29, 2019. Last updated Oct 14, 2021.

Hello Coder,

This article is a short list of Python code snippets, built on top of the BeautifulSoup library, that I use to process and manipulate HTML files.

Bootstrap BeautifulSoup

# import BS magic
from bs4 import BeautifulSoup as bs, NavigableString, Tag

# load the HTML file (the with block closes it for us)
with open('index.html', 'r') as html_file:
    html_content = html_file.read()

# initialize BS object
# using the html.parser
soup = bs(html_content, 'html.parser')
# At this point, we can interact with the HTML tree, using BS helpers

The BeautifulSoup library supports more than one parser (e.g. lxml, xml, html5lib). The differences between them become clear on HTML documents that are not well-formed, where each parser may repair the broken markup in its own way. For instance, lxml will add missing closing tags for all elements. For more information, see the section on parser differences in the BeautifulSoup documentation.
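To see the difference in practice, the sketch below feeds the same malformed snippet to several parsers and prints what each one produces. The sample markup and the `parse_with` helper are illustrative assumptions, not part of the original workflow. lxml and html5lib are optional packages, so the helper returns None for any parser that is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately broken markup: neither <p> tag is closed
BROKEN_HTML = '<p>one<p>two'

def parse_with(parser_name, markup=BROKEN_HTML):
    """Return the repaired markup as a string, or None if the parser is missing."""
    try:
        return str(BeautifulSoup(markup, parser_name))
    except FeatureNotFound:
        # raised by BeautifulSoup when the requested parser is not installed
        return None

results = {name: parse_with(name) for name in ('html.parser', 'lxml', 'html5lib')}

for name, repaired in results.items():
    print(name, '->', repaired if repaired is not None else 'not installed')
```

Running this on your own machine is the quickest way to check which parser gives the tree you expect before processing real files.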

Parse Head section

To select the whole HEAD node and interact with its elements, we need just a few lines of code:

header = soup.find('head')

# If we want to change the title
header.title.string.replace_with('Updated title') 
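The snippet above assumes the document already has a title with text in it. A slightly more defensive sketch could look like the one below; `set_title` is a hypothetical helper name, and the sample document is made up for the demonstration.

```python
from bs4 import BeautifulSoup

def set_title(soup, text):
    """Set the document title, creating <title> if the head has none."""
    head = soup.find('head')
    if head is None:
        raise ValueError('document has no <head> element')
    if head.title is None:
        # new_tag() builds a tag that belongs to this soup
        head.append(soup.new_tag('title'))
    # assigning to .string replaces whatever the tag contained before
    head.title.string = text
    return soup

soup = BeautifulSoup('<html><head></head><body></body></html>', 'html.parser')
set_title(soup, 'Updated title')
print(soup.head.title.string)
```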

Parse HTML for JS Scripts

JavaScript files are referenced in the HTML using script nodes:

...
<script type='text/javascript' src='js/bootstrap.js'></script>
<script type='text/javascript' src='js/custom.js'></script>
...

To scan the HTML soup for script tags, we can use the find_all helper:

# src=True skips inline scripts, which have no src attribute
for script in soup.body.find_all('script', src=True, recursive=False):

    # Print the path
    print(' JS source = ' + script['src'])

    # Update (normalize) the path
    js_path = script['src']
    js_file = js_path.split('/')[-1] # extract the file name
    script['src'] = '/assets/js/' + js_file
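Two details of this loop are easier to see in isolation: `recursive=False` restricts the search to direct children of `body`, and inline scripts carry no `src` attribute at all. The sketch below uses a made-up sample page to show both effects. The `src=True` filter keeps only tags that have the attribute, and `.get('src')` returns None where indexing with `['src']` would raise a KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: one direct script, one nested, one inline
SAMPLE = """
<body>
  <script src="js/bootstrap.js"></script>
  <div><script src="js/nested.js"></script></div>
  <script>console.log('inline');</script>
</body>
"""

soup = BeautifulSoup(SAMPLE, 'html.parser')

# Direct children of <body> only; src=True keeps tags that have a src attribute
direct = [s['src'] for s in soup.body.find_all('script', src=True, recursive=False)]

# The default (recursive) search also reaches the script nested inside the <div>
nested = [s['src'] for s in soup.body.find_all('script', src=True)]

# .get() returns None for the inline script instead of raising KeyError
all_sources = [s.get('src') for s in soup.body.find_all('script')]

print(direct)
print(nested)
print(all_sources)
```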

Parse HTML for Images

# src=True skips any img tag that has no src attribute
for img in soup.body.find_all('img', src=True):

    # Print the path
    print(' IMG src = ' + img['src'])

    # Update (normalize) the path
    img_path = img['src']
    img_file = img_path.split('/')[-1] # extract the file name
    img['src'] = '/assets/img/' + img_file
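Since the script and image loops follow the same pattern, they can be folded into one function. The sketch below is one possible way to do it; `rewrite_asset_paths` is a hypothetical helper name, and the sample markup is invented for the demonstration.

```python
from bs4 import BeautifulSoup

def rewrite_asset_paths(soup, tag_name, attr, prefix):
    """Point every <tag_name attr="..."> at prefix + original file name."""
    changed = 0
    # attrs={attr: True} matches only tags that actually have the attribute
    for tag in soup.find_all(tag_name, attrs={attr: True}):
        file_name = tag[attr].split('/')[-1]  # keep only the file name
        tag[attr] = prefix + file_name
        changed += 1
    return changed

soup = BeautifulSoup(
    '<body><img src="static/img/logo.png"><script src="js/custom.js"></script></body>',
    'html.parser',
)
rewrite_asset_paths(soup, 'img', 'src', '/assets/img/')
rewrite_asset_paths(soup, 'script', 'src', '/assets/js/')
print(soup)
```

Note that this simple version rewrites every path it finds, including absolute URLs that point to a CDN, so a real script would probably need a check to leave those alone.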

Save our work

All our changes are made in memory. To make them permanent, we need to extract the string representation of the processed HTML from BS and write it to a file for later use:

processed_html = soup.prettify(formatter="html")

# the with block closes the file, even if the write fails
with open('index_bs.html', 'w') as f:
    f.write(processed_html)
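Two choices are hidden in this step: `prettify()` re-indents the document, and `formatter="html"` turns non-ASCII characters into HTML entities where it can. When the original whitespace and characters should be kept, `str(soup)` is an alternative. The sketch below compares the two on a made-up document and writes the compact version to a temporary directory (an assumption made so the example does not touch any real files).

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

soup = BeautifulSoup('<html><body><p>café</p></body></html>', 'html.parser')

# Compact output: no re-indentation, characters left as they are
compact = str(soup)

# Pretty output: one tag per line, non-ASCII characters converted to entities
pretty = soup.prettify(formatter='html')

# Write with an explicit encoding so the result does not depend on the platform
out_path = Path(tempfile.mkdtemp()) / 'index_bs.html'
out_path.write_text(compact, encoding='utf-8')

print(compact)
print(pretty)
```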

Where to go from here

With a minimal API built on top of the BS library, we can easily automate small parts of the web development process and solve some repetitive tasks:

  • normalize HTML to easily align all assets
  • extract components (team cards, product features, carousels) into separate files for later use
  • strip the hardcoded strings
  • translate HTML components to be used by other template engines: PUG, Jinja2
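As an illustration of the component extraction idea, the sketch below pulls every team card out of a page and keeps each one as its own string. The `team-card` class name and the sample page are assumptions made for the example; real templates will use their own markup.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two components and some unrelated content
PAGE = """
<body>
  <div class="team-card"><h3>Ana</h3></div>
  <div class="team-card"><h3>Bob</h3></div>
  <p>Footer</p>
</body>
"""

soup = BeautifulSoup(PAGE, 'html.parser')

# extract() detaches each tag from the tree and returns it
components = [str(card.extract()) for card in soup.find_all('div', class_='team-card')]

for index, markup in enumerate(components):
    # in a real script each string would be written to its own file
    print(f'component_{index}.html:', markup)
```

After the loop, `soup` no longer contains the cards, so the remaining page and the extracted components can be saved separately.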

Open-Source Apps built using an HTML Parser

Flask Bootstrap 5

Volt Dashboard is a free and open-source Bootstrap 5 dashboard template featuring over 100 components, 11 example pages and 3 plugins with Vanilla JS. The components include buttons, alerts, modals, datepickers and so on.

Flask Bootstrap 5 Volt - Template project provided by AppSeed.


Flask Soft UI Dashboard

Open-source Flask dashboard coded with basic modules, database, ORM and deployment scripts on top of Soft UI Dashboard (free version), a modern Bootstrap 5 design. Designed for those who like bold elements and beautiful websites, Soft UI Dashboard is ready to help you create stunning websites and web apps.

Soft UI Dashboard - Flask Template project provided by AppSeed.


Datta Able Flask

Open-source dashboard generated by AppSeed in the Flask framework. Datta Able Bootstrap Lite is the most stylised Bootstrap 4 Lite admin template among the Lite/Free admin templates on the market. It comes with feature-rich pages and components and developer-centric code. Performance and design were the key priorities in developing Datta Able.

Flask Datta Able - Starter project coded in Flask.


Btw, my nickname is Sm0ke, and I write a lot on Dev.to.
Sm0ke - Founder of AppSeed.us
