HTML Parser — Developer Tools
Hello Coder,
This article presents a short list of Python code snippets, built on top of the BeautifulSoup library, that I use to process and manipulate HTML files.
BeautifulSoup
Bootstrap
# import BS magic
from bs4 import BeautifulSoup as bs, NavigableString, Tag
# load the HTML file (assumed to be UTF-8 encoded);
# the with block closes the file automatically
with open('index.html', 'r', encoding='utf-8') as html_file:
    html_content = html_file.read()
# initialize BS object
# using the html.parser
soup = bs(html_content, 'html.parser')
# At this point, we can interact with the HTML tree, using BS helpers
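As a quick illustration of those helpers, the sketch below parses a small inline HTML string (made up for this example, standing in for index.html) and uses find, find_all, select_one and get_text, which are all part of the BeautifulSoup API. The tag names, class name and links are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Small sample document used instead of index.html (made up for this example)
html_content = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <h1 class="headline">Hello Coder</h1>
    <a href="/docs">Docs</a>
    <a href="/blog">Blog</a>
  </body>
</html>
"""

soup = BeautifulSoup(html_content, 'html.parser')

# find() returns the first matching tag (or None if nothing matches)
headline = soup.find('h1', class_='headline')
print(headline.get_text())

# find_all() returns every matching tag; here we collect the href values
links = [a['href'] for a in soup.find_all('a')]
print(links)

# select_one() accepts a CSS selector and returns the first match
title_text = soup.select_one('head > title').get_text()
print(title_text)
```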
The BeautifulSoup library supports more than one parser (e.g. lxml, xml, html5lib), and the differences between them become clear on malformed HTML documents. For instance, lxml closes any unclosed elements and wraps the result in <html> and <body> tags. For more information, see the "Differences between parsers" section of the documentation.
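A minimal way to see these differences is to feed the same broken fragment to each parser and compare the output. The sketch below assumes lxml and html5lib may or may not be installed (both are optional third-party packages), so it records None for any parser that is missing instead of failing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# A deliberately malformed fragment: </p> closes a tag that was never opened
broken_html = '<a></p>'

results = {}
for parser in ('html.parser', 'lxml', 'html5lib'):
    try:
        results[parser] = str(BeautifulSoup(broken_html, parser))
    except FeatureNotFound:
        # lxml and html5lib must be installed separately (pip install lxml html5lib)
        results[parser] = None

for parser, output in results.items():
    print(f'{parser:12} -> {output}')
```

With the built-in html.parser the stray closing tag is simply ignored, so the output stays a bare fragment; lxml and html5lib, when available, build a full document around it.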
Parse Head section
To select the whole HEAD node and interact with its elements, we need just a few lines of code:
header = soup.find('head')
# If we want to change the title
header.title.string.replace_with('Updated title')
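The same HEAD node gives access to the other metadata as well. The sketch below is a self-contained example on a made-up document: it updates the title as above, rewrites the content of a meta description tag (assuming the page has one, hence the None check), and lists the direct child tags of the head.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; in practice this comes from index.html
html_content = """<html><head>
<title>Old title</title>
<meta name="description" content="Old description">
<link rel="stylesheet" href="css/style.css">
</head><body></body></html>"""

soup = BeautifulSoup(html_content, 'html.parser')
header = soup.find('head')

# Change the title text in place
header.title.string.replace_with('Updated title')

# Update the meta description, if the page defines one
description = header.find('meta', attrs={'name': 'description'})
if description is not None:
    description['content'] = 'Updated description'

# Names of the direct child tags of <head> (whitespace strings are skipped)
child_tags = [child.name for child in header.find_all(recursive=False)]
print(child_tags)
```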
Parse HTML for JS Scripts
JavaScript files are referenced in the HTML through script nodes:
...
<script type='text/javascript' src='js/bootstrap.js'></script>
<script type='text/javascript' src='js/custom.js'></script>
...
To scan the HTML soup for script tags, we can use the find_all helper. Note that recursive=False limits the search to direct children of the body tag:
# src=True skips inline scripts that have no src attribute
for script in soup.body.find_all('script', src=True, recursive=False):
    # Print the path
    print(' JS source = ' + script['src'])
    # Update (normalize) the path
    js_path = script['src']
    js_file = js_path.split('/')[-1]  # extract the file name
    script['src'] = '/assets/js/' + js_file
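Because the loop passes recursive=False, a script placed inside a wrapper element is not visited. The sketch below, on made-up markup, compares the two search modes; the src=True filter keeps only script tags that carry a src attribute, so the inline script is skipped in both cases.

```python
from bs4 import BeautifulSoup

# Sample markup (made up): one script is a direct child of <body>, one is nested
html_content = """<html><body>
<script type="text/javascript" src="js/bootstrap.js"></script>
<div class="widget">
  <script type="text/javascript" src="vendor/widget/custom.js"></script>
</div>
<script>console.log('inline, no src');</script>
</body></html>"""

soup = BeautifulSoup(html_content, 'html.parser')

# Only direct children of <body> that have a src attribute
top_level = [s['src'] for s in soup.body.find_all('script', src=True, recursive=False)]

# The whole subtree of <body>, including the nested script
everywhere = [s['src'] for s in soup.body.find_all('script', src=True)]

print(top_level)
print(everywhere)
```

If your templates nest scripts inside containers, dropping recursive=False is probably the safer default.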
Parse HTML for Images
for img in soup.body.find_all('img', src=True):
    # Print the path
    print(' IMG src = ' + img['src'])
    # Update (normalize) the path
    img_path = img['src']
    img_file = img_path.split('/')[-1]  # extract the file name
    img['src'] = '/assets/img/' + img_file
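The split('/')[-1] approach also rewrites CDN links and data URIs, which is usually not what we want. A more defensive variant could parse each path with urllib.parse.urlsplit first. The helper normalize_src below is hypothetical, written only for this sketch: it leaves absolute URLs, protocol-relative URLs and data: URIs untouched and keeps any query string such as a cache-busting "?v=2".

```python
import posixpath
from urllib.parse import urlsplit

from bs4 import BeautifulSoup


def normalize_src(path, prefix):
    """Return prefix + file name; external and data URLs are returned as-is.

    normalize_src is a hypothetical helper written for this example.
    """
    parts = urlsplit(path)
    # Leave http(s)://..., //cdn... and data:... values alone
    if parts.scheme or parts.netloc:
        return path
    file_name = posixpath.basename(parts.path)
    # Re-attach the query string, if any (e.g. "?v=2")
    query = '?' + parts.query if parts.query else ''
    return prefix + file_name + query


html_content = """<body>
<img src="images/team/john.jpg?v=2">
<img src="https://cdn.example.com/logo.png">
<img src="data:image/gif;base64,R0lGODlhAQABAAAAACw=">
</body>"""

soup = BeautifulSoup(html_content, 'html.parser')
for img in soup.body.find_all('img', src=True):
    img['src'] = normalize_src(img['src'], '/assets/img/')

print([img['src'] for img in soup.find_all('img')])
```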
Save our work
All our changes are made in memory. To make them permanent, we need to extract the string representation of the processed HTML from BS and dump it into a file for later use:
processed_html = soup.prettify(formatter="html")
# the with block closes the file (the original f.close was missing its parentheses)
with open('index_bs.html', 'w', encoding='utf-8') as f:
    f.write(processed_html)
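The formatter argument controls how text is escaped on output. As far as the BeautifulSoup documentation describes it, formatter="html" converts non-ASCII characters to named HTML entities, while the default "minimal" formatter only escapes &, < and >. The sketch below shows the difference on a made-up snippet and writes the result into a temporary directory (an assumption made so the example does not touch index_bs.html).

```python
import os
import tempfile

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>Sacré bleu! <b>5 &lt; 6</b></p>', 'html.parser')

# formatter="html": non-ASCII characters become entities (é -> &eacute;)
processed_html = soup.prettify(formatter="html")

# default formatter ("minimal"): only &, < and > are escaped
minimal_html = soup.prettify()

# Save to a temporary file, then read it back to confirm the round trip
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'index_bs.html')
    with open(out_path, 'w', encoding='utf-8') as f:
        f.write(processed_html)
    with open(out_path, encoding='utf-8') as f:
        saved_html = f.read()

print(processed_html)
```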
Where to go from here
With a minimal API built on top of the BS library, we can easily automate small parts of the web development process and solve some repetitive tasks:
- normalize HTML to easily align all assets
- extract components (team cards, product features, carousels) into separate files for later use
- strip the hardcoded strings
- translate HTML components to be used by other template engines: PUG, Jinja2
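As one example of the second item, extracting components can be sketched in a few lines. The helper extract_components and the class name "team-card" are hypothetical, chosen only for this illustration: the function finds every element with the given class and writes its prettified markup to a numbered file.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Made-up markup; "team-card" is a hypothetical class name
html_content = """<section>
  <div class="team-card"><h3>Alice</h3><p>Designer</p></div>
  <div class="team-card"><h3>Bob</h3><p>Developer</p></div>
</section>"""


def extract_components(soup, css_class, out_dir):
    """Save every element with css_class to its own file (hypothetical helper)."""
    paths = []
    for index, component in enumerate(soup.find_all(class_=css_class), start=1):
        path = os.path.join(out_dir, f'{css_class}-{index}.html')
        with open(path, 'w', encoding='utf-8') as f:
            f.write(component.prettify())
        paths.append(path)
    return paths


soup = BeautifulSoup(html_content, 'html.parser')
with tempfile.TemporaryDirectory() as out_dir:
    saved = extract_components(soup, 'team-card', out_dir)
    names = [os.path.basename(p) for p in saved]
    with open(saved[0], encoding='utf-8') as f:
        first_card = f.read()

print(names)
```

The extracted files could then be turned into Jinja2 or Pug partials by hand or with further processing.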
Open-Source Apps built using an HTML Parser
Flask Bootstrap 5
Volt Dashboard is a free and open-source Bootstrap 5 dashboard template featuring over 100 components, 11 example pages and 3 plugins with Vanilla JS. The free Bootstrap 5 components include buttons, alerts, modals, datepickers and more.
Flask Soft UI Dashboard
Open-source Flask dashboard coded with basic modules, database, ORM and deployment scripts on top of Soft UI Dashboard (free version), a modern Bootstrap 5 design. Designed for those who like bold elements and beautiful websites, Soft UI Dashboard is ready to help you create stunning websites and web apps.
Datta Able Flask
Open-source dashboard generated by AppSeed using the Flask framework. Datta Able Bootstrap Lite is the most stylised Bootstrap 4 Lite admin template among all Lite/Free admin templates on the market. It comes with feature-rich pages and components and fully developer-centric code. Performance and design were the key priorities when developing Datta Able.
HTML Parser related resources
- BeautifulSoup HTML Parser documentation