Scraping an e-commerce website with BeautifulSoup
Case study
This guide walks you through scraping an e-commerce website with the BeautifulSoup Python library.
What you’ll need
For this tutorial you'll need a complete sample e-commerce website. I bundled a sample e-commerce website with the complete source code of the tutorial. Clone the repository, open the shop-cart folder, and run the following command inside it. It serves the content of the folder.
python -m http.server 8000
Open your web browser at this location: http://localhost:8000/products.html
How to complete this tutorial
1. Install the requests and beautifulsoup4 libraries:
pip install requests
pip install beautifulsoup4
2. Your first parsing with BeautifulSoup
from bs4 import BeautifulSoup
import requests
page = requests.get("http://localhost:8000/products.html")
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())
Here we make an HTTP request to retrieve the URL that hosts our e-commerce web page. Then we parse the content of the page with 'html.parser', which is included in Python’s standard library. Finally, we display the HTML code.
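The request above assumes the local server is running and the page exists. A slightly more defensive version might look like the sketch below. The helper names parse_html and fetch_soup and the 10-second timeout are my own choices for illustration, not part of the tutorial's source code.

```python
from bs4 import BeautifulSoup
import requests


def parse_html(html):
    """Parse an HTML string (or bytes) with Python's built-in parser."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url, timeout=10):
    """Download a page and parse it; raise if the request fails."""
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError on 4xx/5xx responses
    response.raise_for_status()
    return parse_html(response.content)


# parse_html works on any HTML string, so it can be tried without a server
sample = parse_html("<html><head><title>Shop</title></head><body></body></html>")
print(sample.title.get_text())
```

With the sample server running, fetch_soup("http://localhost:8000/products.html") should return the same kind of object as the soup variable used in the rest of this guide.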
3. Retrieve all products
If you take a look at the HTML code behind the product list page, you will notice that all products are wrapped inside a div tag with the classes well and well-small, like this:
<div class="well well-small">
<h3>Our Products </h3>
<!-- Products goes here -->
</div>
And each product is built like this:
<li class="span4">
<div class="thumbnail">
<a href="product_details.html" class="overlay"></a>
<a class="zoomTool" href="product_details.html" title="add to cart"><span class="icon-search"></span> QUICK VIEW</a>
<a href="product_details.html"><img src="assets/img/a.jpg" alt=""></a>
<div class="caption cntr">
<p>Manicure & Pedicure</p>
<p><strong> $22.00</strong></p>
<h4><a class="shopBtn" href="#" title="add to cart"> Add to cart </a></h4>
<div class="actionList">
<a class="pull-left" href="#">Add to Wish List </a>
<a class="pull-left" href="#"> Add to Compare </a>
</div>
<br class="clr">
</div>
</div>
</li>
So, the key point is that you cannot scrape a website if you don't know how it is built. You have to figure out how the page is structured. A tip: right-click anywhere on the page and select the View Page Source option.
Now we can use the find_all method to search for elements by tag name, class, or id. In our case, we are looking for all li elements with the span4 class.
from bs4 import BeautifulSoup
import requests
page = requests.get("http://localhost:8000/products.html")
soup = BeautifulSoup(page.content, 'html.parser')
def retrieve_all_products():
    # find_all returns a list of every <li> element with the class "span4"
    print(soup.find_all('li', class_='span4'))
if __name__ == '__main__':
    retrieve_all_products()
If you run it, it prints a list of the matching elements.
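To experiment with find_all without the server running, the same search can be run on an inline snippet. The sketch below uses a shortened fragment modeled on the product markup above (the second product is invented for illustration). It also shows select, BeautifulSoup's CSS-selector method, as another way to express the same query.

```python
from bs4 import BeautifulSoup

# Shortened sample markup; the second product is invented for illustration
html = """
<div class="well well-small">
  <ul>
    <li class="span4"><p>Manicure &amp; Pedicure</p><p><strong> $22.00</strong></p></li>
    <li class="span4"><p>Sample Product</p><p><strong> $9.50</strong></p></li>
    <li class="other">Not a product</li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Search by tag name and class; class_ has a trailing underscore
# because class is a reserved word in Python
by_find_all = soup.find_all("li", class_="span4")

# The same query written as a CSS selector
by_select = soup.select("li.span4")

print(len(by_find_all), len(by_select))
for item in by_find_all:
    print(item.find("p").get_text().strip())
```

Both calls return the two li elements with the span4 class and skip the third one.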
4. Get product price
Now, let's get the price of one product:
from bs4 import BeautifulSoup
import requests
page = requests.get("http://localhost:8000/products.html")
soup = BeautifulSoup(page.content, 'html.parser')
def retrieve_first_product_price():
    all_products = soup.find_all('li', class_='span4')
    product_one = all_products[0]
    # The price is inside the product's <strong> tag
    product_one_price = product_one.find("strong")
    # Raw text, e.g. " $22.00"
    print(product_one_price.get_text())
    # Strip the whitespace, then the "$" character
    print(product_one_price.get_text().strip().strip('$'))
if __name__ == '__main__':
    retrieve_first_product_price()
First, we get all the products. Then we take the first one and look for its price, which is inside a strong tag. After finding the price, we display it, and we can also remove the $ character. As you can see, you can search within the result of a previous search. Unlike the find_all method, which returns a list of elements or an empty list, the find method returns a single element or None.
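Because find returns None when nothing matches, calling get_text() on the result raises an AttributeError for any product that has no price. Here is a minimal sketch of a guard, using a hypothetical helper extract_price and made-up sample markup. Decimal is used as one reasonable choice for money values; it is not something the tutorial's source code does.

```python
from decimal import Decimal
from bs4 import BeautifulSoup


def extract_price(product):
    """Return the product's price as a Decimal, or None if it has no <strong> tag."""
    price_tag = product.find("strong")
    if price_tag is None:
        return None
    # " $22.00" -> "22.00"
    text = price_tag.get_text().strip().lstrip("$")
    return Decimal(text)


# Made-up markup: the second product has no price on purpose
html = """
<li class="span4"><p>Manicure &amp; Pedicure</p><p><strong> $22.00</strong></p></li>
<li class="span4"><p>No price listed</p></li>
"""
products = BeautifulSoup(html, "html.parser").find_all("li", class_="span4")

for product in products:
    print(extract_price(product))
```

The first product yields a Decimal price and the second yields None instead of crashing the script.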
5. Build a fake price comparator
Let's suppose we want to compare our products using price as the criterion. Here is a very simple way to do it.
from bs4 import BeautifulSoup
import requests
page = requests.get("http://localhost:8000/products.html")
soup = BeautifulSoup(page.content, 'html.parser')
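def lazy_comparator():
    all_products = soup.find_all('li', class_='span4')
    products = {}
    for product in all_products:
        # The first <p> holds the product name, the <strong> tag holds the price
        name = product.find("p").get_text().strip()
        price = product.find("strong").get_text().strip().strip('$')
        # Convert to float so prices sort numerically, not as text
        products[name] = float(price)
    # Sort (price, name) pairs from cheapest to most expensive
    print(sorted([(v, k) for k, v in products.items()]))
if __name__ == '__main__':
    lazy_comparator()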
Some notes here. After collecting all the products, we store each name and price in a dictionary, and then we sort the entries by price.
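One thing to keep in mind: text extracted with get_text() is a string, and strings sort character by character rather than by numeric value. The sketch below uses made-up names and prices to show the difference and one way to sort by the numeric value instead.

```python
# Made-up scraped data: product name -> price text with the "$" removed
scraped = {
    "Manicure & Pedicure": "22.00",
    "Sample Product A": "9.50",
    "Sample Product B": "100.00",
}

# Sorting the raw strings compares characters, so "100.00" comes first
as_text = sorted(scraped.values())

# Converting to float first sorts by the numeric value
by_price = sorted(scraped.items(), key=lambda item: float(item[1]))

print(as_text)
for name, price in by_price:
    print(f"{price:>7}  {name}")
```

Sorting with a key function keeps the original price text for display while the comparison itself is done on numbers.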
That's it
Get the complete source code on GitHub. Also take a look at the official BeautifulSoup documentation.
That’s a great guide on scraping with BeautifulSoup! Thanks for sharing, Kayode Adechinan T. Salami. This is a valuable resource for anyone looking to extract data from e-commerce websites.
While scraping can be a powerful tool, it’s important to always respect the robots.txt guidelines of a website before scraping data. It’s also good practice to avoid overwhelming a website with too many requests.