AutoScraper and Flask: Create an API From Any Website in Less Than 5 Minutes And with Fewer Than 20 Lines of Python
In this tutorial, we are going to create our own e-commerce search API with support for both eBay and Etsy without using any external APIs. With the power of AutoScraper and Flask, we are able to achieve this goal in fewer than 20 lines of Python code for each site. I recommend reading my last article about AutoScraper if you haven’t done so yet.
Requirements
Install the required libraries using pip:
pip install -U autoscraper flask
Let’s Do It
First, we are going to create a smart scraper to fetch data from eBay's search results page. Let's say we want to get the title, price, and product link of each item. Using AutoScraper, this is easily done by just providing some sample data:
from autoscraper import AutoScraper
url = 'https://www.ebay.com/sch/i.html?_nkw=iphone'
wanted_list = ['Apple iPhone X 64GB Factory Unlocked Smartphone', '$389.99', 'https://www.ebay.com/itm/Apple-iPhone-X-64GB-Factory-Unlocked-Smartphone/254187579586?epid=238944741&hash=item3b2ec2a8c2:g:ZPQAAOSwD6VdpL~9']
scraper = AutoScraper()
result = scraper.build(url=url, wanted_list=wanted_list)
Note that if you want to copy and run this code, you may need to update the wanted_list, since the sample listing may no longer appear in eBay's live results.
Now let’s get the results grouped by scraping rules:
scraper.get_result_similar(url, grouped=True)
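The grouped output is a dictionary keyed by auto-generated rule IDs, each mapping to the list of values that rule extracted. As a rough illustration (the second value in each list is invented here, and AutoScraper generates random rule IDs on every build, so yours will differ):

```python
# Illustrative shape of the grouped output; rule IDs are random per build,
# and the second entry in each list is invented for this sketch.
grouped = {
    'rule_0aok': ['Apple iPhone X 64GB Factory Unlocked Smartphone',
                  'Apple iPhone 11 64GB Unlocked'],
    'rule_vn5z': ['$389.99', '$429.00'],
    'rule_buz1': ['https://www.ebay.com/itm/254187579586',
                  'https://www.ebay.com/itm/...'],
}

# The lists line up item-by-item, so printing a sample value per rule is
# enough to tell which rule captured titles, prices, or links.
for rule_id, values in grouped.items():
    print(rule_id, '->', values[0])
```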
From the output, we’ll know which rule corresponds to which data, so we can use it accordingly. Let’s set some aliases based on the output, remove redundant rules, and save the model so we can use it later:
scraper.set_rule_aliases({'rule_0aok': 'title', 'rule_vn5z': 'price', 'rule_buz1': 'url'})
scraper.keep_rules(['rule_0aok', 'rule_vn5z', 'rule_buz1'])
scraper.save('ebay-search')
Note that the rule IDs will be different for you if you run the code.
OK, we’ve got eBay covered. Let’s add support for Etsy search results too. We’ll start by building its scraper. This time, we will use wanted_dict instead of wanted_list, which automatically sets aliases for us:
url = 'https://www.etsy.com/search?q=macbook'
wanted_dict = {
    'title': [
        'Apple MacBook Pro i9 32GB 500GB Radeon 560X 15.4 2018 Touch Bar 2.9GHz 6-Core',
        'Laptop MacBook Premium Ergonomic Wood Stand Holder Computer Gift Nerd Tech Geek Mens, woodworking gift, Home office workspace accessories',
    ],
    'price': ['1,500.00', '126.65'],
    'url': ['851553172'],
}
scraper = AutoScraper()
scraper.build(url=url, wanted_dict=wanted_dict)
# get results grouped per rule so we'll know which one to use
scraper.get_result_similar(url, grouped=True)
Because Etsy generates its links with a unique ID each time, we added one sample product ID to the wanted_dict so we can construct the link from it. We also provided two samples for title and price, because items on Etsy search result pages come in different structures and we want the scraper to learn them all.
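Rebuilding the full link from a scraped listing ID is a one-line format string, which is exactly what the API code later does. A minimal sketch, using the sample ID from the wanted_dict:

```python
# Rebuild a full Etsy listing URL from a scraped listing ID.
# '851553172' is the sample ID we provided in wanted_dict.
listing_id = '851553172'
listing_url = f'https://www.etsy.com/listing/{listing_id}'
print(listing_url)
```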
After analyzing the output, let’s keep our desired rules, remove the rest, and save our model:
scraper.keep_rules(['rule_705x', 'rule_70m8', 'rule_d9wp', 'rule_kv6p'])
scraper.save('etsy-search')
Now that we have our scrapers ready, we can create our fully functioning API for both sites in fewer than 40 lines:
from urllib.parse import quote_plus

from autoscraper import AutoScraper
from flask import Flask, request

ebay_scraper = AutoScraper()
etsy_scraper = AutoScraper()
ebay_scraper.load('ebay-search')
etsy_scraper.load('etsy-search')

app = Flask(__name__)


def get_ebay_result(search_query):
    # URL-encode the query so multi-word searches work
    url = 'https://www.ebay.com/sch/i.html?_nkw=%s' % quote_plus(search_query)
    result = ebay_scraper.get_result_similar(url, group_by_alias=True)
    return _aggregate_result(result)


def get_etsy_result(search_query):
    url = 'https://www.etsy.com/search?q=%s' % quote_plus(search_query)
    result = etsy_scraper.get_result_similar(url, group_by_alias=True)
    # Etsy results only contain the listing ID, so build the full URL from it
    result['url'] = [f'https://www.etsy.com/listing/{i}' for i in result['url']]
    return _aggregate_result(result)


def _aggregate_result(result):
    # Turn the column-wise dict {'title': [...], 'price': [...], 'url': [...]}
    # into a row-wise list of per-item dicts
    final_result = []
    for i in range(len(list(result.values())[0])):
        final_result.append({alias: result[alias][i] for alias in result})
    return final_result


@app.route('/', methods=['GET'])
def search_api():
    query = request.args.get('q', '')
    return dict(result=get_ebay_result(query) + get_etsy_result(query))


if __name__ == '__main__':
    app.run(port=8080, host='0.0.0.0')
Here, we are defining an API with the parameter q as our search query. We fetch the eBay and Etsy search results, join them, and return them as the response. Note that we are passing group_by_alias=True to the scraper to get the results grouped by our defined aliases.
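To make the aggregation step concrete, here is _aggregate_result applied to a small hand-made input (the sample values are invented for illustration):

```python
def _aggregate_result(result):
    # Turn the column-wise dict of lists into a row-wise list of dicts.
    final_result = []
    for i in range(len(list(result.values())[0])):
        final_result.append({alias: result[alias][i] for alias in result})
    return final_result


# Made-up sample to illustrate the transformation:
sample = {
    'title': ['Item A', 'Item B'],
    'price': ['$10.00', '$20.00'],
}
rows = _aggregate_result(sample)
# rows == [{'title': 'Item A', 'price': '$10.00'},
#          {'title': 'Item B', 'price': '$20.00'}]
```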
By running this code, the API server will be up, listening on port 8080. So let’s test our API by opening http://localhost:8080/?q=headphone in our browser:
Voila! We have our e-commerce API ready. Just replace headphone in the URL with your desired query to get its search results.
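The response is a JSON object with a single result key holding the merged list of items. Schematically (the item below is invented; the real output comes from live eBay and Etsy pages):

```python
# Invented example of the response shape; real values come from live results.
response = {
    'result': [
        {
            'title': 'Wireless Bluetooth Headphones',
            'price': '$25.99',
            'url': 'https://www.ebay.com/itm/...',
        },
    ]
}
# Every item carries the three aliases we defined when building the scrapers.
assert set(response['result'][0]) == {'title', 'price', 'url'}
```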
Final Notes
The final code for this tutorial is available on GitHub.
This is a development setup suitable for developing and testing. Flask’s built-in server is not suitable for production. For production usage, please check Flask’s deployment options.
This tutorial is intended for personal and educational use. If you want to scrape websites, you can check their policies regarding the scraping bots.
I hope this article is useful and helps you bring your ideas to code faster than ever. Happy coding!