How and why I built: Web scraping project for machine learning and data visualization
About me
I'm a mechanical engineer fascinated by data, machine learning, data science, artificial intelligence, and data visualization.
The problem I wanted to solve
I built this model to tackle the challenge of identifying a genuinely good (or at least safe) franchise model to invest in.
What is Web scraping project for machine learning and data visualization?
- I built a web scraping project for machine learning and data visualization. Basically, I scraped data from a franchise website.
- Then I applied machine learning to check which franchise model is worth investing in.
- Finally, I did the data visualization in Power BI to analyse everything in charts.
Tech stack
- Python: A great tool for many things, like web scraping, building machine learning models, etc.
- HTML/CSS: It is hard to do any deep web scraping without knowledge of both.
- Power BI: In my opinion, currently the best software for data visualization on the market.
The process of building Web scraping project for machine learning and data visualization
Why do web scraping at all? That may seem like a silly question, but web scraping has many uses, so I will break them down by item:
- Company or personal interests
- ML
- BI
- Company: This first item is admittedly generic, but let's try a thought experiment. What if every company could properly scrape large amounts of data from the internet? For example, an e-commerce company that knew exactly how Amazon's system works would surely have a great advantage over its competitors, although of course that depends on the sector in which the company operates.
Personal interest: Imagine that you want to buy a cheap flight ticket to London. How do you find the best price, and on which day or at which hour will it be available?
- ML: With the target (company or website) defined, you can build your model with the data and answers that you got, which opens up the world of ML.
- BI: Why use BI (Business Intelligence) after all the work to scrape the data, build the ML model (or even put it into production), and refine it? In most cases, unless this is a personal project, you'll have to show what you did to your boss or to whoever you're trying to convince that your model or idea is good.
So here we have some good options for data visualization, like Power BI, Looker, Tableau, etc.
I recommend Power BI, as it has the most prizes in "data viz" competitions.
Challenges I faced
Sites often have poorly constructed HTML, because they are usually not planned up front: they grow as needs arise (like cities). As a result, many tags, classes, and so on are poorly managed, so when you scrape these tags, classes, or anything else inside the HTML, you will run into a lot of issues.
For example, a product page can contain four pieces of data, such as:
1) Price
2) Stock
3) Color
4) Reviews
If the stock tag is not filled in, sometimes you get an "out of stock" output, but sometimes you get "None", because there is nothing there at all, as if the product never existed (as far as the HTML is concerned).
A None in place of a tag triggers errors in frameworks, and this is just the first layer of complexity.
Anyway, in this case you can solve it with try/except in Python.
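As a rough illustration of the try/except idea, here is a minimal sketch. It is not the code from this project: the product snippets, the class names, and the `get_field` helper are all hypothetical. To stay self-contained it uses Python's standard `xml.etree.ElementTree` on well-formed snippets. Real pages usually need a proper HTML parser, but the pattern is the same, since a lookup that finds nothing typically returns `None`.

```python
import xml.etree.ElementTree as ET

# Hypothetical product snippets; the second one has no stock tag at all.
PRODUCT_WITH_STOCK = """
<div class="product">
  <span class="price">19.90</span>
  <span class="stock">out of stock</span>
</div>
"""

PRODUCT_WITHOUT_STOCK = """
<div class="product">
  <span class="price">24.50</span>
</div>
"""


def get_field(product_html, css_class, default="unknown"):
    """Return the text of the span with the given class, or a default value."""
    root = ET.fromstring(product_html)
    try:
        # find() returns None when nothing matches, and .text is None when the
        # tag is empty; either way the next attribute access raises AttributeError.
        return root.find(f"span[@class='{css_class}']").text.strip()
    except AttributeError:
        return default


print(get_field(PRODUCT_WITH_STOCK, "stock"))
print(get_field(PRODUCT_WITHOUT_STOCK, "stock"))
```

Catching only `AttributeError` (instead of a bare `except`) keeps other bugs visible, and returning an explicit default such as "unknown" makes the missing values easy to spot later in the ML and Power BI steps.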
Key learnings
Web scraping is very useful for company growth. Anyone who uses it wisely can gain a real edge over competitors, but it can also be used for personal reasons, such as simply buying the cheapest plane ticket.
Tips and advice
If you are starting out in web scraping, machine learning, or data visualization, I recommend that you first learn Python or Power BI, and only then move on to web scraping and ML models.
This is because web scraping requires at least an intermediate knowledge of Python.
Final thoughts and next steps
The first part of this project is web scraping.
The second is machine learning, and the third and final part is visualization in Power BI.
I'll keep working on this project.