How and why I built a Machine Learning model to predict table tennis match results
About me
I'm a Data Professional who loves building data products to solve problems. I currently work with professionals from various backgrounds to provide new analytical insights in industry. I'd love to combine this work with my passion for open data and keep contributing to changing people's lives for the better in a more analytical world.
The problem I wanted to solve
A customer reached out to me for help building a profitable machine learning model to predict table tennis match results from historical data. After starting the project, I noticed that the challenge was bigger than expected: the data provided, which had previously been collected through web scraping, was not reliable enough to train a good model.
What is a Machine Learning model to predict sports results?
Given that, I suggested splitting the project into three main sprints:
- Collect the data again, but this time using a reliable API.
- Redo all the data transformation and cleaning.
- Deploy a reliable prescriptive model and automate it.
Tech stack
First of all, I chose Python as the language for the project, since Python offers many libraries and extensive documentation to help with any challenges along the way.
I developed the project in Google Colab because it makes it easy to share and explain each step to my customer, which gave him full transparency.
The main libraries used were:
- requests
- pandas
- numpy
- sklearn
For data visualization:
- Tableau
The process of building a Machine Learning model to predict sports results
For the data collection process I used the requests library, requesting the data I needed based on the IDs of the leagues that my customer wanted to work with. These API calls returned, as expected, JSON data that I easily converted to a tabular format using pandas.
By the end of the data collection I had over 200 thousand good-quality records to develop our predictive model for table tennis match results.
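The collection step described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: the endpoint URL, the `league_id` query parameter, the helper names `fetch_league_matches` and `matches_to_frame`, and the shape of the sample payload are all assumptions, since the article does not name the API.

```python
import pandas as pd
import requests

# Hypothetical endpoint: the article does not say which API was used.
BASE_URL = "https://api.example.com/v1/matches"


def fetch_league_matches(league_id, session=None, base_url=BASE_URL):
    """Request the matches of one league and return the decoded JSON payload."""
    http = session or requests
    response = http.get(base_url, params={"league_id": league_id}, timeout=30)
    response.raise_for_status()  # fail loudly instead of storing bad data
    return response.json()


def matches_to_frame(payload):
    """Flatten a list of (possibly nested) match records into a table."""
    return pd.json_normalize(payload, sep="_")


# Offline example with a made-up payload shaped like a typical JSON response.
sample_payload = [
    {"id": 1, "league": {"id": 10}, "home": {"name": "Player A"},
     "away": {"name": "Player B"}, "winner": "home"},
    {"id": 2, "league": {"id": 10}, "home": {"name": "Player C"},
     "away": {"name": "Player A"}, "winner": "away"},
]
matches = matches_to_frame(sample_payload)
print(matches)
```

With a real API, the frames returned for each league ID could be combined with `pd.concat` before cleaning.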
The problem was framed as a binary classification machine learning problem, since each game has only two possible outcomes for each player: Win or Loss.
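To make the binary framing concrete, here is a small sketch with scikit-learn. The article does not say which algorithm or features were used, so the logistic regression, the two features (`rating_diff`, `recent_form_diff`) and the synthetic data below are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for the real match history (not the project's data).
n_matches = 2000
rating_diff = rng.normal(0.0, 1.0, n_matches)       # hypothetical feature
recent_form_diff = rng.normal(0.0, 1.0, n_matches)  # hypothetical feature
X = np.column_stack([rating_diff, recent_form_diff])

# Label: 1 if the first player wins, 0 if the first player loses.
true_win_probability = 1.0 / (1.0 + np.exp(-(1.5 * rating_diff + 0.5 * recent_form_diff)))
y = (rng.random(n_matches) < true_win_probability).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Predicted probability that the first player wins each held-out match.
p_win = model.predict_proba(X_test)[:, 1]
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```

On real match data a chronological split would likely be a safer choice than a random one, so that the model is never evaluated on matches played before the ones it was trained on.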
Challenges I faced
The main challenge in this project was not having good-quality data at the beginning, and it took me some days to realize that. Another interesting challenge was that I could not evaluate the model as a normal classification problem using the main metrics, such as accuracy, the ROC curve, precision and recall.
The main metric that indicated the success of the model was ROI (Return on Investment) over the long term.
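As a rough illustration of this metric, ROI can be computed as total profit divided by total amount staked. The sketch below assumes decimal odds, a flat stake, and a simple rule of betting only when the model's expected value is positive. None of these details come from the project itself, and `flat_stake_roi` is a hypothetical helper.

```python
def flat_stake_roi(p_win, decimal_odds, won, stake=1.0, min_edge=0.0):
    """Return ROI = total profit / total staked for a flat-stake strategy.

    A bet is placed only when the expected profit per unit staked,
    p * odds - 1, is greater than `min_edge` (an assumed betting rule).
    """
    total_staked = 0.0
    total_profit = 0.0
    for p, odds, outcome in zip(p_win, decimal_odds, won):
        expected_value = p * odds - 1.0
        if expected_value <= min_edge:
            continue  # no perceived edge, so skip this match
        total_staked += stake
        if outcome:
            total_profit += stake * (odds - 1.0)  # winnings net of the stake
        else:
            total_profit -= stake  # the stake is lost
    return total_profit / total_staked if total_staked else 0.0


# Made-up predictions, odds and outcomes for five matches.
roi = flat_stake_roi(
    p_win=[0.60, 0.40, 0.70, 0.50, 0.55],
    decimal_odds=[2.0, 2.0, 1.3, 2.5, 2.0],
    won=[True, False, True, False, True],
)
print(f"ROI: {roi:.1%}")
```

A model can have modest accuracy and still show a positive ROI if it is right often enough when the odds are generous, which is why the classification metrics alone were not sufficient here.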
Key learnings
However, it helped me understand that some projects cannot be solved using the well-known metrics that are used in most machine learning problems.
Tips and advice
This project also helped me learn all the steps of a data science project, from data collection to model deployment and customer support. This is very important for any data professional who wants a good overall view of a data science project. When solving a relevant problem, you should not care only about building the model, but also work on the data collection and fully immerse yourself in the problem and the customer's needs, so that you can clean and transform the data and have good results to present.
Final thoughts and next steps
Now I am exploring the world of image classification by taking the Deep Learning Specialization by Andrew Ng on Coursera, which I highly recommend, and I expect to use these new skills to provide deep learning solutions to my customers.