This talk was part of Developer Growth Summit 2022. Go to the DGS2022 page to view recordings of all sessions.
About the talk
OLX is a global trading platform operating in 30+ countries, and data science is key to keeping its customers happy and safe. In this talk, we’ll explore use cases such as trust, safety, search, recommendations, and seller experience.
This talk will cover
- How machine learning projects are decided at OLX
- Trust and safety, with a deeper dive into some ML models
- Search, recommendation and seller experience features
About the speaker
Alexey is the Principal Data Scientist @ OLX and author of Machine Learning Bookcamp. He also hosts DataTalks.Club, a community for data enthusiasts.
Highlights of the talk
What are the data science examples at OLX’s online marketplace?
OLX is an online marketplace, and there are multiple areas where data science can help. First, you can give buyers a positive experience by helping them find what they want as fast as possible. Another area is helping sellers sell as fast as possible. Lastly, you work on the marketplace itself to make sure there are no fraudsters, and that you can catch them when they do show up. Essentially, we want buyers and sellers to have the best experience so the marketplace can earn money.
To help buyers, the data science team has implemented smart ranking, reduced null searches (queries that return no results), and worked on query categorization and spell checking. Search is not the only thing: sometimes people want suggestions too, so the marketplace also has a recommender system to make buyers happy.
To make sellers happy, you want to make sure they don’t have problems creating listings. That means helping sellers with image quality, listing quality, deal prediction, and price prediction. You also want to detect listings that have already sold so they can be taken down.
Another area is the interaction between buyers and sellers, where you want to make transactions safe. Here you’ll usually want to detect the nasty things, which means implementing NSFW detection, forbidden item detection, fraud detection, duplicate detection, and chat moderation. The last area is making sure the platform makes money.
What is a moderation system?
Let’s say a user wants to sell a stuffed dog toy, and to increase their chances of selling, they create multiple listings of the same object with different photos. This creates redundant content, which makes it hard for buyers to find what they want to buy. Another problem is when people upload pornographic content to gain attention and spread links on the listing page. This also hurts the user experience.
The problems include illegal items, NSFW content, duplicates, spam, and fraud. To deal with them, you need to build a content moderation system. You can build an automatic moderation system that filters all the posts and listings, with ML modules that either accept or reject each one. If the automatic moderation system can’t tell whether an item complies with the marketplace’s rules, the listing is passed on to a moderation panel with human moderators. The ML modules don't replace humans; instead, they help moderators be more efficient at their jobs.
The goal is to pre-moderate the listings as much as possible and leave only the tricky cases to human moderators.
The automatic moderation system contains modules for duplicate detection, forbidden items, and other ML models. Once the moderators on the moderation panel have marked something as a duplicate, that decision is fed back into the system as feedback to help improve the automatic moderation.
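The routing described in this section can be sketched in a few lines of Python. This is only an illustration, not OLX's implementation: the talk summary does not say how the accept/reject decision is made, so the per-module scores, the two thresholds, and the names `route_listing` and `record_moderator_decision` are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds; real values would be tuned per module.
ACCEPT_BELOW = 0.10
REJECT_ABOVE = 0.90


def route_listing(module_scores: Dict[str, float]) -> str:
    """Decide what happens to a listing.

    module_scores maps an (assumed) module name such as "duplicate" or
    "nsfw" to that module's estimated probability of a rule violation.
    Expects at least one score.
    """
    worst = max(module_scores.values())
    if worst >= REJECT_ABOVE:
        return "reject"        # a module is confident the listing breaks a rule
    if worst <= ACCEPT_BELOW:
        return "accept"        # every module is confident the listing is fine
    return "human_review"      # uncertain: pass to the moderation panel


# Moderator decisions are kept as labeled examples for later retraining.
feedback_log: List[Tuple[str, str, str]] = []


def record_moderator_decision(listing_id: str, module: str, label: str) -> None:
    """Store a human decision so it can be fed back into the models."""
    feedback_log.append((listing_id, module, label))


print(route_listing({"duplicate": 0.02, "nsfw": 0.01}))  # confident and clean
print(route_listing({"duplicate": 0.55, "nsfw": 0.01}))  # uncertain case
```

Taking the highest score across modules means a single confident module is enough to reject, while acceptance requires every module to be confident. Other combinations are possible.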
What is a recommender system?
Think of YouTube: it shows you things that you’re likely to be interested in. The recommender system takes all the interactions between items and users, puts them into a matrix “A,” and applies a technique called collaborative filtering. From that matrix it extracts a vector (an array of a fixed length) for every user and every item. To recommend something, you find the item vectors that are close to the user’s vector. If you’d like to dig deeper into recommender systems, look for the paper titled “Matrix factorization techniques for recommender systems.” If you already know a bit about machine learning, it will help you understand the process of going from items and users to vectors.
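To make the idea of going from an interaction matrix to vectors concrete, here is a minimal matrix factorization sketch in plain Python. It is a toy under stated assumptions, not the system described in the talk: the 4x4 matrix, the vector length of 2, the learning rate, and the function names are all made up for illustration, and every cell (including the zeros) is treated as observed, which real systems usually avoid.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def factorize(matrix, dim=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Learn one vector per user and per item so that
    dot(user_vector, item_vector) approximates matrix[user][item]."""
    rng = random.Random(seed)
    n_users, n_items = len(matrix), len(matrix[0])
    users = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u in range(n_users):
            for i in range(n_items):
                err = matrix[u][i] - dot(users[u], items[i])
                for k in range(dim):
                    pu, qi = users[u][k], items[i][k]
                    # Stochastic gradient step with L2 regularization.
                    users[u][k] += lr * (err * qi - reg * pu)
                    items[i][k] += lr * (err * pu - reg * qi)
    return users, items


def rank_items(user_vector, item_vectors):
    """Return item indices ordered from best to worst match for the user."""
    scores = [dot(user_vector, q) for q in item_vectors]
    return sorted(range(len(item_vectors)), key=lambda i: scores[i], reverse=True)


# Toy interaction matrix A: rows are users, columns are items,
# 1 means the user interacted with the item.
A = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
user_vectors, item_vectors = factorize(A)
top_for_user_0 = rank_items(user_vectors[0], item_vectors)[:2]
print(top_for_user_0)
```

Here closeness is measured with a dot product between the user vector and each item vector. A production recommender would also exclude items the user has already seen and would handle unobserved cells differently.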
In addition to the homepage, where the listings are personalized for you, each listing also has a “people are also interested in” section that shows items similar to the one you’re viewing. You can also use a neural network to get from an item to an array (a vector), among other things.
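A “people are also interested in” section can be sketched as a nearest-neighbor lookup over item vectors. The example below is a rough illustration: the item names and two-dimensional vectors are invented, cosine similarity is one common choice rather than something the talk specifies, and `similar_items` is a hypothetical helper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def similar_items(target_id, item_vectors, top_n=2):
    """Return the ids of the top_n items whose vectors are closest to the target's."""
    target = item_vectors[target_id]
    others = [item_id for item_id in item_vectors if item_id != target_id]
    others.sort(key=lambda item_id: cosine(target, item_vectors[item_id]), reverse=True)
    return others[:top_n]


# Made-up vectors; in practice they could come from matrix factorization
# or from a neural network that maps an item to an array.
item_vectors = {
    "toy_dog": [0.9, 0.1],
    "toy_cat": [0.8, 0.2],
    "bicycle": [0.1, 0.95],
}
print(similar_items("toy_dog", item_vectors, top_n=1))
```

With a large catalog, comparing the target against every item becomes expensive, so an approximate nearest-neighbor index is typically used instead of this full scan.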