Using Machine Learning to Build a Ride Acceptance Model for Uber
With its latest update, Uber now shows drivers the trip destination before they decide to accept a ride, so they can make informed choices. As a result, you are less likely to be charged an Uber trip cancellation fee, and the driver is less likely to cancel the trip.
This changes the problem. It has always been hard to predict whether a driver would cancel a ride, because the rider-driver conversation used to happen over a phone call. Now that the driver knows the rider’s pickup and drop-off locations up front and is incentivized separately for the distance traveled to pick up the rider, the focus shifts from cancellation to whether the driver accepts the ride.
Let’s look, from a Data Science (Machine Learning) perspective, at what happens behind the scenes to match a rider’s request with a driver who is likely to accept it, so that the request is fulfilled in the minimum possible time.
Matching Algorithm
In summary, to find the best cab driver for you within a few seconds, ride-hailing companies (Uber, Lyft, Ola, Rapido, etc.) run a matching algorithm and also check each driver’s ride acceptance probability before pushing a request to them.
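The post does not say exactly how the acceptance probability enters the matching step. The following is therefore only a minimal sketch under stated assumptions. Each nearby candidate driver is assumed to have an estimated pickup time (`eta_min`) and a model-predicted acceptance probability (`p_accept`). The hypothetical `rank_drivers` function skips drivers below an acceptance threshold and orders the rest by a simple heuristic cost, `eta_min / p_accept`. This cost penalizes drivers who are close but unlikely to accept.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    driver_id: str
    eta_min: float   # estimated minutes for this driver to reach the pickup point
    p_accept: float  # model-predicted probability that this driver accepts


def rank_drivers(candidates, min_p_accept=0.3):
    """Order candidate drivers for dispatch using a hypothetical heuristic.

    Drivers whose predicted acceptance probability is below `min_p_accept`
    are skipped. The rest are sorted by eta_min / p_accept, so a nearby
    driver who rarely accepts can rank below a farther driver who usually does.
    """
    if min_p_accept <= 0:
        raise ValueError("min_p_accept must be positive to avoid dividing by zero")
    eligible = [c for c in candidates if c.p_accept >= min_p_accept]
    return sorted(eligible, key=lambda c: c.eta_min / c.p_accept)


pool = [
    Candidate("d1", eta_min=2.0, p_accept=0.25),  # closest, but below the threshold
    Candidate("d2", eta_min=4.0, p_accept=0.90),
    Candidate("d3", eta_min=3.0, p_accept=0.50),
]
print([c.driver_id for c in rank_drivers(pool)])  # → ['d2', 'd3']
```

Here d3 is closer than d2. However, its lower acceptance probability gives it a higher cost (3.0 / 0.5 = 6.0 versus 4.0 / 0.9 ≈ 4.4), so d2 is offered the ride first. The threshold and the cost formula are illustrative choices, not Uber’s.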
In this newsletter, we discuss how to build a probabilistic model of driver ride acceptance.
Objective: Predict whether a driver will accept a ride request, and estimate the probability of acceptance.
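One way to state this objective formally is shown below. The notation is introduced here for illustration and is not from the original post. For a request sent to a driver, let x be the feature vector built from the characteristics listed below, with y = 1 if the driver accepts and y = 0 otherwise. The model estimates the acceptance probability, and a decision threshold τ turns that probability into a yes/no prediction.

```latex
\hat{p}(\mathbf{x}) = P(y = 1 \mid \mathbf{x}),
\qquad
\hat{y} =
\begin{cases}
1 & \text{if } \hat{p}(\mathbf{x}) \ge \tau \\
0 & \text{otherwise}
\end{cases}
```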
Characteristics / Features Required
To figure out which features are needed to solve this problem in a ride-hailing business, a data scientist must be well-versed in the domain. Product thinking always matters for a data scientist.
1. Trip
- Driver to User (Client) Distance (Driver-to-Pickup Distance)
- Time of the Day (Morning / Afternoon / Evening / Late Night)
- Trip duration/distance (Pickup-to-Drop Distance)
- Payment method (some drivers prefer cash over online payments)
- Destination of the client (drivers don’t like to go to a destination where they will struggle to find their next client)
- Ride Type: Pool request or Normal request
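Below is a minimal sketch of how some of the trip features could be computed from raw request data. It rests on several assumptions:

- Locations arrive as (latitude, longitude) pairs.
- The time-of-day boundaries are arbitrary choices.
- Straight-line (haversine) distance stands in for the road-network distance or ETA that a production system would more likely use.

All function names are hypothetical. The destination feature is left out because it needs zone-level demand data.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (straight-line) distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def time_of_day(ts: datetime) -> str:
    """Map a request timestamp to a coarse bucket (boundaries are assumptions)."""
    h = ts.hour
    if 5 <= h < 12:
        return "morning"
    if 12 <= h < 17:
        return "afternoon"
    if 17 <= h < 22:
        return "evening"
    return "late_night"


def trip_features(driver, pickup, drop, requested_at, payment_method, ride_type):
    """Build trip-level features for one request; locations are (lat, lon) tuples."""
    return {
        "driver_to_pickup_km": haversine_km(*driver, *pickup),
        "trip_km": haversine_km(*pickup, *drop),
        "time_of_day": time_of_day(requested_at),
        "payment_method": payment_method,  # e.g. "cash" or "online"
        "ride_type": ride_type,            # e.g. "pool" or "normal"
    }


row = trip_features(
    driver=(28.6139, 77.2090), pickup=(28.6200, 77.2100), drop=(28.5355, 77.3910),
    requested_at=datetime(2021, 7, 1, 23, 15), payment_method="cash", ride_type="normal",
)
print(row["time_of_day"])  # → late_night
```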
2. Driver
a) En route or Available
- Whether the driver is available (no ongoing ride)
- If the driver is en route, whether the current trip is about to end or is still in progress
b) Historic Features
- Average ride acceptance rate over the last week/month
- Total trips completed so far in the day (whether the driver has already reached the day’s incentive target)
- Current day’s acceptance rate (acceptance rate = number of requests accepted / total requests received)
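The acceptance-rate features above reduce to counting over time windows. This is a minimal sketch that assumes each driver’s request log is available as (timestamp, accepted) pairs. The function names and the `daily_target` incentive parameter are hypothetical, since the actual incentive rules are not described here.

```python
from datetime import datetime, timedelta


def acceptance_rate(events, since, until):
    """Accepted / received for requests with since <= timestamp < until.

    `events` is a list of (timestamp, accepted) tuples for one driver.
    Returns None when the driver received no requests in the window,
    so "no data" is not confused with a 0% acceptance rate.
    """
    window = [accepted for ts, accepted in events if since <= ts < until]
    if not window:
        return None
    return sum(window) / len(window)


def driver_history_features(events, trips_completed_today, daily_target, now):
    """Historic driver features; `daily_target` is a hypothetical incentive target."""
    start_of_day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return {
        "acceptance_rate_7d": acceptance_rate(events, now - timedelta(days=7), now),
        "acceptance_rate_30d": acceptance_rate(events, now - timedelta(days=30), now),
        "acceptance_rate_today": acceptance_rate(events, start_of_day, now),
        "trips_completed_today": trips_completed_today,
        "hit_daily_target": trips_completed_today >= daily_target,
    }


now = datetime(2021, 7, 1, 18, 0)
events = [
    (datetime(2021, 7, 1, 9, 0), True),
    (datetime(2021, 7, 1, 12, 0), False),
    (datetime(2021, 6, 28, 10, 0), True),
    (datetime(2021, 6, 10, 10, 0), False),
]
feats = driver_history_features(events, trips_completed_today=6, daily_target=10, now=now)
print(feats["acceptance_rate_30d"])  # → 0.5
```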
3. Vehicle Type
- Cab / Auto Rickshaw / Bike (depending on the vehicle type, some drivers don’t accept pickups in isolated areas due to safety concerns)
- Many bike and auto-rickshaw drivers prefer not to accept ride requests longer than 10 km
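The 10 km observation above can be turned into an explicit feature. In this minimal sketch with hypothetical names (`vehicle_features`, `long_trip_small_vehicle`), vehicle type is one-hot encoded and combined with trip distance into a single interaction flag. Tree-based models can usually learn such an interaction on their own. A linear model like logistic regression cannot unless the flag is supplied.

```python
VEHICLE_TYPES = ["cab", "auto_rickshaw", "bike"]
SMALL_VEHICLES = {"auto_rickshaw", "bike"}
LONG_TRIP_KM = 10.0  # threshold taken from the observation about > 10 km requests


def vehicle_features(vehicle_type, trip_km):
    """One-hot vehicle type plus a flag for long trips offered to bikes/autos."""
    features = {f"vehicle={v}": int(vehicle_type == v) for v in VEHICLE_TYPES}
    features["long_trip_small_vehicle"] = int(
        vehicle_type in SMALL_VEHICLES and trip_km > LONG_TRIP_KM
    )
    return features


print(vehicle_features("bike", 12.5))
# → {'vehicle=cab': 0, 'vehicle=auto_rickshaw': 0, 'vehicle=bike': 1, 'long_trip_small_vehicle': 1}
```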
4. Rider (Client)
- Rider Rating (drivers don’t like to deal with clients rated below 4)
- Rider gender (some drivers decline requests based on the time of the request and the client’s gender)
- Rider Image (in one informal experiment, a rider who replaced his profile picture with a zombie image saw fewer drivers accept his requests, suggesting that the client’s image also matters to drivers)
Interesting read on how the profile image matters on both the driver and rider sides 😂: ‘Zombie’ drivers are scamming people out of cash with horrible profile pictures
5. Traffic
- Based on request time, drivers sometimes avoid accepting rides in heavy-traffic zones (traffic in the region determines how long it takes to reach the client and to complete the trip)
6. Special Events (occasional change)
- Weather — Rainy Day / Sunny
- COVID restrictions in the area (Quarantine zone)
- Festival Day based on region — Holi, Diwali, Christmas, etc.
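Before modeling, the categorical features above need to become numbers. These include time of day, payment method, ride type and weather. Here is a minimal pure-Python sketch of one-hot encoding with hypothetical category vocabularies and field names. Numeric and boolean fields such as a festival-day or quarantine-zone flag are passed through as floats. In practice, scikit-learn’s `OneHotEncoder(handle_unknown="ignore")` does the same job inside a pipeline.

```python
# Allowed values per categorical feature (hypothetical vocabularies for illustration).
CATEGORIES = {
    "time_of_day": ["morning", "afternoon", "evening", "late_night"],
    "payment_method": ["cash", "online"],
    "ride_type": ["normal", "pool"],
    "weather": ["sunny", "rainy"],
}


def encode_row(raw):
    """One-hot encode known categorical fields; pass other fields through as floats.

    An unseen category value produces an all-zero block instead of raising,
    which keeps scoring robust when a new value shows up in production.
    """
    row = {}
    for key, value in raw.items():
        if key in CATEGORIES:
            for option in CATEGORIES[key]:
                row[f"{key}={option}"] = 1.0 if value == option else 0.0
        else:
            row[key] = float(value)
    return row


raw = {
    "time_of_day": "evening", "payment_method": "cash", "ride_type": "normal",
    "weather": "rainy", "driver_to_pickup_km": 1.8,
    "is_festival_day": True, "in_quarantine_zone": False,
}
encoded = encode_row(raw)
print(encoded["weather=rainy"], encoded["is_festival_day"])  # → 1.0 1.0
```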
Modeling
We now have a rich feature set that can help us predict whether a driver will accept a client’s ride request. We spot-check standard supervised machine learning classification algorithms:
- Logistic Regression (Linear Model)
- Decision Tree (Non-Linear Model)
- Bagging Classifier — Random Forest Classifier (Ensemble Model)
- Boosting Classifier — LightGBM, XGBoost, etc.
Model Metrics: AUC-ROC and F-beta score (beta = 2 if recall is twice as important as precision)
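Both metrics are easy to compute by hand, which helps when sanity-checking a library’s output. Below is a minimal pure-Python sketch with illustrative function names:

- `roc_auc` uses the rank interpretation of AUC: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one.
- `f_beta` applies the standard formula, in which beta = 2 weights recall more heavily than precision.

Which class counts as “positive” (accept or reject) is a modeling choice the post leaves open. The sketch treats label 1 as positive.

```python
def roc_auc(y_true, scores):
    """ROC AUC as P(random positive outscores random negative); ties count as half.

    O(n_pos * n_neg) pairwise comparison, which is fine for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f_beta(y_true, y_pred, beta=2.0):
    """F-beta = (1 + b^2) * P * R / (b^2 * P + R); beta > 1 weights recall more."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75

y_true = [1, 1, 1, 1, 0]
y_pred = [1, 0, 0, 0, 0]  # precision = 1.0, recall = 0.25
print(round(f_beta(y_true, y_pred, beta=1), 3))  # → 0.4
print(round(f_beta(y_true, y_pred, beta=2), 3))  # → 0.294
```

With perfect precision but low recall, F2 comes out lower than F1. This is the intended effect of weighting recall more.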
Conclusion
I hope you now understand the business problem and can relate to the features we picked to model these patterns. There is no silver-bullet solution, and real-world versions of these problems are far more complex. Still, the aim is to improve the user experience and minimize user-driver matching time, since even a millisecond of improvement in the driver-user matching algorithm can help save millions of dollars.
According to a paper titled “The Cost of Latency in High-Frequency Trading,” a 1-millisecond advantage in latency can be worth upwards of $100 million per year.
I hope you learned something new from this post. If you liked it, hit 👏, subscribe to my newsletter, and share this with others. Stay tuned for the next one!
Connect, Follow or Endorse me on LinkedIn if you found this read useful.
I am open to gigs and consulting work; you can reach out to me on LinkedIn: https://www.linkedin.com/in/shaurya-uppal/
Also published here.