01 — Background
Every collection showed the same restaurants, in the same order, to everyone.
Pathao Food organises its restaurant catalogue into curated collections - "Lunch Specials," "Budget Meals," "Best Nearby," and dozens more. These collections are prime discovery real estate: a user who taps "Budget Meals" has already told us something valuable about their intent. They're in a budget-conscious mindset right now.
The problem was that we weren't using any of that context. Every user who opened "Budget Meals" at noon saw the exact same ranked list. The office worker who orders rice-based meals at lunch and the student who prefers wraps saw the same first result. The ranking inside each collection was driven by aggregate restaurant popularity - a global signal that said nothing about the specific person looking at it.
We had a growing library of user order history, meal-time behavioural data, and restaurant feature data. None of it was being used to influence what a user saw first when they opened a collection. This was the gap the Personalised Restaurant Ranking project was built to close.
The brief I set: Make the order of restaurants inside every collection reflect something true about the person opening it - their cuisine preferences, their usual basket size, how they behave at this specific meal time. The best restaurant for Akash at 12:30 PM on a Tuesday should rank higher for Akash than it does for anyone else.
02 — Problem
Four compounding failures in the existing ranking model.
When I broke down why the current ranking wasn't working, I found not one problem but four distinct failures. Any solution that addressed only some of them would leave measurable value on the table.
01
Popularity bias crowded out relevance
The existing ranking was dominated by aggregate order counts - the most ordered-from restaurants in the city floated to the top of every collection, regardless of whether they matched the user's actual preferences. A user who has never ordered pizza would see a popular pizza restaurant ranked above their preferred biryani spot inside "Best Nearby." The most popular answer was rarely the most personally relevant answer.
02
No meal-time context
A user's preferences change by meal time in ways that are highly predictable from their history. The person who orders a light sandwich at breakfast is often the same person who wants a full rice meal at lunch. Showing the same restaurants in the same order at 8 AM and 1 PM treats completely different intent states as identical. The ranking had no time dimension at all - it was static across the entire day.
03
Repeat-order restaurants weren't rewarded
If a user has ordered from a restaurant three times in the past month, that's the clearest signal of preference we have - stronger than any similarity score. But the existing ranking treated a frequently-ordered restaurant the same as one the user had never touched. Demonstrated loyalty had no weight in the ranking. Users had to scroll past unfamiliar options to reach restaurants they already trusted.
04
New users got the worst experience by default
For users with no order history, the ranking had no personal signals to work with at all. The fallback was undefined - which meant the ranking was essentially arbitrary for new users. A new user's first collection-browsing experience was the least curated experience in the app, at exactly the moment when first impressions matter most. There was no explicit fallback logic, no graceful degradation.
03 — My Approach
Five decisions that shaped the product before a line was written.
Before writing functional requirements, I made five strategic calls. Each one had trade-offs and I want to be direct about how I reasoned through them.
01
Build per-slot taste profiles, not a single user profile
The naive approach would have been to build one aggregate preference profile per user. I pushed for five separate profiles - one per meal slot (breakfast, lunch, snacks, dinner, late night) - based on the insight that the same user behaves very differently at different meal times. A single profile would blur these signals together, so the ranking would be less accurate than the underlying order history could support. The data science (DS) engineering cost was real, but the relevance improvement justified it.
02
Use similarity scoring, not just history replay
Pure history replay - "show them restaurants they've ordered from" - is too narrow. It doesn't handle restaurants the user has never tried but would likely enjoy. I specified a user-restaurant similarity score built from four features: cuisine match, basket size range, delivery speed preference, and minimum rating threshold. This is what makes the ranking generalise beyond familiar ground.
03
Give repeat-order restaurants a hard priority boost
I wrote an explicit rule: if a user has ordered from a restaurant more than once, that restaurant receives elevated priority in the ranking - above its similarity score alone. Revealed preference through repeat orders is the strongest behavioural signal we have. No amount of similarity modelling can produce a signal as clean as "this user chose this restaurant again." This was a non-negotiable design decision.
04
Specify the fallback as a first-class requirement
Rather than treating new users as a corner case, I specified an explicit fallback path: popularity-sorted by total orders from the last 6 months, with random tie-breaking. This gave new users a sensible, well-defined experience from day one - not an undefined arbitrary list. Graceful degradation was written into the spec at the same priority level as the personalisation logic itself.
05
Ship V1 without a frontend change - ranking only
I made a deliberate scoping call: V1 would change only the sort order of restaurants inside collections. No UI changes, no new surfaces, no additional features. This let us get real-world ranking quality data before committing to a broader product investment. It also made V1 easier to instrument and attribute - if metrics moved, they moved because of the ranking algorithm, full stop.
04 — V1 Design
Building the ranking foundation - profiles, similarity scores, and the sort logic.
V1 comprised six functional requirements, all classified P00 (critical to have). None were optional - each was a necessary piece of the ranking pipeline. Here is what I specified and why.
The five meal-time slots
The first design decision was how to define meal slots. I specified five, balancing granularity against practical data density:
☀️ Breakfast: 6 AM – 10 AM
🌙 Late Night: 10 PM – 6 AM
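To make slot detection concrete, here is a minimal sketch of how a clock time could be mapped to a meal slot. The Breakfast and Late Night windows are the ones listed above; the Lunch, Snacks, and Dinner boundaries are not given in this section, so the values below are assumptions chosen only for illustration, and `detect_slot` is a hypothetical name.

```python
from datetime import time

# Slot start times, in chronological order.
# Breakfast and Late Night come from the spec above.
# Lunch, Snacks, and Dinner boundaries are ASSUMED for illustration.
SLOT_STARTS = [
    (time(6, 0), "breakfast"),    # 6 AM - 10 AM (from the spec)
    (time(10, 0), "lunch"),       # assumed: 10 AM - 4 PM
    (time(16, 0), "snacks"),      # assumed: 4 PM - 7 PM
    (time(19, 0), "dinner"),      # assumed: 7 PM - 10 PM
    (time(22, 0), "late_night"),  # 10 PM - 6 AM (from the spec)
]


def detect_slot(now: time) -> str:
    """Return the meal slot whose window contains `now`."""
    # Times before 6 AM have not reached any start yet, so they
    # belong to the Late Night slot that began the previous evening.
    slot = "late_night"
    for start, name in SLOT_STARTS:
        if now >= start:
            slot = name
    return slot
```

Because Late Night wraps past midnight, the function defaults to it and only moves forward as each start time is passed, which avoids special-casing the wrap-around.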
The six V1 functional requirements
1
P00 · Critical
Generate user food preference profiles per time slot
Build and store a preference profile for each user × meal slot combination using order history from the last 6 months. Profile captures: preferred cuisine tags, average basket size, average delivery speed preference, and minimum restaurant rating accepted.
Acceptance criteria
- Profiles must be generated and stored separately for each of the 5 meal slots
- If a user has no order data for a specific slot, fall back to combined history from all other slots
- Profile attributes: cuisine tags, avg. basket size, avg. delivery speed, min. rating threshold
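The acceptance criteria above can be sketched roughly as follows. This is an illustration, not the production pipeline: the `Order` and `SlotProfile` shapes, the `MIN_SLOT_ORDERS` threshold, and the choice to keep the top three cuisine tags are all assumptions. When a slot has too little data, the sketch uses the user's combined history, which for a slot with no orders is exactly the history from all other slots.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Order:
    slot: str                 # meal slot the order was placed in
    cuisine: str              # primary cuisine tag of the restaurant
    basket_size: float        # order value
    delivery_minutes: float   # delivery time experienced
    restaurant_rating: float  # rating of the restaurant ordered from


@dataclass
class SlotProfile:
    cuisine_tags: list[str]
    avg_basket_size: float
    avg_delivery_minutes: float
    min_rating: float


MIN_SLOT_ORDERS = 3  # ASSUMED threshold for a "reliable" slot profile


def build_slot_profile(orders: list[Order], slot: str,
                       min_orders: int = MIN_SLOT_ORDERS) -> Optional[SlotProfile]:
    """Build one slot profile from the last 6 months of orders."""
    in_slot = [o for o in orders if o.slot == slot]
    # Thin or empty slot: fall back to the combined cross-slot history.
    source = in_slot if len(in_slot) >= min_orders else orders
    if not source:
        return None  # no history at all; caller should use the popularity fallback
    top_cuisines = [c for c, _ in Counter(o.cuisine for o in source).most_common(3)]
    return SlotProfile(
        cuisine_tags=top_cuisines,
        avg_basket_size=mean(o.basket_size for o in source),
        avg_delivery_minutes=mean(o.delivery_minutes for o in source),
        min_rating=min(o.restaurant_rating for o in source),
    )
```

Returning `None` for a user with no history keeps the decision to use the popularity fallback explicit at the call site rather than hiding it inside the profile builder.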
2
P00 · Critical
Map restaurant features for similarity comparison
Generate a feature vector for every restaurant from the last 6 months of activity - the same four dimensions used in user profiles - so that user-restaurant similarity can be computed on a like-for-like basis.
Acceptance criteria
- Every restaurant must have a scored feature vector covering cuisine types, avg. basket size, avg. delivery time, and rating
- Feature vectors refreshed on a rolling 6-month window
3
P00 · Critical
Calculate user–restaurant similarity score per time slot
For the active meal slot, compute a similarity score between the current user's slot profile and every restaurant's feature vector. This score drives the ranking order inside each collection.
Acceptance criteria
- Score must incorporate: cuisine match, delivery proximity, basket size range match, rating threshold, order frequency
- Score recalculated dynamically based on current time-of-day slot
- Tie-breaking: if two restaurants have identical scores, one is selected randomly - no deterministic tie-break that could create a persistent bias
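As an illustration of how such a score might be assembled, the sketch below combines the four features named in the approach section (cuisine match, basket size, delivery speed, rating threshold) into a single number between 0 and 1, then sorts with random tie-breaking. The weights, the Jaccard overlap for cuisine, and the `closeness` helper are assumptions made for the example, not the formula the DS team shipped.

```python
import random
from dataclasses import dataclass


@dataclass
class Profile:
    cuisine_tags: set[str]
    avg_basket_size: float
    avg_delivery_minutes: float
    min_rating: float


@dataclass
class Restaurant:
    name: str
    cuisine_tags: set[str]
    avg_basket_size: float
    avg_delivery_minutes: float
    rating: float


# ASSUMED weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"cuisine": 0.4, "basket": 0.2, "delivery": 0.2, "rating": 0.2}


def closeness(a: float, b: float) -> float:
    """1.0 when the values are equal, falling toward 0.0 as the relative gap grows."""
    largest = max(a, b)
    if largest <= 0:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / largest)


def similarity(profile: Profile, r: Restaurant) -> float:
    union = profile.cuisine_tags | r.cuisine_tags
    cuisine = len(profile.cuisine_tags & r.cuisine_tags) / len(union) if union else 0.0
    basket = closeness(profile.avg_basket_size, r.avg_basket_size)
    # Delivering faster than the user's usual time is never penalised.
    if r.avg_delivery_minutes <= profile.avg_delivery_minutes:
        delivery = 1.0
    else:
        delivery = closeness(profile.avg_delivery_minutes, r.avg_delivery_minutes)
    rating = 1.0 if r.rating >= profile.min_rating else 0.0
    return (WEIGHTS["cuisine"] * cuisine + WEIGHTS["basket"] * basket
            + WEIGHTS["delivery"] * delivery + WEIGHTS["rating"] * rating)


def rank_by_similarity(profile, restaurants, rng=None):
    """Sort by score descending; equal scores end up in random order."""
    rng = rng or random.Random()
    shuffled = list(restaurants)
    rng.shuffle(shuffled)  # shuffle first; the stable sort then preserves this order for ties
    return sorted(shuffled, key=lambda r: similarity(profile, r), reverse=True)
```

Shuffling before a stable sort is a simple way to get random tie-breaking without attaching a random number to each score.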
4
P00 · Critical
Prioritise restaurants ordered multiple times by the user
Apply a hard priority boost to restaurants a user has ordered from more than once - elevating them above their base similarity score. Repeat orders are the strongest behavioural signal available; they must outrank any similarity-only recommendation.
Why this matters
- Similarity scores are estimates; repeat orders are confirmed preference
- Users should not have to scroll past unfamiliar restaurants to reach their regulars
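One simple way to express a hard boost is a two-part sort key: repeat-order status first, similarity second. The sketch below assumes the similarity scores are already computed; the function name is hypothetical, the scores are borrowed from the worked example later in this section, and "Popular Pizza Place" is an invented stand-in for the cuisine-mismatched restaurant. Random tie-breaking is left out to keep the example short.

```python
def rank_with_repeat_boost(scores: dict[str, float],
                           order_counts: dict[str, int]) -> list[str]:
    """Repeat-order restaurants first, then everything else, each group by score."""
    def sort_key(name: str):
        is_repeat = order_counts.get(name, 0) > 1  # "more than once"
        # Tuples compare element by element, so repeat status outranks score.
        return (is_repeat, scores[name])

    return sorted(scores, key=sort_key, reverse=True)


scores = {
    "Dhaka Biryani House": 0.87,
    "Spice N Rice": 0.79,
    "Popular Pizza Place": 0.41,  # hypothetical name
}
order_counts = {"Spice N Rice": 4}

ranking = rank_with_repeat_boost(scores, order_counts)
# ranking == ["Spice N Rice", "Dhaka Biryani House", "Popular Pizza Place"]
```

A restaurant ordered from exactly once gets no boost under this rule, which matches the "more than once" wording of the requirement.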
5
P00 · Critical
Sort restaurants inside each collection by similarity score
The similarity scores computed above, together with the repeat-order boost from requirement 4, drive the visible ranking inside every collection: highest score at the top, lowest at the bottom. The sort updates dynamically as the user's meal slot changes through the day.
Acceptance criteria
- Visible restaurant order inside any collection must reflect similarity score descending, after the repeat-order boost is applied
- Ranking updates automatically when time crosses a slot boundary
- No frontend card format changes required - ranking change only
6
P00 · Critical
Handle users with no or insufficient order history
For new users or low-activity users where profile generation is not viable, fall back gracefully to a popularity-sorted list - defined as restaurants ranked by total orders in the last 6 months within the relevant collection.
Acceptance criteria
- Fallback triggered when user has insufficient history for reliable slot profile generation
- Fallback ranking = order count from last 6 months (descending)
- Tie in popularity: random selection - no persistent ordering bias
- Fallback degrades silently - no user-facing indicator that personalisation is absent
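The fallback rule is small enough to sketch directly. The example below assumes a mapping from restaurant name to its order count over the last 6 months within the collection; the function name and the optional `rng` parameter (useful for reproducible tests) are illustrative choices.

```python
import random


def popularity_fallback(order_counts: dict[str, int], rng=None) -> list[str]:
    """Rank restaurants by order count descending, breaking ties randomly."""
    rng = rng or random.Random()
    names = list(order_counts)
    rng.shuffle(names)  # random order first, so equal counts stay randomly ordered
    return sorted(names, key=lambda n: order_counts[n], reverse=True)


counts = {"A": 120, "B": 300, "C": 120}
ranking = popularity_fallback(counts)
# "B" is always first; "A" and "C" follow in either order.
```

Nothing in the return value marks the list as a fallback, which is consistent with the requirement that the degradation stays invisible to the user.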
A concrete example
Akash opens "Lunch Specials" at 1:15 PM on a weekday
→ Slot detection: 1:15 PM falls in the Lunch slot. System retrieves Akash's lunch preference profile: biryani/kebab dominant, mid-basket size, avg. delivery 25 min, min. rating 4.0.
→ Similarity scoring: Every restaurant in "Lunch Specials" is scored against Akash's lunch profile. "Dhaka Biryani House" (cuisine match, rating 4.3, similar basket) scores 0.87. A popular but cuisine-mismatched restaurant scores 0.41.
→ Repeat-order boost: Akash has ordered from "Spice N Rice" 4 times. It receives priority elevation, which places it higher than its raw similarity score of 0.79 alone would.
→ Final ranking: Spice N Rice (repeat-order boost) → Dhaka Biryani House (0.87) → other high-similarity matches → low-similarity restaurants at the bottom.
→ Result: Akash sees his trusted spot first, with relevant discovery options immediately below - not after scrolling past ten popularity-driven but personally irrelevant results.
05 — V2 Design
Extending the algorithm - additional signals for a richer ranking model.
V1 established the core ranking infrastructure: slot profiles, similarity scoring, repeat-order boosting, and the fallback path. V1's post-launch data surfaced a clear gap: the ranking still defaulted heavily toward high-scoring-but-familiar restaurants, with insufficient weight for contextual factors like active promotions and real-time restaurant availability. V2 extended the algorithm with these signals.
V1 Limitation
Algorithm relied solely on preference and history signals
V1's ranking combined similarity scores and repeat-order boosts - both of which are backward-looking signals derived from historical behaviour. They said nothing about what was actively happening on the platform right now: which restaurants had live promotions, which had recently improved their delivery performance, which were experiencing a surge in orders. A restaurant that was offering 20% off was ranked identically to one that wasn't.
V2 Extension
Add popularity momentum and discount signals
Popularity momentum: Incorporate a recency-weighted order velocity signal - restaurants trending upward in recent orders receive a ranking boost, capturing real-time demand signals that history alone misses.
Discount signal: Restaurants with active promotions receive a contextual boost in the ranking - surfacing deals to users at the moment they're deciding what to order, rather than requiring them to browse separately to a deals section.
Blended scoring: V2 combines the V1 similarity-based rank with the new signals into a weighted composite score, preserving the personalisation benefits of V1 while adding real-time context.
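A rough sketch of how the two V2 signals could feed a composite score is shown below. Everything numeric here is an assumption: the exponential decay factor, the way momentum is normalised, and the weights. The weights are chosen so that similarity remains the dominant term, in line with the risk mitigation described later in this case study.

```python
def order_velocity(daily_orders: list[int], decay: float = 0.8) -> float:
    """Recency-weighted average of daily order counts (most recent day last)."""
    # The newest day gets weight 1, the day before gets `decay`, and so on.
    weights = [decay ** age for age in range(len(daily_orders) - 1, -1, -1)]
    total = sum(weights)
    if not total:
        return 0.0
    return sum(w * n for w, n in zip(weights, daily_orders)) / total


def momentum(daily_orders: list[int], decay: float = 0.8) -> float:
    """Ratio above 1.0 means recent days outpace the plain average (trending up)."""
    if not daily_orders:
        return 1.0
    baseline = sum(daily_orders) / len(daily_orders)
    return order_velocity(daily_orders, decay) / baseline if baseline else 1.0


def blended_score(similarity: float, momentum_ratio: float, has_discount: bool,
                  w_sim: float = 0.8, w_mom: float = 0.1, w_disc: float = 0.1) -> float:
    """Weighted composite of V1 similarity and the two V2 signals (ASSUMED weights)."""
    # Only upward trends earn a boost, and the boost is capped at 1.0.
    momentum_signal = min(max(momentum_ratio - 1.0, 0.0), 1.0)
    discount_signal = 1.0 if has_discount else 0.0
    return w_sim * similarity + w_mom * momentum_signal + w_disc * discount_signal
```

With these weights, a discounted restaurant with similarity 0.4 scores 0.42, still well below an undiscounted restaurant with similarity 0.8 at 0.64, so a promotion can reorder near-equal candidates without overriding relevance.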
Why I didn't add these in V1: Both signals required additional data pipelines - real-time order velocity tracking and a live promotions API feed - that weren't available when V1 shipped. Rather than delay V1 to build these dependencies, I scoped them explicitly as V2 requirements with a clear handoff once V1 data confirmed the core ranking model was working. V1 gave us the proof of concept; V2 gave us the full model.
06 — Metrics
What success looked like - and the full framework I used to measure it.
North Star: Reduction in time from collection open to checkout initiation - the clearest signal that a user found what they were looking for faster, directly attributable to ranking quality.
| Layer | Metric | Why it's on the dashboard |
|---|---|---|
| Engagement | CTR on top-3 ranked restaurants in collection | Directly measures whether the ranking is surfacing relevant restaurants at the positions users actually look at. If CTR on position 1–3 increases, personalisation is working at the top of the list where it matters most. |
| Engagement | Scroll depth before first tap within collection | Good personalisation should reduce how far users scroll before finding something they want. Decreasing scroll depth = the right restaurant appeared earlier in the ranked list. |
| Speed | Time from collection open to checkout initiation | North Star metric. Captures the full browsing-to-decision journey. A shorter time means less cognitive load - the user didn't have to work hard to find a relevant option. |
| Speed | Time from collection open to order request | Extends the north star to full conversion. Collection open → order placed is the complete funnel, and ranking quality should compress it. |
| Conversion | Order conversion rate from collection pages | Guardrail metric. Personalisation should increase conversion, not just speed. If CVR stays flat while time decreases, users are finding options faster but still not satisfied with them - a signal to revisit ranking quality. |
| Conversion | Repeat-order rate on ranked restaurants | Measures whether users are returning to restaurants they discovered via the ranked collection - confirmation that the recommendations generated lasting preference, not just one-off clicks. |
| System | Profile coverage - % of active users with slot profiles | Operational health check. If coverage is low, many users are being served the fallback popularity ranking. Tracking this ensures the personalisation pipeline is running and coverage is expanding as order history accumulates. |
| System | Fallback trigger rate per slot | Shows which meal slots have insufficient data density. A persistently high fallback rate for "Breakfast" tells us breakfast ordering behaviour is thin - actionable input for the DS team on data collection priorities. |
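To show how a few of these dashboard metrics could be derived, here is a small sketch over per-session records. The `CollectionSession` shape is hypothetical, and the top-3 CTR is simplified to "share of collection sessions whose first tap landed on positions 1 to 3"; a real analytics pipeline would likely define it per impression.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class CollectionSession:
    slot: str                             # meal slot when the collection was opened
    used_fallback: bool                   # True if the popularity fallback was served
    first_tap_position: Optional[int]     # 1-based rank of the first tap, None if no tap
    seconds_to_checkout: Optional[float]  # None if checkout was never initiated


def top3_ctr(sessions: list[CollectionSession]) -> float:
    """Share of sessions whose first tap was on one of the top three positions."""
    if not sessions:
        return 0.0
    hits = sum(1 for s in sessions
               if s.first_tap_position is not None and s.first_tap_position <= 3)
    return hits / len(sessions)


def median_seconds_to_checkout(sessions: list[CollectionSession]) -> Optional[float]:
    """North star: median time from collection open to checkout initiation."""
    times = [s.seconds_to_checkout for s in sessions
             if s.seconds_to_checkout is not None]
    return median(times) if times else None


def fallback_rate_by_slot(sessions: list[CollectionSession]) -> dict[str, float]:
    """Fraction of sessions in each slot that were served the fallback ranking."""
    totals, fallbacks = Counter(), Counter()
    for s in sessions:
        totals[s.slot] += 1
        if s.used_fallback:
            fallbacks[s.slot] += 1
    return {slot: fallbacks[slot] / totals[slot] for slot in totals}
```

The median is used for the north star rather than the mean so that a handful of very long sessions do not dominate the figure; that choice is also an assumption.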
What I deliberately excluded
Session length. A ranking system that works should reduce session length, not increase it. If users spend more time browsing after personalisation launches, the ranking may be showing them more options without making the right one obvious. Optimising for session length would reward the wrong outcome.
Total page impressions. High impressions with no CTR increase means users are looking but not engaging. Tracking impressions as a success metric would obscure whether personalisation was actually creating relevance or just attracting passive attention.
Guardrail metric: Conversion rate was tracked as a guardrail, not a primary success metric. The goal was to improve speed-to-decision; conversion was expected to follow. If conversion dropped while time-to-checkout improved, that would be a signal to investigate - it shouldn't happen, and tracking it as a guardrail means we'd catch it immediately if it did.
07 — Risks
What could go wrong - and how I designed against it.
High
Sparse slot profiles for low-frequency meal times produce irrelevant rankings
Not every user orders breakfast or late-night meals regularly. A profile built on 2 breakfast orders in 6 months is unreliable. If this profile drives the ranking, users get personalisation that's worse than the fallback. Mitigated by specifying that slots with insufficient order density fall back to the user's combined cross-slot history - a richer signal than an unreliable sparse profile.
High
Repeat-order boost creates a filter bubble over time
If frequently-ordered restaurants are always ranked first, users may never see anything new in a collection they open regularly. The top results become nearly identical from one session to the next. Mitigated by capping the repeat-order boost - it elevates but doesn't monopolise the top positions - so discovery candidates always appear in the visible ranking.
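One possible shape for such a cap, sketched under assumptions: at most `max_boosted` repeat-order restaurants are lifted to the top, and any remaining repeats compete on similarity like everyone else. The cap value, the function name, and the choice to pick boosted restaurants by similarity score are illustrative, not taken from the spec.

```python
def rank_with_capped_boost(scores: dict[str, float],
                           order_counts: dict[str, int],
                           max_boosted: int = 2) -> list[str]:
    """Lift at most `max_boosted` repeat-order restaurants above the rest."""
    by_score = sorted(scores, key=scores.get, reverse=True)
    repeats = [name for name in by_score if order_counts.get(name, 0) > 1]
    boosted = repeats[:max_boosted]  # the highest-scoring repeats take the boosted slots
    rest = [name for name in by_score if name not in boosted]
    return boosted + rest


scores = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.6, "E": 0.5}
order_counts = {"C": 3, "D": 2, "E": 5}

ranking = rank_with_capped_boost(scores, order_counts, max_boosted=2)
# ranking == ["C", "D", "A", "B", "E"]
```

In this example three restaurants qualify for the boost, but only two are lifted, so the discovery candidates A and B still appear in positions three and four.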
High
Stale profiles cause ranking quality to degrade silently
If the DS pipeline stops refreshing slot profiles, users receive rankings based on months-old behaviour that no longer reflects their preferences. This fails silently - there's no visible error, just increasingly irrelevant results. Mitigated by the "profile coverage" and "fallback trigger rate" metrics, which would show anomalies before users report quality issues.
Medium
Similarity model penalises restaurants in mismatched but relevant collections
A user who rarely orders pizza but opens "Pizza Specials" explicitly is signalling current intent that overrides their historical profile. A strict similarity score might rank their profile's cuisine preferences above the collection's explicit theme. Mitigated by ensuring the similarity model scores within the collection's restaurant set - not globally - so collection context constrains the ranking appropriately.
Medium
V2 discount signal inflates ranking for promoted restaurants at the expense of relevance
If the discount signal is weighted too heavily, collections become effectively a promotions list - ranking restaurants by who's offering deals rather than who's actually relevant. This damages trust in the personalisation. Mitigated by treating the discount signal as a tiebreaker or secondary boost, with similarity score remaining the dominant ranking factor.
Low
Random tie-breaking creates inconsistent experience across sessions
When two restaurants have identical similarity scores, the random tie-break means the same user opening the same collection twice might see different orderings. For most users this is imperceptible, but for power users who notice, it could feel inconsistent. Accepted as a deliberate trade-off over deterministic tie-breaking, which would create a persistent ordering bias that advantages specific restaurants unfairly.
08 — Lessons
What building this taught me about ranking systems and PM craft.
01
Context-awareness is not a feature - it's a prerequisite for relevance
The biggest insight from this project was that a single user profile produces mediocre personalisation. The same person has genuinely different preferences at breakfast and dinner, and treating them as one static entity produces recommendations that are half-wrong all the time. Building five slot profiles was more complex than one aggregate profile, but the ranking quality gain was not marginal - it was structural. Any personalisation system that ignores context is working with one hand tied behind its back.
02
Demonstrated preference beats modelled preference every time
The repeat-order priority rule was the simplest thing in the spec and arguably the most impactful. No similarity score is more reliable than "this user chose this restaurant again." The lesson I took from this: when designing any recommendation or ranking system, identify the clearest revealed-preference signals first. Build the complex model second. The simple rule often does more work than the model.
03
The fallback IS the product for a material percentage of users
In a two-sided marketplace like Pathao Food, new users are constantly arriving. During launch and for weeks afterward, a significant share of users would hit the fallback path. I specified the fallback at P00 priority - the same as the core personalisation logic - because a poorly designed fallback affects real users at real scale. The lesson: never design a fallback as an afterthought. Design it as deliberately as the main path.
04
Phasing by data dependency, not by feature scope, gave us cleaner attribution
I could have tried to include V2's popularity and discount signals in the initial release. The reason I didn't was pipeline readiness, not feature desirability. But the side effect was valuable: V1's clean launch gave us unambiguous attribution - any metric movement was caused by the similarity ranking, full stop. When V2 launched, we could compare against V1's baseline rather than trying to untangle multiple simultaneous changes. Phasing by dependency produced better analytical clarity than phasing by scope alone would have.
05
Speed-to-decision is the right north star for a ranking feature
I chose time-to-checkout as the north star rather than CTR or conversion rate. This was intentional. A ranking feature that works should make decisions easier and faster - the user shouldn't need to scroll as far or think as hard. CTR and conversion are downstream of that. If time-to-checkout decreases, CTR and conversion should follow. Choosing the more direct signal over the more familiar one forced cleaner thinking about what the feature was actually trying to do.