A hospitality and travel technology company
−35%
Support tickets
60%
Queries resolved autonomously
< 2s
First-token latency
4.4★
Feature rating
Overview
A hospitality technology company wanted to reduce customer support load by adding an AI booking assistant to its existing iOS app. I integrated a server-side LLM pipeline, designed the conversational UX with streaming responses, and built graceful fallback flows for when the model was unavailable. Tier-1 support tickets fell by 35% within 8 weeks of launch.
The challenge
The iOS app was handling a high volume of booking-related queries through in-app chat that routed to human agents. Most queries were repetitive: availability checks, cancellation policies, room type differences. The team wanted an AI assistant that could handle tier-1 queries autonomously, with a clean handoff to a human agent for complex cases — and it needed to work reliably on hotel WiFi.
The approach
Designed the conversation flow architecture first: query classification, confidence thresholds for autonomous response vs. human handoff, and explicit 'I don't know' handling to prevent hallucinated room availability.
Integrated the server-side LLM API (OpenAI) with streaming response support, parsing the stream as an AsyncSequence so tokens render progressively. This reduced perceived latency from ~3s to a first token in under 2s.
Built an offline-aware conversation queue: messages composed without connectivity are queued locally and sent on reconnect, with a clear offline indicator — critical for hotel WiFi environments.
Implemented a graceful degradation path: if the LLM API is unavailable or response confidence is below threshold, the assistant routes to a human agent with full conversation context pre-loaded.
Added structured logging to track query type distribution, model confidence, handoff rate, and resolution rate, giving the product team the data to tune prompts post-launch.
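The confidence-threshold and fallback steps above could be sketched roughly as follows in Swift. This is an illustration under assumptions, not the project's actual code: the `QueryType`, `ModelResult`, and `Route` types, the `route` function, and the 0.8 threshold are all hypothetical, and a `nil` result stands in for an unavailable LLM API.

```swift
import Foundation

// Hypothetical query categories, based on the examples named in the write-up.
enum QueryType {
    case availability, cancellationPolicy, roomTypes, other
}

// Hypothetical output of the server-side pipeline for one user message.
struct ModelResult {
    let queryType: QueryType
    let confidence: Double   // assumed to lie in 0...1
    let answer: String?      // nil models an explicit "I don't know"
}

// Either answer autonomously or hand off to a human agent with a reason.
enum Route: Equatable {
    case answer(String)
    case handoff(reason: String)
}

// Assumed value; the real threshold is not stated in the write-up.
let confidenceThreshold = 0.8

func route(_ result: ModelResult?) -> Route {
    // A nil result represents an unavailable LLM API (timeout, server error).
    guard let result = result else {
        return .handoff(reason: "model unavailable")
    }
    // Only tier-1 query types are answered autonomously.
    guard result.queryType != .other else {
        return .handoff(reason: "not a tier-1 query")
    }
    // The model declined to answer rather than guess.
    guard let answer = result.answer else {
        return .handoff(reason: "model declined to answer")
    }
    // Below-threshold confidence goes to a human agent.
    guard result.confidence >= confidenceThreshold else {
        return .handoff(reason: "low confidence")
    }
    return .answer(answer)
}
```

Treating every failure mode as a handoff keeps the assistant from guessing at room availability, and the handoff reason is the kind of field that could feed the structured logs.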
The outcome
Tier-1 support ticket volume dropped 35% within 8 weeks of launch. The AI assistant handles approximately 60% of booking queries autonomously. Handoff-to-human rate stabilised at ~18% — well within the product team's target. Feature rating in App Store reviews: 4.4 stars. First-token latency: <2s on standard hotel WiFi.
Tech stack