AI Hotel Memory 2026: What does a chatbot remember about hotels with the web turned off?
Every other study here measures what AI recommends after it searches the web. This one measures the opposite: what it has actually memorised. We turned web search off and asked three cheap models to name hotels — and to give each one’s website. Verifying those websites turns recall into a confabulation lie-detector.
TL;DR. Hotel chains are known cold: every model names Marriott, Hilton, Four Seasons and returns their correct website ~99% of the time. For individual hotels it cracks: models confidently return a dead or wrong website 3% to 53% of the time, worst in Paris. The cheapest model tested (Gemini 3.1 Flash-Lite) had the best memory. And the failure mode is unsettling: the model knows the hotel exists, then invents its web address.
People increasingly ask chatbots for hotels. Usually the model searches the web first — so its answer reflects what’s online, not what it knows. We removed that crutch. With tools disabled, an LLM can only answer from the patterns baked into its weights during training: its parametric memory. So we asked, repeatedly: name hotels you know — in JSON, with each hotel’s website and address. Because a website is checkable, we can do something a pure recall test can’t: measure whether the memory is correct, not just present.
The result is a clean gradient. Chains live in every model’s memory perfectly. Famous palace hotels mostly do too. But the long tail of real, individual hotels is where models start to confabulate — and they do it with total confidence, returning a tidy JSON record for a hotel whose website doesn’t exist.
1. The experiment
The design borrows from Dejan Marketing’s AI Brand Authority Index, which ranks brands by how often a model names them unprompted. We adapt it to hotels and add one twist that fixes Dejan’s biggest limitation — he could only normalise raw name strings, never verify them.
- Web search OFF. No tools, no retrieval. Pure memory. (In normal use, ChatGPT searches the web — we deliberately don’t.)
- Three cheap models: GPT-5.4-nano, GPT-5.4-mini, and Google’s Gemini 3.1 Flash-Lite.
- Two cuts: “name hotel chains” (global), and “name hotels in {city}” for Paris, Dubai, London and New York.
- JSON output: each run returns {name, website, address}. The website becomes the hotel’s identity, so no fuzzy matching against a database is needed.
- Verification: we check whether each returned domain actually resolves. A live domain ≈ real memory; a dead one ≈ confabulation.
- ~150 runs per model for chains, ~80 per model per city, temperature 1.0, repeated to surface what’s consistently top-of-mind.
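As a rough illustration of the "website as identity" step, the sketch below reduces each returned record to a domain key and falls back to a normalised name when no website came back. The function names `domain_key` and `hotel_keys` are hypothetical, the exact normalisation rules (lower-casing, dropping a leading `www.`) are assumptions, and the sample response is made up for the example; the study's own code is not shown in this article.

```python
import json
from urllib.parse import urlparse


def domain_key(website: str) -> str:
    """Reduce a returned website string to a bare, lower-case host name."""
    text = website.strip().lower()
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    if "://" not in text:
        text = "https://" + text
    host = urlparse(text).hostname or ""
    # Assumption: treat "www.example.com" and "example.com" as the same identity.
    return host[4:] if host.startswith("www.") else host


def hotel_keys(raw_response: str) -> list[str]:
    """Parse one run's JSON array and return one identity key per hotel, in list order."""
    keys = []
    for record in json.loads(raw_response):
        key = domain_key(record.get("website") or "")
        # Fall back to a whitespace-normalised, lower-case name when no website is usable.
        keys.append(key or " ".join(record["name"].lower().split()))
    return keys


# Illustrative response only, not real model output.
sample = (
    '[{"name": "The Savoy", "website": "https://www.thesavoylondon.com/", "address": "Strand, London"},'
    ' {"name": "The  Ned", "website": "", "address": "London"}]'
)
print(hotel_keys(sample))
```

Keying on the host rather than the full URL means that deep links and trailing slashes collapse to one identity, which is the property the verification step relies on.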
2. Chains are known cold
The base layer of hotel memory is rock-solid. Asked for hotel chains, every model returns the majors — and their correct websites — essentially every time. GPT-5.4-mini hit a 99% live-website rate on chains; even nano managed 88%. Chains appear in training data on a single, predictable domain millions of times, so the association is overlearned.
| Rank | Chain | Recalled | Avg rank | Website (verified) |
|---|---|---|---|---|
| #1 | Marriott Hotels | 100% | 19.1 | marriott.com |
| #2 | Hilton Hotels & Resorts | 100% | 16 | hilton.com |
| #3 | Hyatt Regency | 100% | 9.6 | hyatt.com |
| #4 | InterContinental Hotels & Resorts | 100% | 12.7 | ihg.com |
| #5 | Best Western | 100% | 22.6 | bestwestern.com |
| #6 | Radisson Blu | 99% | 23.7 | radissonhotels.com |
| #7 | Ritz-Carlton | 98% | 15.7 | ritzcarlton.com |
| #8 | Quality Inn | 93% | 25.1 | choicehotels.com |
| #9 | Four Seasons Hotels and Resorts | 91% | 16.2 | fourseasons.com |
| #10 | Super 8 | 85% | 27.4 | wyndhamhotels.com |
3. What AI remembers, city by city
Drop to the individual-hotel level and a city’s “memory leaderboard” emerges — the properties a model names again and again, unprompted. These are the hotels with the strongest grip on the model’s mind. Tables below are GPT-5.4-mini; “Recalled” is the share of 80 runs that named the hotel.
Paris
| Rank | Hotel | Recalled | Avg rank | Website (verified) |
|---|---|---|---|---|
| #1 | Hôtel de Crillon, A Rosewood Hotel | 100% | 6.3 | rosewoodhotels.com |
| #2 | Four Seasons Hotel George V | 100% | 3.8 | fourseasons.com |
| #3 | Mandarin Oriental, Paris | 100% | 6.6 | mandarinoriental.com |
| #4 | Shangri-La Paris | 100% | 5.1 | shangri-la.com |
| #5 | Hôtel Plaza Athénée | 100% | 6.1 | dorchestercollection.com |
| #6 | The Ritz Paris | 96% | 3.3 | ritzparis.com |
| #7 | Hôtel Molitor Paris - MGallery | 83% | 20.3 | all.accor.com |
| #8 | Le Bristol Paris | 81% | 5.6 | oetkercollection.com |
Dubai
| Rank | Hotel | Recalled | Avg rank | Website (verified) |
|---|---|---|---|---|
| #1 | Burj Al Arab Jumeirah | 100% | 8.2 | jumeirah.com |
| #2 | Atlantis, The Palm | 100% | 2.9 | atlantis.com |
| #3 | Address Downtown | 100% | 11.3 | addresshotels.com |
| #4 | The Ritz-Carlton, Dubai | 100% | 10 | ritzcarlton.com |
| #5 | W Dubai - The Palm | 100% | 18.5 | marriott.com |
| #6 | Hilton Dubai The Walk | 99% | 22.2 | hilton.com |
| #7 | One&Only The Palm | 95% | 11.4 | oneandonlyresorts.com |
| #8 | Raffles Dubai | 95% | 16.3 | raffles.com |
London
| Rank | Hotel | Recalled | Avg rank | Website (verified) |
|---|---|---|---|---|
| #1 | The Savoy | 100% | 1.6 | thesavoylondon.com |
| #2 | The Dorchester | 100% | 5.3 | dorchestercollection.com |
| #3 | Shangri-La The Shard, London | 100% | 8.2 | shangri-la.com |
| #4 | The Ned | 100% | 14.5 | thened.com |
| #5 | Rosewood London | 100% | 8.3 | rosewoodhotels.com |
| #6 | The Ritz London | 99% | 2.6 | theritzlondon.com |
| #7 | Claridge's | 99% | 3 | claridges.co.uk |
| #8 | The Langham, London | 99% | 5.5 | langhamhotels.com |
New York
| Rank | Hotel | Recalled | Avg rank | Website (verified) |
|---|---|---|---|---|
| #1 | The Plaza Hotel | 100% | 1 | theplazany.com |
| #2 | The St. Regis New York | 100% | 13.4 | marriott.com |
| #3 | Park Hyatt New York | 100% | 16 | hyatt.com |
| #4 | Conrad New York Downtown | 100% | 12.2 | hilton.com |
| #5 | The Langham, New York, Fifth Avenue | 99% | 8 | langhamhotels.com |
| #6 | Mandarin Oriental, New York | 98% | 5.8 | mandarinoriental.com |
| #7 | Four Seasons Hotel New York Downtown | 98% | 5.3 | fourseasons.com |
| #8 | 1 Hotel Central Park | 98% | 14.9 | 1hotels.com |
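The "Recalled" and "Avg rank" columns in the tables above could be computed along the following lines. This is a minimal sketch under stated assumptions: each run is already reduced to an ordered list of identity keys, a hotel counts once per run at its first position, and the average rank is taken only over runs that named the hotel. The function name `leaderboard` and the toy runs are hypothetical.

```python
from collections import defaultdict


def leaderboard(runs: list[list[str]]) -> list[tuple[str, float, float]]:
    """Return (key, recalled_share, avg_rank) rows, most-recalled first."""
    positions = defaultdict(list)
    for run in runs:
        seen = set()
        for rank, key in enumerate(run, start=1):
            # Count a hotel once per run, at its first position in the list.
            if key not in seen:
                seen.add(key)
                positions[key].append(rank)
    rows = [
        (key, len(ranks) / len(runs), sum(ranks) / len(ranks))
        for key, ranks in positions.items()
    ]
    # Sort by recalled share, highest first; ties keep first-seen order.
    return sorted(rows, key=lambda row: -row[1])


# Toy data: four invented runs, not results from the study.
toy_runs = [
    ["thesavoylondon.com", "theritzlondon.com", "claridges.co.uk"],
    ["thesavoylondon.com", "claridges.co.uk"],
    ["theritzlondon.com", "thesavoylondon.com"],
    ["thesavoylondon.com", "theritzlondon.com"],
]
for key, share, avg_rank in leaderboard(toy_runs):
    print(f"{key}: recalled {share:.0%}, avg rank {avg_rank:.1f}")
```

With 80 runs per city, a hotel named in 66 of them would show as 83% recalled, whatever position it usually takes in the list.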
4. The website lie-detector
Here is the headline. For each model and place, what share of the hotels it named came with a website that actually resolves? Chains: near-perfect. Individual hotels: a different story — and it gets worse the weaker the model and the more independent the city.
| Model | Global chains | Paris | Dubai | London | New York |
|---|---|---|---|---|---|
| Gemini 3.1 Flash-Lite | 92% | 71% | 97% | 96% | 89% |
| GPT-5.4-mini | 99% | 58% | 94% | 92% | 82% |
| GPT-5.4-nano | 88% | 47% | 77% | 67% | 68% |
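The "3% to 53%" range quoted in the TL;DR follows directly from the city columns of this table: the dead-website share is simply 100 minus the live share. A short check, using only the figures above:

```python
# Live-website rates for the four cities, in percent, copied from the table above.
live_rate = {
    "Gemini 3.1 Flash-Lite": {"Paris": 71, "Dubai": 97, "London": 96, "New York": 89},
    "GPT-5.4-mini": {"Paris": 58, "Dubai": 94, "London": 92, "New York": 82},
    "GPT-5.4-nano": {"Paris": 47, "Dubai": 77, "London": 67, "New York": 68},
}

# Dead-website share per (model, city) pair.
dead_rate = {
    (model, city): 100 - pct
    for model, cities in live_rate.items()
    for city, pct in cities.items()
}

best = min(dead_rate, key=dead_rate.get)
worst = max(dead_rate, key=dead_rate.get)
print(best, dead_rate[best])
print(worst, dead_rate[worst])
```

The lowest dead share is Gemini 3.1 Flash-Lite in Dubai (3%) and the highest is GPT-5.4-nano in Paris (53%).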
The failure mode is the interesting part. The model usually knows the hotel exists — it just fabricates the web address, often a clean, plausible guess that happens to be dead:
| Hotel (real) | What nano invented | The real website |
|---|---|---|
| Le Bristol Paris · Paris | bristolparis.com ✗ dead | oetkercollection.com ✓ |
| Hôtel Plaza Athénée · Paris | plazaathenee-paris.com ✗ dead | dorchestercollection.com ✓ |
| The Ritz Paris · Paris | theritzparis.com ✗ dead | ritzparis.com ✓ |
| Pod Times Square · New York | pod-hotels.com ✗ dead | thepodhotel.com ✓ |
For example, nano returned bristolparis.com for Le Bristol Paris, a dead domain. The hotel is one of the most famous in the world; its real site is oetkercollection.com. The model didn’t fail to recall the hotel: it recalled the hotel and hallucinated the URL. That is exactly the kind of confident-but-wrong detail that slips past a reader.
5. Why Paris is the hardest city
Paris is the worst-remembered of the four cities for every model (down to 47% live-website rate on nano), and it also produces the longest tail of invented names — nano emitted 469 distinct “Paris hotels” across 80 runs, versus a tight 177 for Gemini. The reason is structural: Paris’s top hotels are independent palaces on collection domains the model can’t predict — Le Bristol on oetkercollection.com, Plaza Athénée on dorchestercollection.com. A chain trains the model on one obvious domain; an independent does not, so the model guesses — and misses.
A tighter set with more live websites (Gemini) signals a sharper, more reliable memory; a sprawling set with dead domains (nano) signals a model padding its answer with invention.
6. The cheap-model surprise
The counterintuitive finding: the cheapest model had the best hotel memory. Google’s Gemini 3.1 Flash-Lite returned the highest share of working websites in all four cities (97% in Dubai, 96% in London), beating the pricier GPT-5.4-mini and far ahead of GPT-5.4-nano. It’s not about price; it’s about how much specific, rare detail a model retains. This is also why Dejan’s original brand index could run on cheap Gemini at all: that family punches above its cost on real-world entity recall. Two cheap models, two very different memories.
7. What it means for hotels
- If you’re a chain or a famous palace, the model knows you — name and website. You’re in the memory layer, not just the search layer.
- If you’re an independent or boutique hotel, the model may know your name but invent your web address. When a model answers from memory (or a user copies its output), that’s a wrong link pointing away from you.
- Memory ≠ retrieval. With web search on, models ground their answers and this mostly disappears. But the memory layer still shapes which hotels a model reaches for first, and what it “believes” before it searches.
- The fix is the same as the rest of AI visibility: a consistent, well-linked web presence is what turns a hotel from a fuzzy memory into a correctly-remembered one.
Methodology
Models: gpt-5.4-nano and gpt-5.4-mini (OpenAI), and gemini-flash-lite-latest (Google), which resolved to Gemini 3.1 Flash-Lite. All were called with no tools at temperature 1.0. Runs: ~150 per model for the chains cut; ~80 per model for each of Paris, Dubai, London, New York. Output: JSON, requesting name + website + address per hotel.
Identity & scoring: hotels are keyed by their returned website domain (falling back to a normalised name). “Recalled” is the share of runs that named the hotel; “avg rank” is its mean position in the list. Website verification: each distinct domain is checked for DNS resolution. A resolving domain is treated as a (conservative) signal of real memory; a non-resolving one as confabulation.
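A DNS-resolution check of this kind can be sketched with the Python standard library. This is an illustration rather than the study's actual script: the helper names `resolves` and `live_share` are hypothetical, and taking the share over distinct domains (rather than over every mention) is an assumption, since the article does not state the denominator.

```python
import socket
from typing import Callable


def resolves(domain: str) -> bool:
    """True if the system resolver returns at least one address for the domain."""
    try:
        socket.getaddrinfo(domain, 443)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: the name did not resolve; UnicodeError: the host name is malformed.
        return False


def live_share(domains: list[str], check: Callable[[str], bool] = resolves) -> float:
    """Share of distinct domains that pass the check (assumed denominator)."""
    distinct = set(domains)
    if not distinct:
        return 0.0
    return sum(1 for domain in distinct if check(domain)) / len(distinct)
```

Calling `live_share(["ritzparis.com", "bristolparis.com"])` performs real lookups, so the result depends on the network and on the state of those domains at the time of the call. Passing a stub as `check` keeps the calculation testable offline.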
Caveats: a DNS check overstates accuracy, because a domain can resolve yet still belong to the wrong hotel. The live-website rates are therefore an upper bound on accuracy, and true confabulation is somewhat higher than reported. This measures the memory of three specific cheap models, not the full ChatGPT or Gemini consumer products (which use larger models and web search). Cost of the entire study was under €20.