How to Prompt-Track Your Hotel’s AI Visibility
Ask ChatGPT for the best boutique hotel in your city five times and you get five different lists. Most hoteliers see that and conclude AI visibility is unmeasurable noise. After studying it across dozens of markets, I disagree.
The variance is real, but it is structured — by city, by guest, by how the question is asked. Once you measure around that structure instead of fighting it, your hotel’s AI visibility becomes a number you can track, move, and defend. Here is the method I run on my own properties every week.
Random, or just structured?
For twenty years we measured search with rank trackers: one keyword, one position, checked daily. It worked because Google’s results were stable enough to pin down. Most hotels now try to measure AI the same way — one prompt, run once, scored as in or out — and conclude it is hopeless.
They are half right. A single AI answer really is close to a coin flip. But “variable” is not the same as “random.” When I measured hotel recommendations across dozens of city-and-tier combinations, the instability was not noise — it was a pattern. Constrained queries (“family hotels in Berlin”) were remarkably stable. Broad, competitive ones (“boutique hotels in Paris”) wobbled hard. Same engine, same week — the structure was in the question, not the dice.
Why one run lies (and why it is still measurable)
These numbers are from my own hotel rankings-consistency study — the same prompt, repeated, across many markets:
Read those together and the conclusion writes itself. One run tells you almost nothing. Monthly tracking watches a surface that has already turned over since you last looked. And any single “AI visibility score” that blends engines is an average of things that should never have been averaged.
But the spread between 17% and 96% is the hopeful part: the variance is governed by things you control or can hold fixed — the city, the guest persona, the phrasing, the engine, the country you ask from. Pin those down, repeat the question, and the noise collapses into a signal. The rest of this guide is how.
Where prompt tracking goes wrong
Most prompt-tracking setups make the same handful of mistakes. Each one is fixable, and the fixes are the six steps that follow.
A single answer is close to a coin flip. In my consistency study, asking for a city’s best hotels twice returned on average only one of the same top three. Score one run and you are recording a flip, not your visibility.
Prompts reverse-engineered from your amenity list (“Paris hotel with a smart-trainer studio”) only ever flatter your own metric. Real guests ask from a need (“I’m a cyclist visiting Paris…”). Track the question, not the brochure.
Each engine grounds hotel answers on different sources — review sites, Google Maps, its own retrieval. Averaging ChatGPT, Perplexity and Gemini into one number is like averaging your Google rank with your Bing rank. It hides the lever.
Engines constantly refresh which sources they cite. Checking monthly is like reading a fast-moving bank statement once a quarter: by the time you read it, the movement you caused is already gone.
The opening question tells you if you are mentioned. It says nothing about surviving the follow-ups — alternatives, price, “is it actually good for X?” For a hotel, the booking journey is the unit of measurement, not the first reply.
Build a persona × location panel
A panel is a fixed set of prompts you run forever, so movement means something. I build hotel panels from the two dimensions a real guest varies: who they are and how wide they cast the net.
Personas. Phrase each prompt the way a specific guest asks, with their need as the reason your hotel comes up — never your amenity list as keywords. “I’m a keen cyclist visiting Paris, any hotels set up for that?” is a persona prompt. “Paris hotel with a smart-trainer studio” is keyword-stuffing that only flatters your own metric. A workable controlled set: couples, solo leisure, families, solo business, luxury — plus the niche you are genuinely the best answer for (a cyclist, a dog owner, a wellness traveller).
Locations. Guests zoom in and out. Track a near zone (the walkable neighbourhood, such as “Le Marais” or “the Gothic Quarter”) and a wide zone (“central Paris”). Your hotel is a strong match for both, against very different competitive sets, and, as the consistency data shows, the narrower one is far easier to win and to measure.
Cross the two — a handful of prompts per persona × location bucket — and you land around 40–60 discovery prompts that read like real planning questions. Add a small branded tier (“Tell me about Hotel X”) purely as a sanity and data-quality check. Then freeze the set and version it: only add prompts in batches, never edit an old one, or you lose week-over-week comparability.
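One way to keep the panel frozen and comparable is to generate it from the two dimensions and fingerprint the result. This is a minimal sketch, not a prescribed tool: the persona templates, the zone names, and the helpers `build_panel` and `panel_version` are all illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import product
import hashlib


@dataclass(frozen=True)
class Prompt:
    prompt_id: str
    persona: str
    zone: str
    text: str


# Hypothetical templates: each is phrased from the guest's need, not an amenity list.
PERSONA_TEMPLATES = {
    "couple": "We're a couple planning a weekend in {place}. Which hotels would you suggest?",
    "family": "We're travelling to {place} with two young kids. Any hotels that work well for families?",
    "cyclist": "I'm a keen cyclist visiting {place}, any hotels set up for that?",
}

# Hypothetical zones: one near (walkable neighbourhood), one wide.
ZONES = {"near": "Le Marais", "wide": "central Paris"}


def build_panel(templates, zones):
    """Cross every persona template with every zone into a list of prompts."""
    panel = []
    for (persona, template), (zone, place) in product(
        sorted(templates.items()), sorted(zones.items())
    ):
        panel.append(
            Prompt(f"{persona}-{zone}", persona, zone, template.format(place=place))
        )
    return panel


def panel_version(panel):
    """Short fingerprint of the prompt texts; it changes if any prompt is edited."""
    joined = "\n".join(f"{p.prompt_id}|{p.text}" for p in panel)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]


panel = build_panel(PERSONA_TEMPLATES, ZONES)
version = panel_version(panel)
print(len(panel), "prompts, panel version", version)
```

Storing the fingerprint next to each week's results makes an accidental edit to an old prompt visible, because the version string no longer matches.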
Repeat every prompt, and keep the engines apart
This is the single change that turns noise into signal. Run every prompt several times per engine, every week (five runs is a sensible floor), and keep ChatGPT, Perplexity, Gemini and Google AI as separate lines. Repeats average out the run-to-run wobble; a weekly cadence matches how fast the engines churn their sources; separate lines stop one strong engine from hiding three weak ones.
Two settings move the answer as much as the wording does, so fix them and write them down:
- Country / IP. A French IP and a Japanese IP get different hotel answers. Decide whose view you are measuring — your home market plus your top one or two origin markets — and route from those countries deliberately, rather than letting a tool pick at random.
- Clean, logged-out sessions. Personalised history skews the result. Measure the cold-start answer a new guest sees, and treat it as a floor, not the whole truth.
By hand this does not scale past one hotel. At volume you trigger the prompts through a scraping layer that captures the real chatgpt.com answer surface from a chosen country — which is exactly what my own pipeline does behind the scenes.
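Whatever executes the runs, it helps to write the plan down first so the repeats, engines and countries are fixed rather than left to chance. The sketch below only enumerates runs; it does not call any engine or scraping service. The engine labels, country codes, week label and the `plan_week` helper are assumptions for illustration.

```python
from itertools import product
from typing import NamedTuple


class PlannedRun(NamedTuple):
    week: str
    engine: str
    country: str
    prompt_id: str
    repeat: int
    logged_out: bool


ENGINES = ["chatgpt", "perplexity", "gemini", "google_ai"]  # labels are assumptions
COUNTRIES = ["FR", "JP"]  # assumed: home market plus one origin market
REPEATS = 5  # the floor suggested in the text


def plan_week(week, prompt_ids, engines=ENGINES, countries=COUNTRIES, repeats=REPEATS):
    """List every run for one week: each prompt, repeated, per engine and country."""
    return [
        PlannedRun(week, engine, country, prompt_id, repeat, True)
        for engine, country, prompt_id, repeat in product(
            engines, countries, prompt_ids, range(1, repeats + 1)
        )
    ]


runs = plan_week("2026-W01", ["cyclist-near", "cyclist-wide"])
print(len(runs))  # 4 engines x 2 countries x 2 prompts x 5 repeats = 80
```

Keeping engine and country as explicit fields on every planned run is what later allows each engine to be reported as its own line.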
Report rates, not booleans
Once each prompt has run several times, stop reporting yes/no. Report a rate: appeared in 3 of 5 runs is a 60% mention rate. With only five runs the margin is wide, so the rule is simple — do not trust a single week’s number, trust the trend across three or more weeks. One green week is weather; three climbing weeks is a result.
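To see how wide the margin is at five runs, the rate can be paired with a confidence interval. The sketch below uses the Wilson score interval, which is my choice of a standard method for small samples and not something the study prescribes; the `climbing` helper is likewise a hypothetical reading of the three-week rule.

```python
from math import sqrt


def mention_rate(hits, runs):
    """Share of runs in which the hotel was mentioned."""
    return hits / runs


def wilson_interval(hits, runs, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = hits / runs
    denom = 1 + z**2 / runs
    centre = (p + z**2 / (2 * runs)) / denom
    half = z * sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def climbing(weekly_rates, weeks=3):
    """True if the last `weeks` rates are strictly increasing."""
    recent = weekly_rates[-weeks:]
    return len(recent) == weeks and all(a < b for a, b in zip(recent, recent[1:]))


low, high = wilson_interval(3, 5)
print(f"rate={mention_rate(3, 5):.0%}, interval={low:.0%} to {high:.0%}")
```

For 3 of 5 runs the interval spans roughly 23% to 88%, which is why a single week's 60% says little on its own.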
Record five things on every run, not just “were we there”:
- Mentioned — your name appears in the answer text.
- Cited — your own website is linked (and at what position).
- Sentiment — are you the recommendation or the warning?
- Attributes — the words attached to you: “quiet,” “great location,” “pricey,” “dog-friendly.”
- Co-cited domains — every other site the answer leaned on (more on that below).
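The five fields above map naturally onto one record per run. This is a minimal sketch under stated assumptions: the `RunRecord` shape, the `score_run` helper, the hotel name and the `.example` domains are all hypothetical, and sentiment and attributes are passed in as labels assigned by hand or by a separate step.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class RunRecord:
    mentioned: bool
    cited_position: Optional[int]  # 1-based position of your own site, None if not cited
    sentiment: str  # e.g. "recommended", "neutral", "warning"
    attributes: List[str] = field(default_factory=list)
    co_cited_domains: List[str] = field(default_factory=list)


def domain_of(url):
    """Lower-cased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def score_run(answer_text, cited_urls, hotel_name, hotel_domain,
              sentiment="neutral", attributes=()):
    """Fill one record from an answer's text and its list of cited URLs."""
    domains = [domain_of(u) for u in cited_urls]
    position = domains.index(hotel_domain) + 1 if hotel_domain in domains else None
    return RunRecord(
        mentioned=hotel_name.lower() in answer_text.lower(),
        cited_position=position,
        sentiment=sentiment,
        attributes=list(attributes),
        co_cited_domains=sorted({d for d in domains if d != hotel_domain}),
    )


record = score_run(
    "Hotel Exemple is a quiet option in Le Marais.",
    ["https://www.reviews.example/paris", "https://hotel-exemple.example/rooms"],
    hotel_name="Hotel Exemple",
    hotel_domain="hotel-exemple.example",
    sentiment="recommended",
    attributes=["quiet"],
)
print(record)
```

A plain substring match will miss misspellings and abbreviated names, so treat the `mentioned` flag as a first pass that still needs spot checks.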
When you score zero, prove the zero is real
A hotel that never appears looks invisible. But a zero can be an artefact — a weak prompt, run-to-run variance, or the wrong country — rather than genuine absence. Before you accept “the AI has never heard of us,” climb a short ladder. This is the part I find hoteliers skip, and it has saved me from the wrong verdict more than once.
1. Did that prompt return any real hotels and trigger a web search at all? If it returns junk for everyone, the prompt is broken; your hotel is not invisible.
2. Re-run the prompt ten times. Appear in even one? Then you are not invisible, just intermittent: a variance problem, not an absence.
3. Widen the persona and phrasing set for that city. A single phrasing can simply miss you.
4. Ask directly: “Is Hotel X any good?”, “Tell me about Hotel X.” If you only surface when named, the engine knows you but does not recommend you, which is a very different problem to fix.
5. Re-run from your own country’s IP instead of a default US one. Local guests may see you when a foreign default does not.
You are truly invisible only if you stay at zero all the way up the ladder. Otherwise, record the level that surfaced you — “invisible to default prompts, appears only when named from a French IP” is a precise, fixable diagnosis, not a shrug.
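The ladder can be recorded as an ordered check that returns the first rung that surfaced you. This is a minimal sketch: the rung labels, the verdict wording and the `diagnose` helper are hypothetical, and it reports only the lowest rung that worked, not combinations of rungs.

```python
# Rungs in the order the text climbs them; the keys are hypothetical labels.
LADDER = [
    ("ten_reruns", "intermittent: appears on re-runs of the same prompt"),
    ("wider_phrasing", "appears only under other personas or phrasings"),
    ("named", "known but not recommended: appears only when named"),
    ("local_ip", "appears only from the home-country IP"),
]


def diagnose(prompt_is_valid, surfaced):
    """Return a verdict for a zero score.

    prompt_is_valid: did the prompt return real hotels for anyone at all?
    surfaced: dict mapping rung label -> True if the hotel appeared at that rung.
    """
    if not prompt_is_valid:
        return "broken prompt: no real hotels returned for anyone"
    for rung, verdict in LADDER:
        if surfaced.get(rung, False):
            return verdict
    return "truly invisible: zero on every rung"


print(diagnose(True, {"ten_reruns": False, "wider_phrasing": False, "named": True}))
```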
Measure the booking journey, not just turn one
Guests do not ask one question and book. They have a conversation, and AI is built for exactly that. A flat prompt list only ever measures the opening line. Take your highest-intent prompts and run them as a short journey, as one conversation, scoring every turn from Exploration through to Selection.
The metric a one-shot tracker can never see is persistence: are you still in the answer at Selection if you were there at Exploration? Surfacing once near the top and vanishing by the booking turn is a very different outcome from carrying all the way through — and only the journey view tells them apart.
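Persistence can be computed as a conditional rate: of the journeys where you appeared at the first turn, what share still had you at the last. In this sketch only the "exploration" and "selection" stage names come from the text; the "comparison" stage and the `persistence` helper are assumptions.

```python
def persistence(journeys, first="exploration", last="selection"):
    """Share of journeys present at `first` that are still present at `last`.

    journeys: list of dicts mapping stage name -> True if the hotel was in the answer.
    Returns None when the hotel never appeared at the first stage.
    """
    started = [j for j in journeys if j.get(first)]
    if not started:
        return None
    return sum(1 for j in started if j.get(last)) / len(started)


journeys = [
    {"exploration": True, "comparison": True, "selection": True},
    {"exploration": True, "comparison": False, "selection": False},
    {"exploration": False, "comparison": False, "selection": False},
]
print(persistence(journeys))  # 1 of the 2 journeys that started also finished: 0.5
```

Journeys where you never appeared at the first turn are left out of the denominator, so persistence stays separate from the plain mention rate.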
Track which sources each engine trusts
This is where measurement turns into a to-do list. Record the full set of domains every answer cites, and a pattern appears fast: each engine grounds hotel answers on different places. One leans on review and comparison sites, another on your Google Business Profile and Maps, another on its own retrieval of your site and blog.
That co-citation list is the most useful by-product of the whole exercise. It is a leaderboard of who gets cited instead of you — the OTAs, TripAdvisor, the listicles — and it tells you which lever to pull per engine: review velocity and comparison content to win one, schema and Google Business Profile to win another, your own structured pages to win a third. A flat “you rank 6th” never tells you that.
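Turning the recorded domains into a per-engine leaderboard is a small counting exercise. The sketch below assumes each run is stored as an engine label plus the list of domains its answer cited; the labels, the `.example` domains and the `leaderboard` helper are illustrative.

```python
from collections import Counter, defaultdict


def leaderboard(runs, top=3):
    """Most-cited domains per engine.

    runs: iterable of (engine, cited_domains) pairs, one per answer.
    Each domain is counted once per answer, however often it is cited in it.
    """
    counts = defaultdict(Counter)
    for engine, domains in runs:
        counts[engine].update(set(domains))
    return {engine: counter.most_common(top) for engine, counter in counts.items()}


runs = [
    ("chatgpt", ["reviews.example", "ota.example"]),
    ("chatgpt", ["reviews.example", "reviews.example"]),
    ("gemini", ["maps.example"]),
]
print(leaderboard(runs))
```

Counting per answer instead of per link keeps one citation-heavy reply from dominating the table.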
A version you can run this week
You do not need a pipeline to start. You need discipline and a spreadsheet.
1. Write 12 prompts. Two branded, six persona (the guests you genuinely serve), two neighbourhood, two generic. Freeze them.
2. Pick your engines. ChatGPT and one of Perplexity or Gemini to start. Separate columns, always.
3. Run each 5×, logged out. Same day each week. Note the country if you switch it. Yes, it is tedious; that is the point.
4. Score five fields. Mentioned, cited, position, sentiment, co-cited domains, per run.
5. Report rates, not booleans. Mention rate per prompt per engine. Watch the trend over three weeks before you believe anything.
6. Change one thing, then watch. Adjust one item on your site, log the date, and see if the line moves over the next runs. Never claim cause; show timing.
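For the last step, the spreadsheet only needs to show the line on either side of the logged change date. This is a minimal sketch with made-up dates and rates and a hypothetical `before_after` helper; it describes timing and says nothing about cause.

```python
from datetime import date
from statistics import mean


def before_after(weekly_rates, change_date):
    """Mean mention rate before and from the change date onward.

    weekly_rates: list of (run_date, rate) pairs for one prompt on one engine.
    Returns (mean_before, mean_after); either is None if that side has no data.
    """
    before = [rate for run_date, rate in weekly_rates if run_date < change_date]
    after = [rate for run_date, rate in weekly_rates if run_date >= change_date]
    return (mean(before) if before else None, mean(after) if after else None)


# Illustrative numbers only, not study data.
weekly = [
    (date(2026, 1, 5), 0.2),
    (date(2026, 1, 12), 0.4),
    (date(2026, 1, 19), 0.6),
    (date(2026, 1, 26), 0.8),
]
mean_before, mean_after = before_after(weekly, change_date=date(2026, 1, 19))
print(mean_before, mean_after)
```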
Further reading
This guide is the measurement layer of a larger body of hotel AI-search work. The variance figures above come from my own rankings-consistency study; the source-grounding split runs through the whole research library.
- AI Rankings Consistency Study: how much hotel recommendations actually wobble across repeated runs, and where they do not.
- How to Measure AI Hotel Traffic: the traffic-side mirror of this guide (GA4, referrals, branded search).
- AI Search for Hotels (Guide): the full GEO playbook, covering how the engines pick hotels and what to fix.
- AI Hotel Landscape 2026: which sources AI trusts most for hotels (the grounding leaderboard).