Measurement Guide · June 2026

How to Prompt-Track Your Hotel’s AI Visibility

Ask ChatGPT for the best boutique hotel in your city five times and you get five different lists. Most hoteliers see that and conclude AI visibility is unmeasurable noise. After studying it across dozens of markets, I disagree.

The variance is real, but it is structured — by city, by guest, by how the question is asked. Once you measure around that structure instead of fighting it, your hotel’s AI visibility becomes a number you can track, move, and defend. Here is the method I run on my own properties every week.

  • ~1 of 3: top-3 hotels repeat between runs
  • 17–96%: position-1 stability by market
  • 6 steps: to a defensible number

Random, or just structured?

For twenty years we measured search with rank trackers: one keyword, one position, checked daily. It worked because Google’s results were stable enough to pin down. Most hotels now try to measure AI the same way — one prompt, run once, scored as in or out — and conclude it is hopeless.

They are half right. A single AI answer really is close to a coin flip. But “variable” is not the same as “random.” When I measured hotel recommendations across dozens of city-and-tier combinations, the instability was not noise — it was a pattern. Constrained queries (“family hotels in Berlin”) were remarkably stable. Broad, competitive ones (“boutique hotels in Paris”) wobbled hard. Same engine, same week — the structure was in the question, not the dice.

I run this every week on the hotels I build — Hotel Ranque among them — and the first time I watched one prompt return five different lists, I nearly binned the whole idea. The fix was not a cleverer prompt. It was measuring like you would measure anything noisy: fix the conditions, repeat, and read the average — not the single roll.

Why one run lies (and why it is still measurable)

These numbers are from my own hotel rankings-consistency study — the same prompt, repeated, across many markets:

  • Top-3 overlap: ~1.1 of 3 hotels repeat between two runs.
  • Least stable market: 17% position-1 stability (heavy competition).
  • Most stable market: 96% position-1 stability (constrained query).
  • Source churn: weekly; what the engines cite keeps changing.

Read those together and the conclusion writes itself. One run tells you almost nothing. Monthly tracking watches a surface that has already turned over since you last looked. And any single “AI visibility score” that blends engines is an average of things that should never have been averaged.

But the spread between 17% and 96% is the hopeful part: the variance is governed by things you control or can hold fixed — the city, the guest persona, the phrasing, the engine, the country you ask from. Pin those down, repeat the question, and the noise collapses into a signal. The rest of this guide is how.
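
To make the two headline figures concrete, here is a minimal sketch of how top-3 overlap and position-1 stability could be computed from repeated runs. The definitions are assumptions for illustration, since this guide does not state the study's exact formulas: overlap is the mean number of shared hotels across every pair of runs, and stability is the share of runs whose first-placed hotel matches the most common first-placed hotel. The hotel names are made up.

```python
from collections import Counter
from itertools import combinations


def mean_top_k_overlap(runs, k=3):
    """Average number of hotels shared by the top-k of every pair of runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs to compare")
    shared = [len(set(a[:k]) & set(b[:k])) for a, b in pairs]
    return sum(shared) / len(shared)


def position_one_stability(runs):
    """Share of runs whose first-placed hotel is the most common first-placed hotel."""
    firsts = [run[0] for run in runs if run]
    if not firsts:
        raise ValueError("no non-empty runs")
    _, count = Counter(firsts).most_common(1)[0]
    return count / len(firsts)


# Hypothetical ranked answers from four runs of the same prompt.
runs = [
    ["Hotel A", "Hotel B", "Hotel C"],
    ["Hotel A", "Hotel D", "Hotel E"],
    ["Hotel F", "Hotel B", "Hotel A"],
    ["Hotel A", "Hotel G", "Hotel H"],
]

print(round(mean_top_k_overlap(runs), 2))  # mean shared hotels per pair of runs
print(position_one_stability(runs))        # share of runs agreeing on position 1
```

With more runs per prompt the pairwise average becomes steadier, which is the same reason the rest of the method repeats every prompt.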

Where prompt tracking goes wrong

Most prompt-tracking setups make the same handful of mistakes. Each one is fixable, and the fixes are the six steps that follow.

One run, scored as in or out

A single answer is close to a coin flip. In my consistency study, asking for a city’s best hotels twice returned on average only one of the same top three. Score one run and you are recording a flip, not your visibility.

Feature-shopping prompts

Prompts reverse-engineered from your amenity list (“Paris hotel with a smart-trainer studio”) only ever flatter your own metric. Real guests ask from a need (“I’m a cyclist visiting Paris…”). Track the question, not the brochure.

One blended AI score

Each engine grounds hotel answers on different sources — review sites, Google Maps, its own retrieval. Averaging ChatGPT, Perplexity and Gemini into one number is like averaging your Google rank with your Bing rank. It hides the lever.

Tracking monthly

Engines refresh which sources they cite constantly. Checking monthly is like glancing at a fast-moving bank account once a quarter: by the time you read it, the movement you caused is already gone.

Only measuring turn one

The opening question tells you if you are mentioned. It says nothing about surviving the follow-ups — alternatives, price, “is it actually good for X?” For a hotel, the booking journey is the unit of measurement, not the first reply.

Step 1

Build a persona × location panel

A panel is a fixed set of prompts you run forever, so movement means something. I build hotel panels from the two dimensions a real guest varies: who they are and how wide they cast the net.

Personas. Phrase each prompt the way a specific guest asks, with their need as the reason your hotel comes up — never your amenity list as keywords. “I’m a keen cyclist visiting Paris, any hotels set up for that?” is a persona prompt. “Paris hotel with a smart-trainer studio” is keyword-stuffing that only flatters your own metric. A workable controlled set: couples, solo leisure, families, solo business, luxury — plus the niche you are genuinely the best answer for (a cyclist, a dog owner, a wellness traveller).

Locations. Guests zoom in and out. Track a near zone (the walkable neighbourhood, such as “Le Marais” or “the Gothic Quarter”) and a wide zone (“central Paris”). Your hotel is a strong match for both, against very different competitive sets, and, as the consistency data shows, the narrower one is far easier to win and to measure.

Cross the two — a handful of prompts per persona × location bucket — and you land around 40–60 discovery prompts that read like real planning questions. Add a small branded tier (“Tell me about Hotel X”) purely as a sanity and data-quality check. Then freeze the set and version it: only add prompts in batches, never edit an old one, or you lose week-over-week comparability.
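
As a rough sketch of the crossing and freezing described above, the snippet below builds a small panel from persona templates and zones and gives every prompt a stable, versioned ID. The persona wording, zone names, `PANEL_VERSION`, and the `PanelPrompt` structure are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from itertools import product

PANEL_VERSION = "v1"  # assumption: bump only when a new batch of prompts is added

# Hypothetical personas and zones; replace with the guests and areas you serve.
PERSONAS = {
    "couple": "We're a couple planning a weekend in {zone}. Where should we stay?",
    "cyclist": "I'm a keen cyclist visiting {zone}, any hotels set up for that?",
}
ZONES = {
    "near": "Le Marais",
    "wide": "central Paris",
}


@dataclass(frozen=True)
class PanelPrompt:
    prompt_id: str  # stable key used to compare one week with the next
    persona: str
    zone: str
    text: str


def build_panel(personas, zones, version):
    """Cross every persona template with every zone into a fixed prompt list."""
    panel = []
    for (persona, template), (zone_key, zone_name) in product(
        personas.items(), zones.items()
    ):
        panel.append(
            PanelPrompt(
                prompt_id=f"{version}-{persona}-{zone_key}",
                persona=persona,
                zone=zone_key,
                text=template.format(zone=zone_name),
            )
        )
    return panel


panel = build_panel(PERSONAS, ZONES, PANEL_VERSION)
for p in panel:
    print(p.prompt_id, "|", p.text)
```

Because the dataclass is frozen and the ID carries the version, an edited prompt has to become a new entry in a new batch rather than silently replacing an old one.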

Four intent tiers, hardest last: branded (does the engine even know you exist and cite your real site?), persona (the winnable core, where website changes show up first), neighbourhood (real local demand), and generic (“best boutique hotel in Paris”; expect near-zero for months, which is the climb, not a bug).

Step 2

Repeat every prompt, and keep the engines apart

This is the single change that turns noise into signal. Run every prompt several times (five is a sensible floor) per engine, every week, and keep ChatGPT, Perplexity, Gemini and Google AI as separate lines. Repeats average out the run-to-run wobble; weekly matches how fast the engines churn their sources; separate lines stop one strong engine from hiding three weak ones.

Two settings move the answer as much as the wording does, so fix them and write them down:

  • Country / IP. A French IP and a Japanese IP get different hotel answers. Decide whose view you are measuring — your home market plus your top one or two origin markets — and route from those countries deliberately, rather than letting a tool pick at random.
  • Clean, logged-out sessions. Personalised history skews the result. Measure the cold-start answer a new guest sees, and treat it as a floor, not the whole truth.

By hand this does not scale past one hotel. At volume you trigger the prompts through a scraping layer that captures the real chatgpt.com answer surface from a chosen country — which is exactly what my own pipeline does behind the scenes.
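
The run discipline above can be sketched as a small loop. This is an illustration under stated assumptions, not the pipeline mentioned in the text: the engine labels are plain strings, `ask` is a hypothetical callable standing in for whatever client or scraping layer actually fetches an answer, and `fake_ask` exists only so the example runs.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable

ENGINES = ["chatgpt", "perplexity", "gemini", "google_ai"]  # labels only
REPEATS = 5      # the floor suggested in the text
COUNTRY = "FR"   # whose view is being measured, fixed and written down


@dataclass(frozen=True)
class Run:
    engine: str
    prompt: str
    repeat: int
    country: str
    answer: str


def run_panel(prompts, ask: Callable[[str, str, str], str],
              engines=ENGINES, repeats=REPEATS, country=COUNTRY):
    """Ask every prompt `repeats` times per engine, storing each run separately."""
    runs = []
    for engine, prompt, i in product(engines, prompts, range(repeats)):
        # Assumption: `ask` opens a clean, logged-out session on every call.
        answer = ask(engine, prompt, country)
        runs.append(Run(engine, prompt, i, country, answer))
    return runs


def fake_ask(engine, prompt, country):
    """Hypothetical stand-in for a real client; returns a canned string."""
    return f"[{engine}/{country}] answer to: {prompt}"


runs = run_panel(["Any hotels in Le Marais good for cyclists?"], fake_ask)
print(len(runs))  # 4 engines x 1 prompt x 5 repeats
```

Storing the country and the repeat index on every run keeps the conditions visible later, when a number moves and you need to rule out a changed setting.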

Step 3

Report rates, not booleans

Once each prompt has run several times, stop reporting yes/no. Report a rate: appeared in 3 of 5 runs is a 60% mention rate. With only five runs the margin is wide, so the rule is simple — do not trust a single week’s number, trust the trend across three or more weeks. One green week is weather; three climbing weeks is a result.
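
To see how wide the margin is with five runs, one option is a Wilson score interval around the mention rate. The guide does not prescribe a formula; the Wilson interval is simply a standard choice for small samples, used here as an assumption.

```python
from math import sqrt


def mention_rate(hits, runs):
    """Share of runs in which the hotel was mentioned."""
    return hits / runs


def wilson_interval(hits, runs, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = hits / runs
    denom = 1 + z**2 / runs
    centre = (p + z**2 / (2 * runs)) / denom
    half = z * sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


low, high = wilson_interval(3, 5)
print(f"rate {mention_rate(3, 5):.0%}, interval {low:.0%} to {high:.0%}")
```

For 3 of 5 the interval runs from roughly 23% to 88%, which is why a single week's figure cannot carry a conclusion and the multi-week trend has to.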

Record five things on every run, not just “were we there”:

  • Mentioned: your name appears in the answer text.
  • Cited: your own website is linked (and at what position).
  • Sentiment: are you the recommendation or the warning?
  • Attributes: the words attached to you, such as “quiet,” “great location,” “pricey,” “dog-friendly.”
  • Co-cited domains: every other site the answer leaned on (more on that below).

Bare mention-counting will lie to you. Being named first for “which hotels near Aligre are noisy?” trips the same counter as winning “best boutique hotel in the 12th.” Without sentiment, a loss records as a win.

Step 4

When you score zero, prove the zero is real

A hotel that never appears looks invisible. But a zero can be an artefact — a weak prompt, run-to-run variance, or the wrong country — rather than genuine absence. Before you accept “the AI has never heard of us,” climb a short ladder. This is the part I find hoteliers skip, and it has saved me from the wrong verdict more than once.

L0: Is the prompt even healthy?

Did that prompt return any real hotels and trigger a web search at all? If it returns junk for everyone, the problem is a broken prompt, not an invisible hotel.

L1: Frequency

Re-run the prompt ten times. Appear in even one? Then you are not invisible, just intermittent: a variance problem, not an absence.

L2: More prompts

Widen the persona and phrasing set for that city. A single phrasing can simply miss you.

L3: Branded

Ask directly: “Is Hotel X any good?”, “Tell me about Hotel X.” If you only surface when named, the engine knows you but does not recommend you, which is a very different problem to fix.

L4: Home-country proxy

Re-run from your own country’s IP instead of a default US one. Local guests may see you when a foreign default does not.

You are truly invisible only if you stay at zero all the way up the ladder. Otherwise, record the level that surfaced you — “invisible to default prompts, appears only when named from a French IP” is a precise, fixable diagnosis, not a shrug.
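
The ladder can be written down as a small decision function so the verdict is recorded the same way every time. This is a sketch only: the input shape (a health flag plus appearance counts keyed by level) and the wording of the verdicts are assumptions made for illustration.

```python
LADDER = [
    ("L0", "prompt health"),
    ("L1", "frequency"),
    ("L2", "more prompts"),
    ("L3", "branded"),
    ("L4", "home-country proxy"),
]


def diagnose_zero(prompt_healthy, appearances):
    """Return a verdict for a hotel that scored zero on its default prompts.

    prompt_healthy: did the prompt return real hotels at all (L0)?
    appearances: dict mapping "L1".."L4" to the number of runs at that
                 level in which the hotel appeared.
    """
    if not prompt_healthy:
        return "L0: prompt is broken, fix the prompt before judging the hotel"
    for level, name in LADDER[1:]:
        if appearances.get(level, 0) > 0:
            return f"{level}: surfaced at '{name}', not truly invisible"
    return "invisible: zero at every level of the ladder"


print(diagnose_zero(True, {"L1": 0, "L2": 0, "L3": 2}))
```

Returning the first level that surfaced the hotel mirrors the advice to record where you appeared, not just whether you did.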

Step 5

Measure the booking journey, not just turn one

Guests do not ask one question and book. They have a conversation, and AI is built for exactly that. A flat prompt list only ever measures the opening line. Take your highest-intent prompts and run them as a short journey, as one conversation, scoring every turn:

  • Problem: “I’m planning a few days in Paris. How should I choose where to stay?”
  • Exploration: “What are good boutique hotels near Bastille?”
  • Comparison: “How does Hotel Ranque compare to others for a cyclist?”
  • Validation: “Is Hotel Ranque actually worth it?”
  • Selection: “How do I book Hotel Ranque, and what’s the best rate?”

The metric a one-shot tracker can never see is persistence: are you still in the answer at Selection if you were there at Exploration? Surfacing once near the top and vanishing by the booking turn is a very different outcome from carrying all the way through — and only the journey view tells them apart.
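
Persistence can be expressed as a simple ratio. In the sketch below, each journey run is a dict of booleans per stage; the stage keys, the sample data, and the decision to return `None` when the hotel never surfaced at the starting stage are all assumptions. What counts as "present" at a turn that names the hotel (for example, recommended rather than merely mentioned) is left to whoever scores the runs.

```python
STAGES = ["problem", "exploration", "comparison", "validation", "selection"]


def persistence(journeys, start="exploration", end="selection"):
    """Among journeys where the hotel appeared at `start`, the share still present at `end`."""
    started = [j for j in journeys if j[start]]
    if not started:
        return None  # never surfaced at the start, so persistence is undefined
    carried = sum(1 for j in started if j[end])
    return carried / len(started)


# Hypothetical results: one dict per journey run, True if the hotel was present at that turn.
journeys = [
    {"problem": False, "exploration": True, "comparison": True, "validation": True, "selection": True},
    {"problem": False, "exploration": True, "comparison": True, "validation": False, "selection": False},
    {"problem": False, "exploration": False, "comparison": True, "validation": True, "selection": True},
    {"problem": False, "exploration": True, "comparison": False, "validation": False, "selection": False},
]

print(persistence(journeys))  # share of Exploration appearances that survive to Selection
```

Here the hotel surfaced at Exploration in three journeys and carried through to Selection in one of them, a gap that a turn-one mention rate would not show.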

Step 6

Track which sources each engine trusts

This is where measurement turns into a to-do list. Record the full set of domains every answer cites, and a pattern appears fast: each engine grounds hotel answers on different places. One leans on review and comparison sites, another on your Google Business Profile and Maps, another on its own retrieval of your site and blog.

That co-citation list is the most useful by-product of the whole exercise. It is a leaderboard of who gets cited instead of you — the OTAs, TripAdvisor, the listicles — and it tells you which lever to pull per engine: review velocity and comparison content to win one, schema and Google Business Profile to win another, your own structured pages to win a third. A flat “you rank 6th” never tells you that.
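
A co-citation leaderboard is little more than a per-engine count of cited domains. The sketch below assumes each run has been reduced to an engine label and the list of URLs its answer cited; the sample URLs and the `example-hotel.com` domain are placeholders.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def domain_of(url):
    """Hostname of a URL, lowercased and without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def cocitation_leaderboard(runs, own_domain):
    """Count cited domains per engine, excluding the hotel's own site."""
    boards = defaultdict(Counter)
    for engine, cited_urls in runs:
        # Count each domain once per answer so one answer cannot dominate.
        domains = {domain_of(u) for u in cited_urls}
        domains.discard(own_domain)
        domains.discard("")
        boards[engine].update(domains)
    return boards


# Hypothetical runs: (engine label, URLs cited in that answer).
runs = [
    ("chatgpt", ["https://www.tripadvisor.com/x", "https://example-hotel.com/rooms"]),
    ("chatgpt", ["https://www.tripadvisor.com/y", "https://www.booking.com/z"]),
    ("perplexity", ["https://www.booking.com/a", "https://www.booking.com/b"]),
]

boards = cocitation_leaderboard(runs, own_domain="example-hotel.com")
for engine, board in boards.items():
    print(engine, board.most_common(3))
```

Keeping one counter per engine preserves the split the text argues for: the top domains for one engine are the lever for that engine only.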

The same per-engine split runs through all my hotel research: the engines do not read your website so much as read about you. Measuring their sources is how you find the page you do not control that is quietly speaking for you.

A version you can run this week

You do not need a pipeline to start. You need discipline and a spreadsheet.

  1. Write 12 prompts

    Two branded, six persona (the guests you genuinely serve), two neighbourhood, two generic. Freeze them.

  2. Pick your engines

    ChatGPT and one of Perplexity or Gemini to start. Separate columns, always.

  3. Run each 5×, logged out

    Same day each week. Note the country if you switch it. Yes, it is tedious; that is the point.

  4. Score five fields

    Mentioned, cited, position, sentiment, co-cited domains, all per run.

  5. Report rates, not booleans

    Mention rate per prompt per engine. Watch the trend over three weeks before you believe anything.

  6. Change one thing, then watch

    Adjust one item on your site, log the date, and see if the line moves over the next runs. Never claim cause — show timing.
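
If the spreadsheet is exported as CSV, the weekly rates and the three-week trend check can be computed in a few lines. This is a sketch under assumptions: the column names, the sample rows, and the zero-padded ISO week labels (which sort correctly as text) are invented, and only the mentioned column is shown; the other fields would be extra columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export of the spreadsheet: one row per run.
LOG = """week,engine,prompt_id,mentioned
2026-W01,chatgpt,p1,0
2026-W01,chatgpt,p1,1
2026-W02,chatgpt,p1,1
2026-W02,chatgpt,p1,1
2026-W01,gemini,p1,0
2026-W01,gemini,p1,0
"""


def weekly_rates(csv_text):
    """Mention rate per (engine, prompt_id, week); engines are never blended."""
    hits = defaultdict(int)
    total = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["engine"], row["prompt_id"], row["week"])
        total[key] += 1
        hits[key] += int(row["mentioned"])
    return {key: hits[key] / total[key] for key in total}


def climbing(rates_by_week, weeks=3):
    """True if the last `weeks` weekly rates rise strictly, oldest to newest."""
    recent = [rates_by_week[w] for w in sorted(rates_by_week)][-weeks:]
    return len(recent) == weeks and all(a < b for a, b in zip(recent, recent[1:]))


rates = weekly_rates(LOG)
for key in sorted(rates):
    print(key, rates[key])
```

A rising line after a logged change still shows timing rather than cause, so `climbing` is best read as a prompt to keep watching, not as proof.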

Want the engines, the fixes, and the data first?

This guide is the measurement layer. For how AI actually picks hotels and what to change, start with the complete guide.

AI Search for Hotels: the complete guide

FAQ

Can a hotel’s AI visibility really be measured?

Yes, once you stop measuring it like a rank tracker. A single run is close to noise: in my consistency study, asking for a city’s best hotels twice returned on average only about one of the same top three. But that variance is structured, not random. Fix the city, the persona and the query, repeat the prompt, and a stable signal appears: position-1 stability ran from 17% in the most competitive markets to 96% in the most constrained.

Further reading

This guide is the measurement layer of a larger body of hotel AI-search work. The variance figures above come from my own rankings-consistency study; the source-grounding split runs through the whole research library.
