# nicolassitter.com — Full Content Index

> Data-driven experiments on AI search, the web, and digital platforms by Nicolas Sitter

## Author

Nicolas Sitter is a tech enthusiast running data-driven experiments on AI search, the web, and digital platforms. Research is organised by topic — currently spanning hotel-industry studies (the largest body of work to date) and cross-industry AI-search methodology, with more topics expanding over time.

## Guides

### AI Search for Hotels — The Complete Guide
- URL: https://nicolassitter.com/guide/ai-search-for-hotels
- Date: June 2026
- Summary: A practical, data-backed GEO (Generative Engine Optimization) guide for hotels, synthesising 30+ of the studies below into a four-step framework: Diagnose → Owned Media → Earned Media → Measure.
- Core thesis: Most AI engines don't read hotel websites — they ground hotel answers on structured place and review data (overwhelmingly Google Maps/Places, ~89% of entity cards, plus TripAdvisor, Yelp and OTAs). So hotel GEO is mostly about controlling those sources, your Google Business Profile, and your reviews — not page copy. Schema markup helps indirectly because it feeds the systems the LLM calls (Knowledge Graph, Places), not the LLM itself.
- Key actions: allow AI crawlers (only 3.3% of hotels block any); complete and correct the Google Business Profile; add consistent Hotel/LocalBusiness + FAQPage schema (36.3% of hotels have none); keep one canonical name everywhere; earn citations in the sources AI quotes (TripAdvisor, Yelp, OTAs, listicles, YouTube, Reddit); measure via GA4 AI referrers, branded/direct-traffic growth, and a monthly prompt-panel citation rate.
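The first key action above (allow AI crawlers) can be spot-checked with Python's standard `urllib.robotparser`. The sketch below is illustrative only: the crawler list, the sample robots.txt and the `example-hotel.test` URL are assumptions, not data from the studies.

```python
from urllib.robotparser import RobotFileParser

# Illustrative, non-exhaustive list of AI crawler user agents (an assumption).
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_ai_crawlers(robots_txt: str, url: str = "https://example-hotel.test/") -> list[str]:
    """Return the AI crawlers that this robots.txt text forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory text, no network call
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


# Made-up robots.txt that blocks one crawler and allows everything else.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_ai_crawlers(SAMPLE))
```

An empty result means none of the listed crawlers is blocked for that URL. The parser only evaluates the rules it is given, so the live robots.txt still has to be downloaded separately.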
### AI Visibility for Hotels — The Complete Optimization Guide
- URL: https://nicolassitter.com/guide/ai-visibility-for-hotels
- Date: June 2026
- Summary: Combines AEO (Answer Engine Optimization) and GEO into one resource: how AI models recommend hotels, the data sources they use, the 5 pillars of AI visibility, an 8-step optimization checklist, and how to measure results — built on 1.2M+ AI citations across 6 models and 25 cities.
- Key actions: audit the Google Business Profile (79% of Google AI Mode hotel links go to GBP); enforce entity consistency (89% recognition when names match); manage reviews on TripAdvisor (cited in 86% of AI Mode hotel queries); build multi-source presence so Reciprocal Rank Fusion rewards you; add structured data; monitor mention share and respond to model updates.

### ChatGPT Hotel Optimization
- URL: https://nicolassitter.com/guide/chatgpt-hotel-optimization
- Date: June 2026
- Summary: A technical guide to ChatGPT's hotel search internals — 12 systems, 7 data providers, and the Sonic classifier — and what each implies for optimization.
- Key thesis: ChatGPT grounds hotel answers on Google Places and review providers, fuses sources via RRF, and rewards multi-source consistency over on-page copy.

### Google AI Mode for Hotels
- URL: https://nicolassitter.com/guide/google-ai-mode-hotels
- Date: June 2026
- Summary: Where hotel clicks go in Google AI Mode — 79% to Google Business Profile, 3.6% to OTAs despite OTAs being ~46.6% of citations — and the optimization playbook that follows.
- Key actions: GBP is the lever for AI Mode; OTAs and TripAdvisor influence mention, not destination.

### Schema Markup for Hotels
- URL: https://nicolassitter.com/guide/schema-markup-hotels
- Date: June 2026
- Summary: The definitive Schema.org guide for hotels — Hotel, LodgingBusiness, HotelRoom, Reviews, FAQ, and Offer types with full JSON-LD examples and AI-visibility impact.
- Key thesis: Schema doesn't feed the LLM directly — it strengthens the entity record (Knowledge Graph, Places) the model ultimately grounds on. With ~36% of hotels carrying no schema, it's a cheap competitive edge.

### How to Prompt-Track Your Hotel's AI Visibility
- URL: https://nicolassitter.com/guide/prompt-tracking-hotel-ai-visibility
- Date: June 2026
- Summary: The measurement layer. AI hotel rankings look random but are structured by city, guest persona and phrasing — so they are measurable if you measure around the variance instead of using a rank tracker. Anchored in the rankings-consistency study: run-to-run, only ~1.1 of the top 3 hotels repeat; position-1 stability ranges 17% (competitive markets) to 96% (constrained queries).
- Method (6 steps): (1) build a frozen persona × location prompt panel across 4 intent tiers (branded / persona / neighbourhood / generic); (2) run every prompt ~5× per engine weekly, engines tracked separately, country/IP fixed, logged-out; (3) report mention/citation rates with a margin of error, plus sentiment and attributes, not booleans; (4) when you score zero, climb the is-the-zero-real ladder — L0 prompt health, L1 run 10×, L2 more prompts, L3 branded, L4 home-country proxy — and record the level that surfaced you; (5) measure the booking journey (Problem → Exploration → Comparison → Validation → Selection) as one conversation and track persistence; (6) log every answer's co-cited domains to see which sources each engine trusts and which lever to pull.

## Published Experiments

### 1. The AI Hotel Landscape 2026
- URL: https://nicolassitter.com/research/ai-hotel-landscape-2026
- Date: January 2026
- Summary: The most comprehensive study of AI hotel recommendations. Analysis of 1.2M+ citations across ChatGPT, Gemini, Perplexity, Claude, Copilot, and Grok. Covers 12,500+ prompts across 25 cities.
- Key findings: OTAs dominate AI citations, Booking.com leads, independent hotels struggle for visibility.

### 2. Google AI Mode: Where Do Hotel Clicks Go?
- URL: https://nicolassitter.com/research/google-ai-mode-hotel-study-2026
- Date: February 2026
- Summary: Analysis of 4,000 hotel queries in Google AI Mode. 79% of hotel clicks go to Google Business Profiles. 84K+ references analyzed across 1,146 hotels.
- Key findings: GBP dominates click distribution, hotel websites get minimal direct traffic from AI Mode.

### 3. Do French Hotels Blog? A 15,000-Hotel Study
- URL: https://nicolassitter.com/research/french-hotel-blog-study-2026
- Date: January 2026
- Summary: Study of 15,155 French hotel websites measuring blog adoption. 49.3% have blogs, but only 1 in 4 are actively maintained.
- Key findings: Blog presence correlates with higher AI visibility, but most hotel blogs are dormant.

### 4. Anatomy of a ChatGPT Hotel Search
- URL: https://nicolassitter.com/research/anatomy-chatgpt-hotel-search-2026
- Date: March 2026
- Summary: Technical teardown of how ChatGPT builds hotel recommendations. 12 systems, 7 data providers, 424 A/B tests analyzed.
- Key findings: ChatGPT uses a complex multi-provider pipeline including Yelp, TripAdvisor, and Google Places.

### 5. How Consistent Are AI Hotel Rankings?
- URL: https://nicolassitter.com/research/ai-hotel-rankings-consistency-study-2026
- Date: February 2026
- Summary: Replicating SparkToro's consistency methodology for hotels. Only 50.5% position stability across reruns of 4,000 queries.
- Key findings: AI hotel rankings are volatile — the same query produces different results each time.

### 6. Hotel Schema.org Adoption Study
- URL: https://nicolassitter.com/research/hotel-schema-adoption-study-2026
- Date: March 2026
- Summary: Scanned 121,425 hotel websites across 7 countries for structured data. 36.3% have no schema at all, 41% use the wrong type.
- Key findings: Most hotels are invisible to AI due to missing or incorrect structured data.

### 7. Yelp in ChatGPT: Hotel Data Study
- URL: https://nicolassitter.com/research/yelp-chatgpt-hotels-study-2026
- Date: February 2026
- Summary: How Yelp is integrated into ChatGPT hotel queries. 14 destinations analyzed, 33% Yelp integration rate in US markets.
- Key findings: Yelp integration is US-focused, European hotels rarely appear via Yelp in ChatGPT.

### 8. What Hotels Are Actually Called: A Naming Study
- URL: https://nicolassitter.com/research/hotel-naming-study-2026
- Date: March 2026
- Summary: Analysis of naming conventions across 121,425 hotels in 7 countries. 8 analysis angles including word frequency, length, and star-rating patterns.
- Key findings: "Hotel" is the most common word, luxury properties use longer names, regional naming patterns vary significantly.

### 9. Hotel robots.txt & AI Blocking Study 2026
- URL: https://nicolassitter.com/research/hotel-robots-ai-blocking-study-2026
- Date: March 2026
- Summary: 105,002 hotel robots.txt files parsed across 7 countries. Only 3.3% block any AI crawler. GPTBot most blocked at 2.9%. France leads at 7.5%.
- Key findings: Hotels overwhelmingly allow AI crawlers. 2.1% use the "smart strategy" of blocking training bots while allowing search bots.

### 10. Hotel llms.txt Adoption Study 2026
- URL: https://nicolassitter.com/research/hotel-llms-txt-adoption-study-2026
- Date: March 2026 (updated 2026-05-09)
- Summary: 105,002 hotel websites scanned for llms.txt files. Only 6.3% have one. US leads at 12.4%, France trails at 3.8%. WordPress SEO plugins drive 33% of adoption.
- Key findings: llms.txt adoption is very early-stage. Higher-star hotels adopt at 2.5x the rate of 1-star. 7.3% misuse the file for access control rules.
- Update (May 2026): Shopify now ships llms.txt, llms-full.txt, and agents.md by default on every store, exposed via /sitemap_agentic_discovery.xml. Example: https://respire.co/sitemap_agentic_discovery.xml. Not yet reflected in adoption numbers above (March crawl).

### 11. What Hotel Footers Reveal — 98K Study
- URL: https://nicolassitter.com/research/hotel-footer-analysis-study-2026
- Date: March 2026
- Summary: 98,423 hotel footers parsed across 7 countries. Instagram is in 40.8% of footers, overtaking Facebook in the US and UK. 23.8% of copyright years are 3+ years stale.
- Key findings: 9.9% of hotels link to OTAs from their own site. TikTok adoption at 11.8% in the US. Only 54.9% link to a privacy policy.

### 12. ChatGPT Hotel Data Sources: 100K Entity Study
- URL: https://nicolassitter.com/research/tripadvisor-chatgpt-hotels-study-2026
- Date: March 2026
- Summary: 99,538 ChatGPT map entities tracked over 90 days. Google Places fell from 100% to 70.3% as Yelp (10.2%), TripAdvisor (0.1%), and Foursquare (0.2%) entered.
- Key findings: TripAdvisor descriptions are 8.8x longer than Google. Yelp is US-city dependent. TripAdvisor links to its own pages, not hotels directly.

### 13. How Dirty Is Google Maps Hotel Data? 179K Study
- URL: https://nicolassitter.com/research/google-maps-hotel-data-quality-2026
- Date: April 2026
- Summary: 178,647 Google Maps hotel listings across 11 countries analyzed. 17% fail basic quality checks. 8,167 are OYO vacation rentals listed as hotels.
- Key findings: Belgium loses 54% of listings after cleaning. ChatGPT uses Google Maps as its primary data source (88.8% of map entities). Chain hotels average 3x more reviews than independents.

### 14. ChatGPT Hotel Index vs Live Web — What Changes When Search Goes Offline
- URL: https://nicolassitter.com/research/chatgpt-hotel-index-vs-live-web-2026
- Date: April 2026
- Summary: We ran 400 hotel queries on GPT-5.4 and GPT-5.3 in live and cached mode. 83% of cited domains differ. The index and the live web recommend different hotels.
- Key findings: OpenAI maintains its own search index alongside live web access. GPT-5.4 runs ~2 searches per response with advanced operators. Free-tier users likely never get live web results.

### 15. ChatGPT Hotel Ads Are Live — CPC Pivot, Ads Manager, $50K Entry
- URL: https://nicolassitter.com/research/chatgpt-hotel-ads-live-2026
- Date: April 2026 (updated April 29, 2026)
- Summary: Sponsored ads now appear in 20-35% of ChatGPT hotel queries for US users. Booking.com dominates at 43.5%. April 29 update: OpenAI quietly launched a self-serve ads manager, pivoted from CPM to CPC, and dropped the entry to $50K. Reportedly ~$100M annualised revenue six weeks into the test.
- Key findings: Booking.com owns 43.5% of ad slots, followed by Airbnb (21.2%) and Expedia (17.6%). OTAs collectively control 87.7%. Ads first appeared March 31, 2026. CPC pivot resolves the early "scraper impression" problem: bots don't click, so they don't cost.

### 16. Hotel YouTube Channels — Activity Study 2026
- URL: https://nicolassitter.com/research/youtube-hotel-visibility-2026
- Date: April 2026
- Summary: We analyzed 2,583 YouTube channels linked from 98,423 hotel websites across 7 countries. Only 10% of hotels link to YouTube. 43.7% of those channels are ghost accounts (2+ years silent). Only 11.3% post monthly. Median views: 412.
- Key findings: US hotels are 4x more active than Italian hotels on YouTube. 5-star hotels are 2.3x more likely to be active than 3-star. Medium-form video (1-10 min) dominates at 54.6%. 68% of channels have fewer than 100 subscribers.

### 17. ChatGPT 5.3 Halved Its Hotel Sources — March 5, 2026 Cutover
- URL: https://nicolassitter.com/research/chatgpt-hotel-source-shift-2026
- Date: April 2026
- Summary: Daily ChatGPT UI runs of 140 world hotel prompts from 4 country locales (US, GB, DE, ES) across 87 days. On 2026-03-05 ChatGPT UI switched to GPT-5.3 (5.4 in API). URLs per answer dropped 49% (24→12), unique domains per answer 46% (18→10), domain pool per prompt 54% (132→61), inline-cite rate collapsed 100%→24%.
- Key findings: Drop is uniform across locales (−47 to −53%) — a model/pipeline change, not a geography rollout.
Losers: Booking −82%, Expedia −76%, Hotels.com −77%, TripAdvisor −69%, Reddit −93%, Wikipedia −93%, Four Seasons −94%. Gainers: a Stockholm-operated network of 17 Booking-affiliate SEO listicle sites (luxuryhotel.guide, all-boutique-hotels.com, hotels-with-balcony.com, couples-hotels.com, etc.) sharing Gandi registrar, Cloudflare NS pairs ANDY/RITA and BRENDA/GRAHAM, and image host images.luxuryhotel.guru. Ted Valentin named as curator on one site — Stockholm directory-site entrepreneur; other curators (Maja Holm, Elain Olsson, David Bachmann) are likely personas. Luxury queries hit hardest (−54%), affordable least (−41%). St Barts worst city (−64%).

### 18. How Claude Searches Hotels
- URL: https://nicolassitter.com/research/how-claude-searches-hotels-2026
- Date: May 2026 (updated 2026-05-04)
- Summary: Captured event streams of several Claude conversations about hotel discovery, across budget tiers, languages, brand filters, and date-anchored stays. The pipeline has two branches gated by the Connector Discovery setting (off by default). With Discovery off, almost everything goes through one direct Google Places call (places_search) plus a thinking-step re-rank by rating × review count and a map render. With Discovery on, Claude returns a small curated panel of OTA connectors (Booking.com, Tripadvisor, Trivago and a few others) plus a Browse-all link. Anthropic has stated it will not run sponsored ads, so the curation logic itself is the product.
- Key findings (default mode): 8 of 9 captures called places_search; the only outlier used web_search. Number of parallel search calls scaled with prompt fuzziness — sharded by price tier, neighbourhood, synonym, or brand. Two captures used a second round to verify named hotels pulled from training. A French prompt was searched in English but answered in French. Things Places has no field for (pet-friendly, sauna, brand affiliation, star tier) were filled in from review snippets or Claude's own training knowledge.
Dates appeared in map titles and closing offers but never in the search call.
- Key findings (connector mode): asking Claude to book with Discovery off produces an inline opt-in prompt; turning Discovery on returns a curated picker. The visible-list problem (3-4 slots, growing supply of chain loyalty programmes and direct-booking platforms wanting in) is the open question — without an auction mechanism, the selection logic is closer to App Store editorial than to Google Ads.

### 19. How Mistral Searches Hotels
- URL: https://nicolassitter.com/research/how-mistral-searches-hotels-2026
- Date: May 2026
- Summary: Captured Le Chat event streams across French, Italian and English prompts; specialist niches and a generic city-tier query; named-hotel sentiment, brand-vs-brand comparison, and accessibility-sensitive intent. The pipeline is the simplest of any AI we've captured for hotels: one web_search call (Brave-backed, toolType "rag") per entity, snippet paraphrase, inline references. No place data, no map, no booking surface.
- Key pipeline findings: One Brave call by default; parallelised only when the prompt names multiple distinct entities to compare (Mama Shelter vs 25hours fired 2 parallel calls 135ms apart, wall-clock latency stayed ~1.3s). Mistral does not follow Brave rank order strictly — there's an LLM-side selection layer over the SERP. Reference grouping is sophisticated: per-entity for list answers, per-argument for two-sided summaries, with the same source cited twice when it carries opposing claims.
- Query-rewrite rule (per-term): Globally-indexable English terms get translated ("un hôtel boutique" → "boutique hotel"); named entities stay (Saint Pierre, Gare de Lyon, Toscana); structural connectors keep the user's language ("hors quartier"). Response language always matches the prompt.
- Year-injection: the vast majority of our captures had "2026" appended to the Brave query — often unprompted.
Relative-time words ("recent", "ce week-end") get resolved to specific dates / current year. The lone exception we saw was a hard-negation prompt where the constraint was already pruning the result set.
- Niche vs generic SERP shape: Niche specialist queries (cyclists, family-Tuscany, vegan-Lisbon, train-station-Paris) surface real specialist editorial sites (freewheelingfrance, its4kids, veganfamilyadventures, igares, hotelaparis). Generic queries ("best 3-star Marseille") surface programmatic SEO-spam aggregator networks (3-star-hotels.com, marseillehotel24.com, marseillefrhotels.com). Safety-sensitive queries override both with official institutional sources (visitberlin.de for accessibility).
- Authority-laundering (four patterns): One review → "many guests"/"significant number"; self-marketing copy → "known for"; editorial coy-label → presented as brand name; outright fabrication of a non-existent property (Mama Shelter has no Vienna location, Mistral described "Mama Shelter Vienna" anyway using brand-homepage and Prague review snippets).
- Intent gate on closing offer: Imminent + dated transactional → hard OTA punt (Booking/TripAdvisor/Hotels.com mention); brand-vs-brand → soft drill-in on prices/amenities; generic best-of, subjective sentiment, niche aspirational → refinement question only; safety-sensitive → refinement plus explicit recommendation to an external authority site. Le Chat has no booking connector or app picker.
- Same-prompt head-to-head with Claude on "best 3-star hotels in Marseille": one overlap (Alex Hotel & Spa). Claude returns boutique-leaning Google Places winners ranked by rating × review count; Mistral returns chain and apartment-hotel-leaning paraphrases of SEO-spam aggregator pages.

### 20. The Schema.org Debate (2026): Why It Still Matters for Hotels
- URL: https://nicolassitter.com/research/schema-org-grounding-loop-2026
- Date: May 2026
- Summary: AEO oversold schema as an LLM unlock.
The pushback is right for the general case — transformers don't read JSON-LD as JSON-LD, they read tokens. But for local / hotel intent, every major AI grounds against Places / KG / OTA aggregator surfaces, all of which sit downstream of schema. So: schema → entity graph → grounding source → LLM. Same effect, different mechanism than the AEO pitch implies.
- The four schema fields that move the needle for hotels: @type: Hotel (category gate — not LocalBusiness or Organization); sameAs (cross-platform identity, reconciles your hotel to its Booking / TripAdvisor / Wikidata / Maps profiles); typed starRating with Rating object (the category bucket gate — AI queries are sliced by "best 4-star", "luxury", "budget"); alternateName (cross-time identity for rebranded properties — Hôtel de la Paix → Alfred Hotel Beaune).
- GBP framing: a Places API response in an LLM tool call exposes a narrow set of fields (rating, review_count, hours, photos, address) coming from Google Business Profile. That's what is visible to the model at call time and makes schema look redundant. But GBP has no amenities field, no room categories, limited typed attributes. Schema fills that gap and feeds the fuller hotel entity behind the KG.
- HotelRoom / containsPlace / BedDetails / floorSize: under your control via schema, encode what GBP cannot. One-time markup cost, zero downside, natural place to expose room-level structure to any grounding source that consumes it.
- FAQ schema: fine if relevant on the page, but not the AI-visibility lever. The boring four above do that work.
- Where AIs source hotel data: Gemini → Google Maps / Places / KG natively; ChatGPT → 5+ providers including Places with rank fusion; Claude → one direct places_search call per query. Three stacks, all share one thing: their grounding sources sit downstream of the entity graph schema feeds into.

### 21. The ChatGPT Direct-Traffic Explosion for Hotels (May 2026)
- URL: https://nicolassitter.com/research/chatgpt-hotel-direct-traffic-explosion-2026
- Date: May 2026
- Summary: On May 7, 2026, ChatGPT changed how it terminates hotel recommendations. Brand names became inline links pointing to the hotel's own homepage instead of dead-ending in citation chips. Across The Hotels Network's anonymised panel of more than 17,000 hotels, daily AI sessions jumped from a 31,688/day baseline (May 1–6) to 51,282/day (May 7–25) — a +62% step change that held through May 25 (a brief late-May dip, then recovery to 58K) without returning to baseline.
- Key panel numbers: ChatGPT sessions in May ~852K (+823% vs Jan). AI % of all hotel-website sessions 0.64% → 0.92% (+44% relative). The number of hotels seeing AI traffic on a given day climbed from ~6,300 to a peak of 7,532.
- Who actually gets the traffic (per-hotel distribution, May 7–25, ~10,000 hotels with measurable share): the ~0.9% network average hides a heavy skew. 3,730 hotels draw ≥1% of new sessions from AI, 1,613 ≥2%, 286 ≥5%, and 43 ≥10% — for that tail, AI is already a top acquisition channel. Demographic cuts (chain vs independent, star, geo) are pending a dimension join.
- It's a ChatGPT-only story: Inside the AI mix, ChatGPT share went from 98.1% → 98.8% (+0.7pp). Perplexity 1.1% → 0.7% (−0.4pp). Claude 0.7% → 0.5% (−0.2pp). Perplexity weekly absolute sessions were essentially flat across Mar 16–May 11 (2,978 → 2,516). Claude inched up but on a tiny base (1,554 → 1,799). Mistral / Copilot / Gemini / Grok are statistical noise in the panel.
- Volume not mix — the surprising hotel-specific finding: Hotels were ALREADY homepage-heavy on AI traffic before May 7. Homepage share of AI sessions: 69.09% → 69.84% — essentially unchanged. The May 7 change drove volume, not page-mix. This is different from SaaS / B2B where homepage share reportedly grew from ~4% to ~24% on May 7.
Hotel websites are built around a homepage that functions as the booking widget gateway; AI assistants were already routing brand-name queries there.
- Two plausible reasons OpenAI may have shipped this: (1) CTR signal for the new CPC ads stack — embedding brand URLs inline and watching which get clicked produces exactly the training data a CPC ranker needs; (2) peace offering to the open hotel web — turns ChatGPT from a pure referral sink into a measurable referral source, helps with publisher and regulator relationships.
- For hotels: AI mentions are now monetisable as direct traffic. The homepage that's already optimised for conversion just got 62% more cold-acquisition sessions a day. Hotels need chatgpt.com / openai.com tagged as a named GA4 channel group with booking-engine conversion wire-up. The companion measurement framework is at /research/how-to-measure-ai-hotel-traffic-2026.
- Disclosure: Hotelrank (the hotel AI-visibility product behind this research) has been acquired by Lighthouse (The Hotels Network), and the author now works there. The panel data comes from The Hotels Network.
- What's pending: segment cuts by chain/independent, star rating, geography — needs a separate join against THN's hotel-attribute dimensions. Booking-side revenue impact also pending (panel exposes referrers, not bookings).

### 22. AI Search for Yoga Studios in Paris (2026)
- URL: https://nicolassitter.com/research/yoga-studios-paris-ai-search-2026
- Date: May 2026
- Topic: AI Search Studies (first non-hotel vertical) — applies the AI-hotel-search methodology to Paris yoga studios.
- Summary: 27 prompt templates × 2 languages (EN/FR) × 2 proxy countries (US/FR) × 5 AI engines (ChatGPT, Perplexity, Gemini, Copilot, Google AI Mode), captured 2026-05-23 → 2026-05-24.
Studios named in each answer were extracted with a named-entity-recognition pipeline (span detection → normalisation → fuzzy entity resolution → chain aggregation) and resolved to a registry of 369 verified studios. Grok excluded (crawler timeout); AI Mode × FR proxy failed.
- Methodology note: the 369-studio registry is the universe answers resolve TO, not a list fed to the models. Per-engine leaderboard reported as a presence rate (% of that engine's prompt-captures where the studio appears) since engines answered different prompt counts (ChatGPT 114, Gemini/Copilot 108, Perplexity 100, AI Mode 54). Modo: 45.6% ChatGPT, 43.5% Copilot, 0% Gemini and AI Mode.
- OTA layer = discovery marketplaces (ClassPass, Gymlib) — the true hotel-OTA analog, distinct from booking engines (Mindbody, bsport, Eversports) which are the studio's own software/direct layer. ClassPass is the single most-cited domain (classpass.com 146; 187 across all URLs); Gymlib 14 (all ChatGPT). Marketplace cites per engine: Perplexity 83 (hardest), ChatGPT 54, AI Mode 37, Gemini 29, Copilot 0 (bypasses entirely). Booking engines total just 42 cites — AI rarely surfaces them.
- Entity-fragmentation finding: Gérard Arnaud Yoga (well-known 500h teacher-training school) IS recommended by all five engines — named in 30 visible-answer captures (Gemini 11, Copilot 7, Perplexity 6, AI Mode 5, ChatGPT 1), mostly for inversions / teacher-training / advanced vinyasa — yet absent from the top 12. On Google Maps the school is split across its two rooms under street names ("Studio Rauch" 3 Passage Rauch; "Salle Amelot" 11 Passage Saint-Pierre Amelot, both 75011), so instead of one strong studio it resolves as a couple of thinner ones and its citation footprint never consolidates — aggregated ~41, short of the 49 top-12 cutoff. Gemini named it 11x in prose but cited it 0x, so a citation-based score counts none of that.
Lesson: a brand split across a teacher name and street-named map records is harder for entity-first engines to ground; a consistent name + alternateName tying rooms↔brand helps. The local-search echo of the hotel naming problem — not real invisibility.
- Headline finding — three distinct source strategies: Copilot is 96% studio-website citations (pure entity-resolution lookup); Perplexity 52% and Gemini 41% also studio-direct; ChatGPT is social/editorial (16% Reddit, 13% editorial, only 32% studio websites — Reddit is its single #1 source at 17% of cited URLs); Google AI Mode cites google.com back to itself 52% (self-referential). There is no single "AI search" to optimise for.
- Studio leaderboard (chain-aggregated, all engines): #1 Yay Yoga Studio (181), #2 The Space Paris (146), #3 Ashtanga Yoga Paris (136), #4 Modo Yoga Paris (135), #5 Jivamukti Yoga Paris (131).
- Platform blindspot: Modo Yoga Paris ranks #4 overall and is #1 in ChatGPT's map widget, yet scores zero citations on Gemini and Google AI Mode — surfaced by only 3 of 5 engines. Platform-specific invisibility, not a quality signal.
- Geography: ChatGPT is 100% accurate for numbered arrondissements (6e, 10e, 11e, 16e) but loose on named neighborhoods — Montmartre 76%, Le Marais just 29% (confused with adjacent 3e/11e/12e). The 11e is the city's yoga capital (38 studios in the seed).
- Language & TLD bias: EN vs FR prompts return mostly different top-5 studios (17 of 27 templates < 25% overlap). ChatGPT treats TLD as a language-affinity signal — .com studios cited 1.7× more by English prompts, .fr studios perfectly balanced. Proxy country also reshapes the mix (.com aggregators skew US, .fr local sources skew FR).
- Entity understanding: clean style → specialist mapping (Ashtanga → Ashtanga Yoga Paris; Vinyasa → Modo; Yin → YUJ/Casa). Low yoga↔pilates bleed — of 23 pilates-answer studios, only 2 appear in any yoga prompt. 96% of ChatGPT yoga captures triggered live web search.
- What generalises from hotels: per-engine source strategies, platform-specific invisibility, language/proxy sensitivity, near-universal web-search triggering. Vertical-specific: no dominant OTA layer (studio websites are the centre of gravity, unlike Booking/Expedia for hotels), Reddit matters much more for yoga, and yoga's style-as-entity mapping has no clean hotel equivalent.
- Disclosure: the author practices at Modo Yoga Paris (#4); it received no special handling in the analysis.

### 23. AI Search for Bike Shops in Amsterdam (2026)
- URL: https://nicolassitter.com/research/bike-shops-amsterdam-ai-search-2026
- Date: May 2026
- Topic: AI Search Studies — second city/vertical extension of the AI-search methodology.
- Summary: 27 prompts × 5 AI engines × EN/NL × US/NL proxies, captured 2026-05-23 → 2026-05-24. 378 captures (540 matrix, dropped to 378 after failed/blocked/empty runs), 3,010 citations against the 372-shop Apify seed (228 with websites). All 5 platforms produced usable data.
- Source-strategy headline (replicates the Paris yoga pattern, sharper here): Copilot 97% shop websites (entity engine), Gemini 72%, Perplexity 71%; ChatGPT 34% social (Reddit-led — reddit.com is the most-cited external domain in the study at 198 cites across 4 platforms); Google AI Mode 82% google.com (the highest self-referential share measured anywhere).
- The cleanest entity consensus in the study: 12 brands hit all 5 platforms (Ride Out Amsterdam, Het Zwarte Fietsenplan, Wheelrunner, Kaptein Tweewielers, Amsterdamse Fietswinkel, Echelon Cycle Sport, Damskø, Gregario Cycling Services, BikeFlip, DrBeyk Online, FietsJeroen, Trompton Amsterdam). No Modo-style blind spot.
- Language mechanism: Dutch vs English prompts return 0% overlap on repair, commuter, road and Dutch-bike queries; brand/tourist queries converge.
Perplexity exposed its internal search (44 captures) — fanout_count = 1 in every one, and it PRESERVES the prompt language (lightly normalizes: drops "in", pluralizes). So a Dutch prompt produces a Dutch search and matches Dutch sources; the divergence is "a different search entirely," not a ranking difference.
- TLD bias: NONE — .nl cited equally by EN/NL prompts (0.98×). Dutch and English users see the same domains; only the which-shops set differs.
- Geography: district-targeting returned 0% across Centrum/De Pijp/Jordaan/Noord/Oost — a SEED-GRANULARITY ARTIFACT (Apify labels city = "Amsterdam" with no neighborhood field), null test, NOT "AI gets districts wrong."
- ChatGPT triggered web search on 108/108 captures (100%); other platforms don't expose the trigger flag through Bright Data, so their search behavior is unmeasured, not absent.

### 24. AI Search for Bookstores in Tokyo (2026)
- URL: https://nicolassitter.com/research/bookstores-tokyo-ai-search-2026
- Date: May 2026
- Topic: AI Search Studies — third city/vertical, first non-Western field test.
- Summary: ~22 prompts × EN/JA × US/JP proxies, captured 2026-05-23 → 2026-05-24. 336 captures, 2,322 citations against 584 Tokyo bookstores (1,310 Apify seed, 23 Special Wards + Musashino/Mitaka). Only 4 platforms had usable data — Perplexity returned no usable Tokyo coverage in this run (flagged as a genuine data gap, not papered over).
- Headline finding — Tokyo runs on a dense local-guide web that AI Mode and Western verticals don't show: after a Tokyo-specific taxonomy pass, "other" dropped from 36% → 15% overall and revealed FOUR strategies, not three:
  1. Entity engine (Copilot): 89% direct-to-store-domain.
  2. Self-referential (AI Mode): 61% google.com.
  3. Social engine (ChatGPT): 21% social (Reddit-led), with a long editorial tail.
  4. Local-guide engine (Gemini) — the Tokyo signature: 5% store sites, editorial_local 39% + expat_media 19% = 58% third-party guides (whenin.tokyo, gltjp.com, trulytokyo, gaijinpot, japan-guide, visit-chiyoda, Tokyo Weekender). For Tokyo bookstores Gemini is effectively a travel-guide aggregator, not an entity engine. This guide layer has no Western equivalent at the same density.
- Leaderboard: DAIKANYAMA T-SITE (Tsutaya design flagship) and Kinokuniya tie at the top with 122 each. Below them, English-friendly stores (Aoyama Book Center, Infinity Books Japan) and the Jimbocho used-book cluster dominate the breadth tier.
- TLD bias — STRONGEST measured: .jp store domains cited 5× more in Japanese prompts than English (Amsterdam was neutral at ~1×, Paris .fr was 1×, Berlin .de 1.5×). Ask ChatGPT in Japanese and it leans hard into native .jp; ask in English and those domains nearly vanish.
- Language delta: mixed — some neighborhood queries converge (Shimokitazawa 100%) but genre and "iconic" diverge sharply (kids 0%, iconic 17%). EN surfaces English-language and tourist-famous stores; JA surfaces a different local set.
- Geography: ChatGPT respects ward-level geography (Shibuya 100%, Shinjuku 79–92%). 0% on Jimbocho/Aoyama/Daikanyama/Shimokitazawa is a LABELING CAVEAT, not AI error: these are neighborhoods inside wards (Chiyoda, Minato, Shibuya/Setagaya) and the seed stores the official ward, so a correct result fails the string match. Neighborhood precision is real but unmeasurable against ward-labeled seed data.
- ChatGPT triggered web search on 95/96 captures (99%). Fan-out NOT captured for any Tokyo platform this run — flagged as a genuine data gap.

### 25. AI Search for Yoga Studios in Berlin (2026)
- URL: https://nicolassitter.com/research/yoga-studios-berlin-ai-search-2026
- Date: May 2026
- Topic: AI Search Studies — direct REPLICATION of the Paris yoga study to test whether the engine personalities are structural traits or city artifacts.
- Summary: 27 prompts × 5 AI engines × EN/DE × US/DE proxies, captured 2026-05-26 → 2026-05-27. 540 captures, 5,293 citations against 631 Berlin yoga studios (683 Apify seed). All 5 platforms returned on both US and DE proxies; AI Mode × DE worked here, unlike Paris's AI-Mode × FR failure.
- HEADLINE: the three personalities replicate almost line-for-line, proving they're STRUCTURAL, not city-specific:

  | Platform | Paris (citation share) | Berlin (citation share) | Verdict |
  | --- | --- | --- | --- |
  | Copilot (entity engine) | 96% studio | 95% studio | near-identical |
  | ChatGPT (social engine) | 32% studio + 16% Reddit | 32% studio + 19% Reddit | near-identical |
  | AI Mode (self-referential) | 52% google.com | 59% google.com | replicates |

  ChatGPT's studio share is EXACTLY 32% in both cities and Reddit is again its top social source.
- THE BERLIN TWIST: booking-platform dominance. Perplexity 31%, Gemini 27%, ChatGPT 16% of citations go to Urban Sports Club + Eversports + ClassPass. blog.urbansportsclub.com (213) and classpass.com (208) are the two most-cited domains in the entire Berlin study, ahead of every studio's own site. Berlin's fitness-subscription ecosystem (Urban Sports Club and Eversports are German/Austrian-born) has become a primary AI source, something Paris doesn't show. AI local-search source mix bends to the local commercial infrastructure.
- Leaderboard: #1 Yogicescape (213), #2 Jivamukti Yoga School (151), #3 YOU GLOW Yoga & Womanhood Mitte (143), #4 SHA-LA Studios Prenzlauer Berg (125), #5 YCBA YogaCircle Berlin Akademie (121). Jivamukti is the only brand in the top 5 in BOTH Paris (#5) and Berlin (#2), a genuinely cross-city-durable AI-visible chain. Berlin's top tier is consistent across all 5 engines (no Modo-style blind spot).
- TLD bias (STRONGER local skew than Paris): .de 1.5× DE-biased, .berlin 5× (vs Paris where .fr was language-neutral at 1.0× and only .com skewed English). German prompts pull German-TLD studios harder than French prompts pulled French ones.
- Language delta: identical pattern to Paris. Proper-noun queries (Ashtanga, Mysore) converge 100% across EN/DE; generic-intent queries (affordable, teacher training, morning, community) diverge completely.
- Geography: district-targeting returned 0% across all six Bezirke, the same SEED-GRANULARITY ARTIFACT as Amsterdam (the Apify seed labels city = "Berlin" with no neighborhood field). Null test, NOT "AI gets districts wrong." Paris worked because postal codes map cleanly to arrondissements; Berlin postcodes don't map 1:1 to Bezirke.

### 26. AI Search for Specialty Coffee in Marseille (2026)

- URL: https://nicolassitter.com/research/specialty-coffee-marseille-ai-search-2026
- Date: June 2026
- Topic: AI Search Studies: fifth cross-vertical/city case (Marseille, specialty coffee). The headline is a genuine break-from-pattern, not another replication.
- Summary: 23 prompt templates × 2 languages (EN/FR) × 2 proxy countries (US/FR) × 5 AI engines (ChatGPT, Perplexity, Gemini, Copilot, Google AI Mode), captured 2026-05-28. 413 captures (23×2×2×5=460 theoretical, minus AI Mode×FR rejected −46, minus one empty ChatGPT run −1), 3,442 citations, 786 map entities matched to the 280-venue coffee registry (285-row Apify seed expanded with Places recovery, filtered to coffee-led venues). 9 of 10 platform-proxy batches succeeded; AI Mode × FR failed at the Bright Data trigger (HTTP-level, no snapshot ID; the same FR-proxy block that hit Paris yoga. AI Mode × DE worked in Berlin, so the rejection is FR-specific). Taxonomy pass tightened: "other" 54% → 9% (Marseille's French local-blog tail bucketed cleanly).
- HEADLINE: In Marseille, Instagram and local blogs beat café websites. Across Paris yoga, Berlin yoga and Amsterdam bikes, ChatGPT cited shop/studio websites 32–42% of the time (Paris 32%, Berlin 32%, Amsterdam 42%); Tokyo bookstores was already an exception at 8%.
For Marseille specialty coffee, **ChatGPT cites shop websites only 10%**; instead, 31% Reddit + 32% list & review aggregators + 14% local blogs = 77% third-party. Three reinforcing reasons: (1) tiny scene with weak digital footprint (only 114 of 280 venues carry a website); (2) Reddit + Instagram ARE the discovery layer (reddit.com 175 cites, instagram.com 237 cites); (3) Marseille's French local-blog ecosystem is unusually dense (marseille.love-spots.com 93, marseillesecrete.com 36, tarpin-bien.com 25, lescachotteriesdemarseille.com 11). Same engine personalities as the other cities, but the source mix bends to local commercial infrastructure, and here the "infra" is Reddit + Instagram + French micro-guides.
- Three Marseille-only findings:
  1. **Instagram at 237 cites across 3 platforms**: the highest single-domain count anywhere except AI Mode's Google firehose. No other vertical/city we've measured pushes Instagram this hard. Marseille specialty coffee is an Instagram-native scene and AI mirrors that.
  2. **Gemini swings to global trade press**: baristamagazine.com gets 95 cites = ~34% of Gemini's Marseille citations (vs 12% on shop websites). When a city has no local specialty press, Gemini substitutes the global one. The general lesson: Gemini's preferred source profile is "an authoritative editorial outlet for this domain," and the choice of which editorial is locale-dependent.
  3. **Two metrics, two leaderboards.** By text mentions (the engine actually saying the brand in the visible answer), **Deep is the cross-engine consensus winner**: 216 mentions across all five engines, named in 61.5% of ChatGPT captures, 57.6% of Gemini, 48.9% of Copilot, 46.7% of Perplexity, 41.3% of AI Mode. The data-session's citation-counted score puts Nua at #1 with 356 cites but that's a brand_key artifact: Nua has no website, only Instagram, so all instagram.com cites attribute to it; in actual answer text Nua appears in just one capture.
Same gap surfaced with Gérard Arnaud / Kind Yoga in the Paris yoga study. The article renders both metrics; the leaderboard rank order is the text-mention one.
- Other-platform twists: Copilot is still an entity engine but at 83% (vs the 95–97% norm; even Copilot is dragged down by Marseille's weak digital footprint); Perplexity leads with editorial_local 27% (it favours the French local-guide ecosystem more than any other engine); AI Mode is its usual self-referential google.com firehose at 80%, with the FR-proxy data missing.
- Language divergence (MOST language-divergent result we've measured): EN vs FR control prompt = 11% top-5 overlap (vs Berlin/Paris control at 25%). With so little global specialty press covering Marseille, English prompts pull from a tiny English-language source set (Reddit, wanderlog, baristamagazine) while French prompts pull from the dense Marseille local-blog web. Two languages → two largely disjoint citation universes.
- TLD bias: too thin to read. Only .com cleared the ≥5-cite threshold (.com cited 4× more by FR than EN, n=10). Marseille shops use .fr so rarely (only ~22 of 280 have a website at all) that the .fr sample is too small. Render as measurement limitation, not finding.
- Geography: district-targeting returned 0% across all 6 Marseille quartiers (Le Panier, Cours Julien, Notre-Dame-du-Mont, La Joliette, Vauban, Castellane), the same SEED-GRANULARITY ARTIFACT as Amsterdam/Berlin (Apify seed labels city = "Marseille" with no quartier field). Null test, NOT "AI gets districts wrong."
- Operational note (reproducible across runs): AI Mode × FR proxy is broken at the Bright Data trigger level. Berlin/DE worked, Paris-yoga/FR failed, Marseille/FR failed → the issue is FR-specific to AI Mode, not a one-off.

### 27. Are Hotels in Common Crawl? (2026)

- URL: https://nicolassitter.com/research/hotels-in-common-crawl-2026
- Date: June 2026
- Topic: A census of hotel presence in Common Crawl, the open web archive behind much LLM training data.
The training-data complement to the retrieval-side studies.
- Summary: 108,109 distinct hotel domains (from 142,405 hotels with own site + Google Place ID + ≥10 reviews) checked against the columnar URL index of the May 2026 snapshot (CC-MAIN-2026-21), matched on www-normalised host.
- HEADLINE: 60.6% of hotels are in Common Crawl, 39.4% absent. Of all hotels checked, 41.4% have depth (≥5 captured pages) and 19.2% are shallow (1–4 pages); together these make up the 60.6% present. ~2 in 5 reviewed hotels with a real website are missing from the data that trains LLMs.
- Independents 61% > chains 45.9% (chains more often JS-rendered or CDN/WAF-blocked). Country: DE 69% best → GB 54%, ID 47% worst. TLD: .de 71% best, .es 37% worst; .com deepest (~109 pages) vs local TLDs shallow (~35–42): the English-corpus tilt one layer earlier, in training data.
- Why absent: JS-only rendering, CDN/WAF blocks (invisible to robots.txt, so the 3.3% block study is a floor), or low connectivity (Common Crawl rank / Harmonic Centrality, per Metehan Yesilyurt). Free checker at /tools/common-crawl.

### 28. AI Hotel Memory (2026)

- URL: https://nicolassitter.com/research/ai-hotel-memory-2026
- Date: June 2026
- Topic: A parametric-recall study: what AI models have MEMORISED about hotels, with web search turned OFF. The complement to every retrieval-side study here. Method adapted from Dejan Marketing's AI Brand Authority Index, extended with website verification.
- Summary: Three cheap models (GPT-5.4-nano, GPT-5.4-mini, Gemini 3.1 Flash-Lite) were asked repeatedly, in JSON, to name hotels and return each one's website + address (no tools, temperature 1.0). ~150 runs/model for global chains; ~80 runs/model for Paris, Dubai, London, New York (~1,400 generations total). Returned domains were checked for DNS resolution, turning recall into a confabulation lie-detector. Total cost under EUR 20.
- HEADLINE: Hotel chains are known cold: every model returns Marriott/Hilton/Hyatt/Four Seasons with their correct websites ~99% of the time.
For individual hotels it cracks: the share of named hotels with a working website ranges from 97% (Gemini 3.1 Flash-Lite, Dubai) down to 47% (GPT-5.4-nano, Paris). The signature failure mode is the model knowing a real hotel exists but INVENTING its web address, e.g. Le Bristol Paris returned as the dead bristolparis.com vs the real oetkercollection.com; Hotel Plaza Athenee as plazaathenee-paris.com vs the real dorchestercollection.com.
- Key findings: (1) chains overlearned (single predictable domain in training); (2) Paris is the hardest city for every model because its top hotels are independent palaces on unpredictable collection domains, and it produces the longest tail of invented names (nano: 469 distinct "Paris hotels" over 80 runs vs Gemini's 177); (3) counter-intuitively the CHEAPEST model, Gemini 3.1 Flash-Lite, had the most accurate and most consistent hotel memory (beats GPT-5.4-mini in cities), which is why Dejan's brand index could run on cheap Gemini; (4) asking for the website is the method's key: it forces a checkable commitment a bare name does not.
- Caveat: the DNS-resolves rate is an upper bound on accuracy (a domain can resolve yet be the wrong hotel). Measures three specific cheap models, not the full ChatGPT/Gemini consumer products (which use larger models + web search).

## Tools

### Hotel Schema Audit & Generator

- URL: https://nicolassitter.com/tools/hotel-schema
- Free audit + generator for schema.org/Hotel JSON-LD. The audit fetches a hotel homepage's raw HTML server-side (no JS rendering, i.e. what GPTBot/ClaudeBot actually see), extracts the JSON-LD, and scores it 0–100 against ~40 hotel-specific rules (lodging @type gate, typed starRating, sameAs, address + ISO country, geo, checkin/checkout casing, amenityFeature, HotelRoom, and more) anchored to the hotel-schema-adoption-study research. The generator then opens pre-filled with the parsed values so only the gaps need typing; it also works standalone from scratch.
- No email gate, no logging, submitted domains are not stored. Output is paste-and-deploy clean: no [TODO] placeholders, no leaky defaults (paymentAccepted is a real array, addressCountry is required, aggregateRating is opt-in only).

### Common Crawl Checker

- URL: https://nicolassitter.com/tools/common-crawl
- Free checker for whether a hotel website is in Common Crawl, the open web archive behind much LLM training data; it queries the public CDX index across recent snapshots. No email gate, no logging.

## Live Dashboards

### My AI Visibility

- URL: https://nicolassitter.com/projects/niche-visibility
- Live weekly dashboard tracking whether five AI engines (ChatGPT, Perplexity, Gemini, Copilot, Google AI Mode) cite nicolassitter.com when answering questions in its own niche: AI search for hotels. A frozen 31-prompt panel (published in full at /data/niche-visibility/prompts.csv) fires every Tuesday via Bright Data; the dashboard shows citation/mention rates per engine, a tier × engine heatmap, a share-of-voice domain leaderboard, the cited pages, and an engine self-awareness grid. Every optimization action is logged publicly and overlaid on the charts.

## Blog

### Hotel Ranque: How We Built a Fully Booked Hotel Using Only AI Visibility

- URL: https://nicolassitter.com/blog/hotel-ranque
- Date: February 2026
- Summary: A GEO/AEO experiment with a real boutique hotel in Paris. Hotel Ranque went from zero AI visibility to fully booked using only structured content, third-party signals, and consistency. No ads, no OTA dependency.
- Key findings: AI visibility compounds over weeks. First Perplexity mention at week 4, ChatGPT at week 8, fully booked by month 5. AI-driven bookings convert at higher rates than OTA traffic.
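
The per-engine citation and mention rates that the My AI Visibility dashboard reports can be illustrated with a small sketch. This is not the dashboard's actual code: the capture record shape (`engine`, `cited_urls`, `answer_text`), the `BRAND_TERMS` strings, and the function names are all assumptions made for illustration. It treats a capture as a citation when any cited URL's host is the target domain or a subdomain, and as a mention when a brand string appears in the visible answer text, mirroring the text-mention vs citation distinction drawn in the Marseille study.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Assumed values for illustration; the real panel configuration is not published here.
TARGET_DOMAIN = "nicolassitter.com"
BRAND_TERMS = ("nicolas sitter", "nicolassitter.com")


def normalised_host(url: str) -> str:
    """Return the lowercase host of a URL with a leading 'www.' removed."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def engine_rates(captures):
    """Compute citation and mention rates per engine.

    Each capture is a dict (hypothetical shape) with keys:
      engine      - engine name, e.g. "ChatGPT"
      cited_urls  - list of URLs the engine cited
      answer_text - the visible answer text
    """
    totals = defaultdict(int)
    cited = defaultdict(int)
    mentioned = defaultdict(int)

    for cap in captures:
        engine = cap["engine"]
        totals[engine] += 1

        # Citation: the target domain (or a subdomain) appears among cited hosts.
        hosts = {normalised_host(u) for u in cap["cited_urls"]}
        if any(h == TARGET_DOMAIN or h.endswith("." + TARGET_DOMAIN) for h in hosts):
            cited[engine] += 1

        # Mention: a brand string appears in the answer text itself.
        text = cap["answer_text"].lower()
        if any(term in text for term in BRAND_TERMS):
            mentioned[engine] += 1

    return {
        engine: {
            "captures": totals[engine],
            "citation_rate": cited[engine] / totals[engine],
            "mention_rate": mentioned[engine] / totals[engine],
        }
        for engine in totals
    }
```

Keeping the two rates separate matters because an engine can name a brand without linking to it, or link to it without naming it; a single blended score would hide which of the two is moving week to week.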
## Endpoints

- Website: https://nicolassitter.com
- Research hub: https://nicolassitter.com/research
- API (JSON): https://nicolassitter.com/api/posts
- RSS: https://nicolassitter.com/rss.xml
- Sitemap: https://nicolassitter.com/sitemap.xml
- Identity: https://nicolassitter.com/llms.txt

## Contact

- LinkedIn: https://www.linkedin.com/in/nicolassitternolleau/
- GitHub: https://github.com/Nicositter88
- Email: nicolas.sitternolleau@gmail.com