{"@context":"https://schema.org","@type":"BlogPosting","headline":"AI Search for Specialty Coffee in Marseille (2026): Where the Entity-Engine Pattern Breaks","description":"Across Paris yoga, Berlin yoga and Amsterdam bikes, ChatGPT cited shop/studio websites ~32% of the time. For Marseille specialty coffee it's only 10% — instead 31% Reddit + 32% review-aggregators + 14% French local blogs = 77% third-party. 27 prompts × 5 AI engines × EN/FR × US/FR proxies, 413 captures, 3,442 citations and 786 map entities against 280 specialty cafés (9 of 10 platform-proxy batches succeeded; AI Mode × FR was rejected at the Bright Data trigger — same FR-specific block that hit Paris yoga). Three Marseille-only findings: Instagram at 237 cites across 3 platforms (the highest social signal we've measured), Gemini swinging to global specialty press (baristamagazine.com = 34% of its citations) when local trade press is absent, and a two-metric leaderboard split — Deep is the text-mention consensus winner (216 mentions across all five engines, 61.5% of ChatGPT) while Nua tops the cite-counted score (356) only because its brand_key is generic (instagram.com); in actual answer text Nua appears in one capture. The article renders both metrics; the leaderboard rank is the text-mention one. EN vs FR control prompt = 11% top-5 overlap, the most language-divergent result so far. 
The taxonomy pass cleaned up the Marseille local-blog tail: the \"other\" bucket dropped from 54% to 9%.","datePublished":"2026-05-28","dateModified":"2026-05-28","url":"https://nicolassitter.com/research/specialty-coffee-marseille-ai-search-2026","category":"research","keywords":["AI search specialty coffee Marseille","ChatGPT coffee Marseille","Nua Marseille AI","Instagram AI citations","entity engine break","Reddit AI source","baristamagazine Gemini","French local blogs AI","AI local search study"],"articleSection":"Research","wordCount":3100,"readTime":"13 min","articleBody":"May 2026 · AI Search Studies\n\n# AI Search for Specialty Coffee in Marseille (2026): Marseille breaks the entity-engine pattern\n\n**TL;DR:** In three of our four prior cross-vertical studies (Paris yoga, Berlin yoga, Amsterdam bikes) ChatGPT cited shop or studio websites about 32% of the time; Tokyo bookstores was the exception at 8%. For Marseille specialty coffee, it’s **only 10%**. Instead, 77% of ChatGPT’s Marseille citations go to third parties — **31% Reddit, 32% review-aggregators, 14% French local blogs**. The entity-engine assumption inverts here. Instagram returns 237 cites — the highest social-platform signal in any cross-vertical study so far. By the “does the engine actually _say_ the brand name in the answer” metric, **Deep is the consensus winner** — named in 61.5% of ChatGPT captures and across all five engines. Marseille is the city where the local-blog and Instagram ecosystem _is_ the AI search ecosystem.\n\nPublished May 28, 2026\n\n-   5 AI engines\n-   27 prompt templates\n-   280 cafés\n-   EN + FR (both languages)\n\n[Read the Report](#executive-summary)\n\n[Summary](#executive-summary)[1\\. Source Mix](#source-mix)[2\\. ChatGPT Inversion](#inversion)[3\\. Instagram](#instagram)[4\\. Gemini’s Substitute](#gemini-press)[5\\. Shop Leaderboard](#leaderboard)[6\\. Top Sources](#top-sources)[7\\. Language Split](#language)[8\\. TLD Limit](#tld)[9\\. Geography](#geography)[10\\. Vs. 
4 Cities](#four-cities)[11\\. AI Behaviour](#ai-behaviour)[Methodology](#methodology)[FAQ](#faq)\n\n## Executive Summary\n\nAfter Tokyo bookstores, Marseille is the second of five cross-vertical replications where the entity-engine assumption breaks. Marseille specialty coffee runs on third-party sources — not shop websites.\n\nWe’ve now run the same AI-search playbook across four other cities and verticals: Paris yoga, Berlin yoga, Amsterdam bikes and Tokyo bookstores. In the first three, ChatGPT cited the entity’s own website roughly **32% of the time**, the ChatGPT-as-entity-engine baseline; Tokyo bookstores, at 8%, was the first exception. For Marseille specialty coffee, that number collapses to **10%**. The missing 22 points don’t go to a single source — they go to a trio: **31% Reddit, 32% review-aggregators (Wanderlog, Mapstr), 14% French local blogs** (marseille.love-spots.com, marseillesecrete.com, tarpin-bien.com, lescachotteriesdemarseille.com). That’s 77% of ChatGPT’s Marseille citations grounded in third-party content.\n\nThe mechanism is the same one Berlin showed: source mix bends toward the local commercial and editorial infrastructure of a city. Marseille just has an unusual infrastructure — a small specialty-coffee scene with a weak digital footprint (only 114 of 280 shops have a website at all), a heavy Instagram-native discovery layer (237 cites, our highest social signal anywhere), and a dense thicket of French micro-guides covering the city. AI engines mirror what’s out there. Where the documentation lives, the citations follow.\n\nSection 1\n\n## Source mix by platform\n\nFor every cited URL we bucketed the source into one of ten categories. The mix per engine is the cleanest way to see why Marseille breaks the pattern — especially compared to the consistent shop/studio-website share we measured in three earlier cities.\n\n[Figure: source mix by platform, Marseille specialty coffee]\n\nCopilot\n\n### Still entity-engine — but at 83%\n\nThe 95–97% Copilot norm we measured in yoga and bikes drops to **83%** in Marseille. 
Even the engine that goes hardest on shop websites loses 12 points here. The weak digital footprint of Marseille specialty coffee — most shops without a website — forces Copilot to spend the rest of its citations on the long tail it usually ignores.\n\nGemini\n\n### Swings to global trade press: 34%\n\nWith no Marseille-specific specialty-coffee press to lean on, Gemini reaches for the global trade press instead. `baristamagazine.com` alone returned 95 citations — more than any shop website on Gemini, and roughly **34% of Gemini’s Marseille citations**.\n\nPerplexity\n\n### Leads with local editorial: 27%\n\nPerplexity leans on the French local-guide ecosystem more than any other engine — **27%** of its Marseille citations go to `marseille.love-spots.com`, `marseillesecrete.com`, `tarpin-bien.com` and similar French micro-guides. Where Marseille is documented in French, Perplexity finds it.\n\nThe mechanism is the same one Berlin showed: source mix bends to local commercial and editorial infrastructure. In Berlin that infrastructure was a fitness-marketplace blog (Urban Sports Club). Here in Marseille the “infrastructure” is a Reddit thread, an Instagram feed and a thicket of French micro-guides — so the AI engines mirror exactly that. The platform personalities don’t change; the substrate does.\n\nSection 2\n\n## The ChatGPT inversion\n\nOne number tells the whole story of this article. 
ChatGPT’s shop-website citation share for Marseille specialty coffee is the second-lowest we’ve measured in five cross-vertical replications (only Tokyo bookstores, at 8%, is lower), and the gap to the other “city + scene” cases is what makes it interesting.\n\n**10%**: ChatGPT’s share of citations going to specialty-coffee shop websites in Marseille, vs the ~32% baseline we measured in Paris yoga, Berlin yoga and Amsterdam bikes.\n\nChatGPT shop/studio website citation share across five cross-vertical AI-search studies.\n\n| City + vertical | ChatGPT shop-site % | Signature non-shop source |\n| --- | --- | --- |\n| Paris yoga | 32% | Reddit (16% of all ChatGPT URLs) |\n| Berlin yoga | 32% | Urban Sports Club blog |\n| Amsterdam bikes | 42% | Reddit |\n| Tokyo bookstores | 8% | whenin.tokyo + Tokyo Weekender |\n| Marseille coffee | 10% | Reddit + Instagram + FR local blogs |\n\nThis isn’t a measurement artifact. Tokyo bookstores landed at 8% for exactly the same reason Marseille lands at 10%: when a scene’s documentation lives off-site — on third-party review aggregators, in local-guide blogs, or on social — the AI engines find it there. The entity-engine baseline of ~32% is what happens when shops _do_ own their digital footprint. In Marseille, only 114 of 280 specialty cafés have a website at all. The 10% is honest.\n\nSection 3\n\n## The Instagram signal\n\nOne domain stands out so far above the rest of the social layer that it deserves its own section.\n\n**237 Instagram citations** across 3 platforms, the highest single-domain count anywhere in this study except for AI Mode’s Google self-citation firehose.\n\n### No other city/vertical pushes Instagram this hard.\n\nIn Paris yoga, Berlin yoga, Amsterdam bikes and Tokyo bookstores, the dominant social signal was Reddit — Instagram barely registered (single digits to low double digits at most). 
For Marseille specialty coffee, Instagram joins Reddit as a co-equal source: Reddit at 175 cites, Instagram at 237.\n\nMarseille specialty coffee is an **Instagram-native scene**. Many of the top shops document themselves there in preference to a website — **Nua** is the cleanest example: no website, only an Instagram account, and ChatGPT’s answers cite that Instagram URL as the primary source. AI engines reflect this by surfacing Instagram URLs as primary citations rather than social-proof addenda. If you’re ranking a Marseille café for AI visibility, “own your Instagram” is closer to truth than “own your website.”\n\nThe reverse holds too: in cities where Instagram _doesn’t_ drive citations, optimising it as your primary AI-discovery surface would underperform. The platform that matters depends on where the city’s scene actually documents itself — not on a generic GEO playbook.\n\nSection 4\n\n## Gemini’s substitute\n\nWhen the local layer Gemini usually leans on doesn’t exist, it reaches one rung up.\n\n**95 `baristamagazine.com` citations on Gemini**: ~34% of Gemini’s entire Marseille citation pool, more than any shop website.\n\n### The generalisable lesson\n\nGemini’s source preference is consistent across our studies, but it’s a _shape_ more than a fixed set: “an authoritative editorial outlet for this domain.” The identity of that outlet is locale-dependent.\n\n-   **Berlin yoga** → the Urban Sports Club blog filled the slot.\n-   **Tokyo bookstores** → the local-guide web (whenin.tokyo, Tokyo Weekender) filled it.\n-   **Marseille coffee** → with neither a deep local-guide layer nor a Marseille-specific specialty-coffee press, Gemini reaches for the _global_ specialty trade press.\n\nFor a domain-aware editorial outlet, the strategic implication is the inverse of the local-blog story: in cities where no local equivalent exists, a global trade publication can become the single largest Gemini citation source for that locale. 
Barista Magazine isn’t “about” Marseille — but on Gemini, for Marseille, it’s the source.\n\nSection 5\n\n## The Marseille Specialty Coffee AI Leaderboard\n\nAggregating shop mentions across the engines (chain locations merged), these are the Marseille specialty cafés the engines name most often. “Engines” is the number of engines that named the shop at all, a breadth signal.\n\nTop 12 Marseille specialty cafés ranked by text-mention total (engines naming the brand in the visible answer text), with the data session's citation-counted score alongside for comparison. Engines = number of the five engines that named the brand at least once in answer text. Brand-aggregated — AI answers say 'Café Lauca,' not 'Café Lauca Vieux-Port,' so the relevant unit is the brand, not the address.\n\n| Rank | Shop | Text mentions | Cite score | Engines |\n| --- | --- | --- | --- | --- |\n| #1 | Deep | 216 | 131 | 5 / 5 |\n| #2 | 7VB Café | 147 | 66 | 5 / 5 |\n| #3 | Café Lauca | 116 | 101 | 5 / 5 |\n| #4 | Boujou Coffee | 105 | 54 | 5 / 5 |\n| #5 | La Brûlerie MÖKA | 91 | 38 | 5 / 5 |\n| #6 | The Coffee | 50 | 34 | 5 / 5 |\n\n### The same leaderboard, split by engine\n\nReading across a row: the share of each engine’s prompt-captures whose visible answer text names the brand (raw count in parentheses). Because each engine answered a different number of prompts (ChatGPT 91, Gemini / Copilot / Perplexity 92, AI Mode 46 — AI Mode × FR failed at the Bright Data trigger), raw counts aren’t comparable across columns, so we render them as rates.\n\nDeep dominates the heatmap: named in **61.5% of ChatGPT’s captures**, 57.6% of Gemini, 48.9% of Copilot, 46.7% of Perplexity and 41.3% of AI Mode — universal recommendation. 7VB Café and Boujou Coffee are similar broad winners. The two rows at the bottom (**Nua**, **Le grand Duc by Jacks**) are all but empty: they have high citation-counted scores, but engines almost never name them in answer text. 
Both have generic platform _brand\\_keys_ (Nua → instagram.com, Le grand Duc → facebook.com), so the citation count inherits every cite to those platforms and inflates. By the “does the engine actually _say_ the name” metric, Deep is Marseille’s real cross-engine consensus winner.\n\n| # | Shop | AI Mode (46 prompts) | ChatGPT (91 prompts) | Perplexity (92 prompts) | Gemini (92 prompts) | Copilot (92 prompts) |\n| --- | --- | --- | --- | --- | --- | --- |\n| 1 | Deep | 41.3% (19) | 61.5% (56) | 46.7% (43) | 57.6% (53) | 48.9% (45) |\n| 2 | 7VB Café | 39.1% (18) | 34.1% (31) | 28.3% (26) | 45.7% (42) | 32.6% (30) |\n| 3 | Café Lauca | 30.4% (14) | 46.2% (42) | 12% (11) | 5.4% (5) | 47.8% (44) |\n| 4 | Boujou Coffee | 39.1% (18) | 41.8% (38) | 15.2% (14) | 14.1% (13) | 23.9% (22) |\n| 5 | La Brûlerie MÖKA | 17.4% (8) | 22% (20) | 20.7% (19) | 38% (35) | 9.8% (9) |\n| 6 | The Coffee | 8.7% (4) | 13.2% (12) | 8.7% (8) | 9.8% (9) | 18.5% (17) |\n| 7 | Tarlata Café | 4.3% (2) | 11% (10) | 19.6% (18) | 1.1% (1) | 10.9% (10) |\n| 8 | Café Barbotyne | 8.7% (4) | 6.6% (6) | 7.6% (7) | 0 (0) | 16.3% (15) |\n| 9 | Cali Kitchen | 17.4% (8) | 6.6% (6) | 2.2% (2) | 1.1% (1) | 14.1% (13) |\n| 10 | Maison Nosh | 4.3% (2) | 5.5% (5) | 5.4% (5) | 3.3% (3) | 13% (12) |\n| 11 | Nua | 0 (0) | 0 (0) | 0 (0) | 1.1% (1) | 0 (0) |\n| 12 | Le grand Duc by Jacks | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |\n\nCell = % of that engine’s captures in which the brand was named in the visible answer text; raw count in parentheses. Rows ordered by total text mentions (matching the leaderboard above). Colour scales with the table maximum (Deep on ChatGPT). Zeros greyed.\n\n**Two metrics, two stories — and one is wrong.** The data-session leaderboard scored Nua at 356 cites and Le grand Duc at 31, putting them at #1 and #10 by citations. Neither shop has a website — their _brand\\_key_ is the generic platform they live on (instagram.com, facebook.com), so the citation aggregator hands them every cite to those domains in the corpus. 
By the honest metric — how often does the engine say the brand name in the answer? — Nua appears in _one_ capture and Le grand Duc in _zero_. The cite-score column above is kept as a contrast so the gap is visible; the rank order is the text-mention one. (The same measurement gap surfaced with Gérard Arnaud and Kind Yoga in the Paris yoga study.)\n\n### Where the winners are\n\nThe 12 leaderboard brands plotted on the map — chains show all their locations (Café Lauca, Boujou and La Brûlerie MÖKA have two each).\n\n#### Top 12 most-cited brands — locations\n\n1.  Deep\n2.  7VB Café\n3.  Café Lauca\n4.  Boujou Coffee\n5.  La Brûlerie MÖKA\n6.  The Coffee\n7.  Tarlata Café\n8.  Café Barbotyne\n9.  Cali Kitchen\n10.  Maison Nosh\n11.  Nua\n12.  Le grand Duc by Jacks\n\nSection 6\n\n## Top cited sources, cross-platform\n\nEvery domain cited by at least 3 of 5 platforms with 30+ cites. Read this as the working infrastructure of AI search for Marseille specialty coffee.\n\nTop cross-platform cited domains for Marseille specialty coffee, ≥3 platforms and ≥30 cites.\n\n| Platforms | Cites | Bucket | Domain |\n| --- | --- | --- | --- |\n| 4 / 5 | 175 | social | reddit.com |\n| 4 / 5 | 93 | editorial — local | marseille.love-spots.com |\n| 4 / 5 | 76 | review aggregator | wanderlog.com |\n| 4 / 5 | 54 | entity website | cafelauca.com |\n| 4 / 5 | 36 | editorial — local | marseillesecrete.com |\n| 3 / 5 | 237 | social | instagram.com |\n\n**instagram.com at 237 cites is the row to notice.** It shows up on 3 of 5 platforms with the highest single-domain count outside AI Mode’s Google self-citations — and no prior city or vertical in this series has surfaced Instagram even close to that volume. Reddit (175 cites, 4 platforms) is the more “expected” social signal; Instagram is the Marseille-specific one.\n\nSection 7\n\n## EN vs FR — the sharpest split we’ve seen\n\nSame intent, different language, different shops. 
For Marseille the divergence is the largest in the cross-vertical series so far.\n\nEN vs FR top-5 shop overlap per prompt template, ChatGPT, FR proxy.\n\n| Template | EN vs FR top-5 overlap |\n| --- | --- |\n| dist\\_cours\\_julien | 67% |\n| price\\_cheap | 43% |\n| dist\\_panier | 43% |\n| brew\\_pourover | 25% |\n| brew\\_espresso | 25% |\n| brew\\_coldbrew | 25% |\n\nThe **control prompt overlaps just 11%**: only 1 shop appears in both the English and the French top 5. (The 11% figure is consistent with overlap computed as intersection over union: 1 shared shop out of 9 distinct ones.) For comparison, Berlin’s and Paris yoga’s controls were both 25%. Marseille is roughly half as language-stable as the previous two non-anglophone replications.\n\nThe cause is structural and follows directly from the source mix: with so little global specialty-coffee press covering Marseille, English prompts pull from a small English-language source set (Reddit threads, Wanderlog, Barista Magazine) while French prompts pull from the dense Marseille local-blog web (marseille.love-spots.com, marseillesecrete.com, tarpin-bien.com). Two languages, two largely **disjoint citation universes** — and two largely disjoint top-5 shop lists as the downstream consequence.\n\nFor a Marseille café, this is the most actionable single finding in the study: an EN-only or FR-only AI visibility audit will miss most of the picture, because on the control prompt four of the five shops in each language’s top 5 never appear in the other’s. The two languages are pulling from different libraries.\n\nSection 8\n\n## TLD bias — honestly, the sample is too thin\n\nWe’d normally render an EN vs FR split of .com vs .fr citations here. For Marseille specialty coffee, the sample doesn’t support it.\n\nOnly **.com cleared the ≥5-cite threshold** — and even there the directional signal (.com cited about 4× more by FR than EN prompts, n=10) sits below the noise floor we trust. Marseille specialty cafés use `.fr` so rarely — only ~22 of 280 shops have a website at all — that the .fr sample is too small to read cleanly. This is a measurement limitation, not a finding. 
We’re flagging it transparently rather than over-claiming a pattern the data can’t support.\n\nSection 9\n\n## Geography — and a null test that is a measurement story\n\nMarseille’s specialty cafés cluster heavily in a handful of central quartiers. The visual below shows the by-district counts — but the per-district AI accuracy test reads as zero, and that zero is methodological, not behavioural.\n\n### All 280 cafés on the map\n\nThe full reference set the answers were resolved against — every verified Marseille specialty café. Hover a dot for the name and district. The density follows the central waterfront and creative-quarter neighbourhoods (Vieux-Port, Notre-Dame-du-Mont, Cours Julien) rather than the outlying arrondissements.\n\n#### All 280 verified Marseille specialty cafés\n\n### The supply side: where Marseille coffee actually is\n\nThe same distribution as counts — central quartiers dominate, the outlying arrondissements barely show up.\n\n![Bar chart of specialty cafe counts by Marseille district showing concentration in central quartiers including Le Panier, Cours Julien, Notre-Dame-du-Mont and the Vieux-Port area.](/_next/image?url=%2Fresearch%2Fspecialty-coffee-marseille-ai-search-2026%2Fcoffee_marseille_by_district.png&w=3840&q=75&dpl=dpl_H4iFpRn3vQ7W7vEpqXGjb44m5WjB)\n\nSpecialty café counts across Marseille’s central quartiers, from the 280-shop registry.\n\n**District-targeting returned 0% accuracy across all six tested quartiers** (Le Panier, Cours Julien, Notre-Dame-du-Mont, La Joliette, Vauban, Castellane). This is the same **seed-granularity artifact** we hit in Amsterdam and Berlin: the Apify Google Maps seed labels every shop’s city as “Marseille” with no quartier field, so a shop returned by an AI engine can never string-match a quartier target in our registry.\n\nThis is **not** evidence that AI gets Marseille quartiers wrong. 
It’s a seed-data ceiling we’re hitting in our own measurement — the kind of null that has to be rendered as a methodology caveat rather than a finding. The Paris yoga study, with its arrondissement-tagged registry, was the one case where we could measure this property cleanly.\n\nSection 10\n\n## Where Marseille fits in the five-city picture\n\nOne row per city in the cross-vertical series, on the two axes that matter most for the entity-engine pattern: ChatGPT’s shop-website citation share, and the signature non-shop source the engine leans on instead.\n\nFive cross-vertical replications of the AI-search methodology, scored on ChatGPT’s entity-engine vs third-party split.\n\n| City + vertical | ChatGPT shop-site % | Signature non-shop source |\n| --- | --- | --- |\n| Paris yoga | 32% | Reddit |\n| Berlin yoga | 32% | Urban Sports Club |\n| Amsterdam bikes | 42% | Reddit |\n| Tokyo bookstores | 8% | whenin.tokyo + Tokyo Weekender |\n| Marseille coffee | 10% | Reddit + Instagram + French local blogs |\n\n### What still generalises\n\n-   **Per-engine personalities hold.** Copilot is still the entity engine; Perplexity still the most diverse mix; Gemini still concentrates on one editorial vein; AI Mode still self-cites Google.\n-   **Source mix bends to local infrastructure.** Berlin (Urban Sports Club), Tokyo (local-guide web), Marseille (Reddit + Instagram + FR micro-guides) — same mechanism, different substrate.\n-   **Language and proxy reshape the answer.** EN/FR divergence widens further here, but the direction is the same as Paris and Berlin.\n\n### What Marseille adds\n\n-   **The entity-engine baseline isn’t a law.** The ~32% ChatGPT shop-site share we’d been treating as a constant is conditional on shops _owning_ a digital footprint. 
When they don’t, it can drop to 10%.\n-   **Instagram can be a primary AI citation source.** Not just social proof — a top-3 cited domain on 3 of 5 engines.\n-   **Gemini will substitute upward.** No local specialty press → global trade press fills the slot. The shape of the source preference is fixed; the identity isn’t.\n\n[Read: AI Search for Yoga Studios in Paris](/research/yoga-studios-paris-ai-search-2026) [All AI Search Studies](/research?topic=ai-search)\n\nSection 11\n\n## AI search behaviour — and an operational caveat\n\nThe plumbing notes for this dataset, including a reproducible Bright Data quirk that bit two of our studies in a row.\n\n413\n\nCaptures across 9 of 10 platform×proxy batches. Roughly even per-engine, weighted toward platforms with multiple language/proxy combinations that resolved.\n\n3,442\n\nCited URLs logged, bucketed into the 10-bucket source taxonomy. Marseille’s French local-blog tail bucketed cleanly — “other” tightened from 54% (pre-taxonomy pass) to 9%.\n\n786\n\nDistinct map entities surfaced and resolved through the NER pipeline to the 280-shop registry.\n\n**AI Mode × FR proxy failed at the Bright Data trigger.** HTTP-level rejection, no snapshot ID returned. This is the _same_ FR-proxy block that hit our Paris yoga study in May; AI Mode × DE worked cleanly in the Berlin replication, so the rejection is FR-specific to AI Mode — not a one-off scrape failure. Every AI Mode percentage on this page is over the US-proxy batch only. 
We flag the absence rather than imputing French numbers.\n\nMethodology\n\n## Study design\n\n### Data collection\n\n-   22 prompt templates × 2 languages (EN/FR) × 2 proxy countries (US/FR) × 5 AI engines\n-   Engines: ChatGPT, Perplexity, Gemini, Copilot, Google AI Mode\n-   Captured 2026-05-28 via Bright Data\n-   413 captures, 3,442 cited URLs, 786 distinct map entities\n-   9 of 10 platform×proxy batches resolved — AI Mode × FR rejected\n\n### What we measured\n\n-   Shops named per answer (chain-aggregated leaderboard)\n-   Cited URLs bucketed into a 10-bucket source taxonomy\n-   EN vs FR top-5 overlap per prompt template\n-   .com vs .fr citation balance by prompt language (sample-limited)\n-   Per-platform social and review-aggregator share\n-   Cross-city comparison vs Paris yoga, Berlin yoga, Amsterdam bikes, Tokyo bookstores\n\n### How we turned answers into shops: the NER pipeline\n\nAI answers are free text — “Nua is the obvious one, then maybe try Deep in the Vieux-Port, or Tarlata…” — not a clean list of businesses. To count anything, we first had to extract the shop mentions. Each answer (and each citation’s anchor text) ran through a **named-entity-recognition (NER)** pass that works in four steps:\n\n1.  **Span detection.** A transformer NER model tags candidate spans — organisation/business names and the location phrases attached to them (quartier names, street addresses, “Vieux-Port,” “Cours Julien”). A coffee-specific gazetteer (“café,” “brûlerie,” “coffee,” “roasters”) boosts recall on names the base model would otherwise miss.\n2.  **Normalisation.** Each candidate is lower-cased and stripped of boilerplate — the word “Marseille,” quartier suffixes, trademark glyphs and punctuation — so “Nua Coffee Roasters Marseille” and “Nua” collapse toward the same key.\n3.  
**Entity resolution.** Normalised mentions are matched to a 285-row Apify Google Maps seed of Marseille specialty cafés (280 retained after filtering bakeries, restaurants and tea houses — only 5 misclassifications). Fuzzy string similarity plus a domain match when the answer cited the shop’s own website. Ambiguous or sub-threshold spans are dropped.\n4.  **Chain aggregation.** Resolved entities that belong to the same brand are merged so a multi-location shop isn’t double-counted, while per-location coordinates are retained for the map.\n\nA taxonomy pass also tightened the source bucketing: pre-pass, “other” held **54%** of Marseille citations because the French local-blog tail was unfamiliar to the default classifier. Post-pass, “other” sits at **9%** — the French micro-guide ecosystem bucketed cleanly into editorial\\_local.\n\n### Caveats\n\n-   **AI Mode × FR proxy is missing.** Bright Data rejected the batch at the trigger layer (HTTP-level, no snapshot ID). The same FR-proxy block hit Paris yoga; AI Mode × DE worked in Berlin. FR-specific to AI Mode, not a one-off.\n-   **.fr TLD sample is too thin to read.** Only ~22 of 280 Marseille specialty cafés carry a website at all; .fr citation counts sit below our trust threshold. Section 8 is rendered as a measurement limitation, not a finding.\n-   **District-targeting null is a seed-granularity artifact.** The Apify seed labels every shop city = “Marseille” with no quartier field; the 0% accuracy across all six quartiers is methodological, not behavioural. 
Same Amsterdam/Berlin pattern.\n-   NER resolution is high-precision by design: mentions the pipeline can’t confidently map to the registry are dropped, so counts are conservative lower bounds.\n-   Google AI Mode’s heavy reliance on google.com URLs may inflate its citation count relative to other engines.\n-   **Disclosure:** no personal affiliation with any Marseille specialty coffee shop.\n\nFAQ\n\n## Frequently Asked Questions\n\n### Summarize with AI\n\n## Continue Reading\n\nMore on how AI search surfaces local businesses.\n\n[AI Search for Yoga Studios in Paris (2026)](/research/yoga-studios-paris-ai-search-2026)[All AI Search Studies](/research?topic=ai-search)\n\n[All Research](/research)","author":{"@type":"Person","name":"Nicolas Sitter","url":"https://nicolassitter.com/about","sameAs":["https://www.linkedin.com/in/nicolassitternolleau/","https://github.com/Nicositter88","https://hotelrank.ai"]},"publisher":{"@type":"Person","name":"Nicolas Sitter","url":"https://nicolassitter.com"},"image":"https://nicolassitter.com/api/og/specialty-coffee-marseille-ai-search-2026","mainEntityOfPage":{"@type":"WebPage","@id":"https://nicolassitter.com/research/specialty-coffee-marseille-ai-search-2026"},"tags":["AI Search","Local Search","Specialty Coffee","Marseille","Instagram","Reddit"],"sameAs":["https://hotelrank.ai/research/specialty-coffee-marseille-ai-search-2026"],"alternateFormat":{"html":"https://nicolassitter.com/research/specialty-coffee-marseille-ai-search-2026","json":"https://nicolassitter.com/api/post/specialty-coffee-marseille-ai-search-2026","rss":"https://nicolassitter.com/rss.xml"},"datasets":[{"name":"summary","contentUrl":"https://nicolassitter.com/data/specialty-coffee-marseille-ai-search-2026/summary.csv","encodingFormat":"text/csv"}]}