{"@context":"https://schema.org","@type":"BlogPosting","headline":"How Dirty Is Google Maps Hotel Data? — 179K Listing Study","description":"179K Google Maps hotel listings across 11 countries. 16.4% fail QA. 8,167 are OYO vacation rentals. Belgium loses 54% after cleaning.","datePublished":"2026-04-01","dateModified":"2026-04-01","url":"https://nicolassitter.com/research/google-maps-hotel-data-quality-2026","category":"research","keywords":["Google Maps hotels","data quality","Google Places hotels","ChatGPT hotel data"],"articleSection":"Research","wordCount":6400,"readTime":"25 min","articleBody":"# How Dirty Is Google Maps Hotel Data?\n\nWe analyzed 179K Google Maps hotel listings across 11 countries. 16.4% fail basic quality checks. 8,167 are OYO vacation rentals. Belgium loses 54% of its listings after cleaning. And this is the data that powers the hotel recommendations on ChatGPT's in-chat maps.\n\n**179K** hotels analyzed | **16.4%** fail QA | **7.8%** zero reviews\n\n## TL;DR\n\nGoogle Maps lets anyone create a \"Hotel\" listing with barely any verification: if you have a website that looks like a hotel and type in an address, you can be live in minutes. The result: **16.4% of listings fail basic quality checks**, 7.8% have zero reviews, and a single company, OYO, has polluted the dataset with **8,167 vacation rentals disguised as hotels**. In Belgium, 37% of \"hotels\" on Google Maps are Belvilla holiday homes. This matters because Google Maps is the **primary data source for ChatGPT, Gemini, and Perplexity** hotel recommendations. Dirty data in, dirty recommendations out.\n\nBy **Nicolas Sitter** | April 2026 | 178,647 listings across 11 countries\n\n## Summary\n\nWe analyzed **178,647 Google Maps listings** categorized as hotels across 11 countries. After deduplication on Google Place ID, we worked with **148,923 unique listings**. 
Applying progressive quality filters (website presence, review count, address validation, domain checks) left a clean dataset of **124,537 listings (83.6%)**. The remaining 16.4% are noise: vacation rentals, restaurants, zero-review placeholders, and OTA redirect pages.\n\nThe single largest source of pollution is **OYO's European vacation rental brands**: Belvilla (5,047 listings) and Traum-Ferienwohnungen (3,120 listings) together account for 8,167 fake hotel entries, 5.5% of the entire dataset. In Belgium, Belvilla alone is 37.3% of all \"hotel\" listings. Belvilla listings average 0.1 Google reviews; Traum-Ferienwohnungen listings average 0.2.\n\nThis matters for AI. Google Maps is the primary data source powering hotel recommendations in ChatGPT (88.8% of map entities), Gemini, and Perplexity. When a user asks for the best hotels in Belgium, the AI draws from a pool where more than a third of the \"hotels\" are holiday homes, from a cottage on a sheep farm to villas with private pools.\n\n**16.4%** fail QA | **8,167** OYO fake hotels | **7.8%** zero reviews | **54%** Belgium drop rate\n\n## Raw Data Problems\n\nBefore even looking at content quality, the raw data has structural issues. 
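\n\nThe counts below are computed on the deduplicated set. As a rough illustration rather than the study's actual code, deduplicating on Google Place ID and bucketing listings by review count could look like this in Python. The `place_id` and `reviews_count` keys are hypothetical; the real field names depend on the scraper's export format.\n\n
```python
from collections import Counter

# (inclusive upper bound, label) pairs from the Review Count Distribution table.
BUCKETS = [(0, '0 reviews'), (5, '1-5'), (10, '6-10'), (25, '11-25'),
           (50, '26-50'), (100, '51-100'), (500, '101-500'), (1000, '501-1,000')]


def dedupe_by_place_id(rows):
    # Keep the first row seen for each Google Place ID (hypothetical key 'place_id').
    seen, unique = set(), []
    for row in rows:
        pid = row['place_id']
        if pid not in seen:
            seen.add(pid)
            unique.append(row)
    return unique


def review_bucket(count):
    # Label of the first bucket whose upper bound covers the count; above 1,000 is open-ended.
    for upper, label in BUCKETS:
        if count <= upper:
            return label
    return '1,000+'


def bucket_counts(rows):
    # Deduplicate first, then count listings per bucket.
    # A missing review count is treated as zero (an assumption).
    unique = dedupe_by_place_id(rows)
    return Counter(review_bucket(row.get('reviews_count') or 0) for row in unique)


rows = [
    {'place_id': 'a', 'reviews_count': 0},
    {'place_id': 'a', 'reviews_count': 0},  # duplicate Place ID, dropped
    {'place_id': 'b', 'reviews_count': 7},
    {'place_id': 'c', 'reviews_count': 320},
]
print(bucket_counts(rows))
```
\n\nThe bucket bounds mirror the Review Count Distribution table; treating a missing review count as zero is an assumption of this sketch.\n\n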
Here are the problems we found across 148,923 deduplicated listings.\n\n### Zero Reviews\n\n**11,688** listings (7.8% of all listings)\n\n### 10 or Fewer Reviews\n\n**21,272** listings (14.3% of all listings, including the zero-review ones)\n\n### Non-Hotel Website\n\n**11,510** listings (7.7% of all listings) link to OTAs, social media, and similar sites\n\n### Review Count Distribution\n\n| Review Bucket | Listings | Share |\n|---|---:|---:|\n| 0 reviews | 11,688 | 7.8% |\n| 1-5 | 6,034 | 4.1% |\n| 6-10 | 3,550 | 2.4% |\n| 11-25 | 8,662 | 5.8% |\n| 26-50 | 10,557 | 7.1% |\n| 51-100 | 15,631 | 10.5% |\n| 101-500 | 56,739 | 38.1% |\n| 501-1,000 | 19,845 | 13.3% |\n| 1,000+ | 16,217 | 10.9% |\n\nRating Distribution (shares are of the 137,235 listings with at least one review)\n\n| Rating Range | Listings | Share |\n|---|---:|---:|\n| 0.0-2.9 | 1,902 | 1.4% |\n| 3.0-3.4 | 4,641 | 3.4% |\n| 3.5-3.9 | 14,682 | 10.7% |\n| 4.0-4.2 | 24,586 | 17.9% |\n| 4.3-4.5 | 40,203 | 29.3% |\n| 4.6-4.8 | 36,918 | 26.9% |\n| 4.9-5.0 | 14,303 | 10.4% |\n\nThe sweet spot is 101-500 reviews (38.1%).\n\nThe 101-500 range is the largest single bucket and is typical of established hotels. The long tail at 10 reviews or fewer (14.3%) is disproportionately made up of fake listings, vacation rentals, and newly created placeholder profiles.\n\n## Case Study: Belvilla & OYO\n\nOYO Rooms (India) acquired two European vacation rental platforms in 2019: **Belvilla** and **Traum-Ferienwohnungen**. Both list individual holiday homes on Google Maps categorized as \"Hotel.\" They are the single largest source of non-hotel pollution in our dataset.\n\n### OYO Vacation Rental Brands on Google Maps\n\n| Brand | Type | Listings categorized as \"Hotel\" | Avg Google reviews | Domains |\n|---|---|---:|---:|---|\n| Belvilla by OYO | Vacation rental marketplace | 5,047 | 0.1 | belvilla.nl, belvilla.de, belvilla.es, belvilla.com |\n| Traum-Ferienwohnungen | Vacation rental marketplace (Germany) | 3,120 | 0.2 | traum-ferienwohnungen.de |\n\nTogether: **8,167** OYO vacation rentals listed as \"Hotel\", 5.5% of the dataset.\n\n### Belvilla Pollution by Country\n\nThe damage is concentrated in small markets. 
In Belgium, more than a third of all \"hotel\" listings on Google Maps are Belvilla vacation rentals. In the Netherlands, it's nearly one in five.\n\nBelvilla Listings as % of All Google Maps Hotels\n\n| Country | Belvilla Listings | Total Hotels | Belvilla % |\n|---|---:|---:|---:|\n| Belgium | 1,755 | 4,710 | 37.3% |\n| Netherlands | 1,379 | 7,245 | 19.0% |\n| Austria | 516 | 9,431 | 5.5% |\n| Spain | 760 | 23,952 | 3.2% |\n| Germany | 568 | 33,360 | 1.7% |\n| Switzerland | 65 | 5,383 | 1.2% |\n| Italy | 3 | 37,861 | <0.1% |\n\n### What These Listings Look Like\n\nThese are real Google Maps \"hotel\" listing titles from Belgium (English translations in parentheses):\n\n-   Huisje op schapenboerderij met gelateria - Belvilla by Oyo (Cottage on a sheep farm with a gelateria)\n-   Vakantiehuis in Virton met privezwembad - Belvilla by Oyo (Holiday home in Virton with private pool)\n-   Modern vakantiehuis in Senzeille met tuin - Belvilla by Oyo (Modern holiday home in Senzeille with garden)\n-   Heerlijk vakantiehuis in Libramont-Chevigny met tuin - Belvilla by Oyo (Lovely holiday home in Libramont-Chevigny with garden)\n\nThese are individual holiday homes (a sheep-farm cottage, garden houses, a pool villa) categorized as \"Hotel\" with 0-1 Google reviews.\n\nBelvilla Domain Breakdown\n\n| Domain | Listings |\n|---|---:|\n| belvilla.nl | 3,123 |\n| belvilla.de | 1,081 |\n| belvilla.es | 760 |\n| belvilla.com | 70 |\n| belvilla.fr | 8 |\n| belvilla.it | 5 |\n\nA single review threshold eliminates 99.9% of Belvilla.\n\nRequiring just > 10 reviews cuts Belvilla from 5,047 listings to 3. At > 50, it drops to zero. This is the strongest evidence in our data that review count is the most effective single quality filter for Google Maps hotel data.\n\n## Beyond OYO: Other Data Polluters\n\nOYO is the biggest offender, but not the only one: in total, 7.7% of all listings (11,510) have non-hotel website domains. 
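\n\nThe Domain Extraction step in the Methodology reduces each listing's website URL to a root domain and matches it against known non-hotel domains. The Python sketch below illustrates the idea with the standard library only. It keeps the last two hostname labels, which is a simplification: the study used `tldextract`, which consults the Public Suffix List and handles suffixes such as .co.uk correctly. The category sets are copied from this article's tables and are not exhaustive.\n\n
```python
from urllib.parse import urlsplit

# Non-hotel domains by category, copied from this article's tables (not exhaustive).
NON_HOTEL_DOMAINS = {
    'vacation_rental': {'belvilla.nl', 'belvilla.de', 'belvilla.es', 'belvilla.com',
                        'belvilla.fr', 'belvilla.it', 'traum-ferienwohnungen.de'},
    'redirect_or_aggregator': {'tripcombined.com', 'traveleto.com', 'google.com'},
    'ota': {'booking.com', 'expedia.com', 'tripadvisor.com'},
    'social_media': {'facebook.com', 'instagram.com'},
    'free_site_builder': {'wixsite.com', 'wordpress.com'},
}


def root_domain(url):
    # Simplification: keep the last two hostname labels. This is wrong for
    # multi-part suffixes such as .co.uk, which tldextract handles properly.
    if '://' not in url:
        url = 'http://' + url
    host = urlsplit(url).hostname or ''  # hostname is already lowercased
    return '.'.join(host.split('.')[-2:])


def classify_website(url):
    # Return the first matching non-hotel category, or 'hotel_site' if none match.
    domain = root_domain(url)
    for category, domains in NON_HOTEL_DOMAINS.items():
        if domain in domains:
            return category
    return 'hotel_site'


for url in ['https://www.belvilla.nl/vakantiehuis/abc',
            'https://www.booking.com/hotel/be/example.html',
            'https://myhotel.wixsite.com/home',
            'https://www.hotel-example.be/']:
    print(url, '->', classify_website(url))
```
\n\nThe first matching category wins. Reproducing the full 11,510 count would need the complete domain lists, which are not listed here.\n\n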
These non-hotel domains fall into five categories.\n\nNon-Hotel Website Types in Google Maps Hotel Data\n\n| Type | Examples | Count | Share |\n|---|---|---:|---:|\n| Vacation rental platforms | belvilla.nl, traum-ferienwohnungen.de | 8,155 | 5.5% |\n| Redirects & aggregators | tripcombined.com, traveleto.com, google.com | 1,603 | 1.1% |\n| OTA pages | booking.com, expedia.com, tripadvisor.com | 699 | 0.5% |\n| Social media | facebook.com, instagram.com | 697 | 0.5% |\n| Free website builders | wixsite.com, wordpress.com | 356 | 0.2% |\n\nTop Non-Hotel Domains Found in Hotel Listings\n\n| Domain | Listings | Type |\n|---|---:|---|\n| booking.com | 692 | OTA redirect |\n| facebook.com | 528 | Social media |\n| google.com | 289 | Google redirect / placeholder |\n| traveleto.com | 224 | Booking redirect |\n| tripcombined.com | 212 | Meta-search redirect |\n| wixsite.com | 184 | Free website builder |\n| instagram.com | 169 | Social media |\n| wordpress.com | 82 | Free website builder |\n\n692 \"hotels\" have booking.com as their website.\n\nThese are hotels with no direct website: their Google Business Profile links to their Booking.com page. Others link to Facebook (528), Instagram (169), or free website builders like Wix (184). None of these is a sign of a professionally run hotel.\n\n## Cleaning the Data: Filter Effectiveness\n\nHow much noise can progressive filtering remove? We applied filters cumulatively and measured the impact. The review threshold is by far the most effective single filter.\n\nCumulative Filter Pipeline\n\n| Filter Applied | Remaining | Share |\n|---|---:|---:|\n| Deduplicated (no filter) | 148,923 | 100% |\n| Has website | 147,367 | 99.0% |\n| + reviews > 0 | 136,375 | 91.6% |\n| + reviews > 10 | 127,404 | 85.6% |\n| + has street address | 126,712 | 85.1% |\n| + name > 7 chars | 125,447 | 84.2% |\n| + exclude non-hotel domains | 124,537 | 83.6% |\n\n### Review Threshold vs Belvilla Survival\n\nThe review threshold is surgical. It eliminates fake listings while preserving real hotels. 
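\n\nSeveral rows of the threshold table can be reproduced from the Review Count Distribution table alone, because most of the thresholds (0, 5, 10, 25, 50, 100) coincide with bucket upper bounds. A short Python check using the article's bucket counts (the code itself is only an illustration):\n\n
```python
# (lowest review count in bucket, listings) from the Review Count Distribution table.
BUCKETS = [(0, 11_688), (1, 6_034), (6, 3_550), (11, 8_662), (26, 10_557),
           (51, 15_631), (101, 56_739), (501, 19_845), (1001, 16_217)]
TOTAL = sum(n for _, n in BUCKETS)  # 148,923 deduplicated listings


def remaining_above(threshold):
    # Listings with more than `threshold` reviews = buckets whose lowest value exceeds it.
    # Exact only when the threshold is a bucket upper bound (0, 5, 10, 25, 50, 100, ...).
    return sum(n for lower, n in BUCKETS if lower > threshold)


for t in (0, 5, 10, 25, 50, 100):
    kept = remaining_above(t)
    print(f'> {t}: {kept:,} listings ({kept / TOTAL:.1%})')
```
\n\nThis prints 137,235 (92.2%), 131,201 (88.1%), 127,651 (85.7%), 118,989 (79.9%), 108,432 (72.8%) and 92,801 (62.3%), matching the Total Remaining column. The > 1 row and the Belvilla column need per-listing data and cannot be derived from bucket totals.\n\n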
Here is how Belvilla listings survive at each threshold. Each threshold is applied on its own to all 148,923 deduplicated listings, which is why the > 10 row (127,651) differs slightly from the cumulative pipeline above (127,404).\n\nReview Threshold Effectiveness Against Belvilla\n\n| Threshold | Total Remaining | Remaining % | Belvilla Surviving |\n|---|---:|---:|---:|\n| > 0 | 137,235 | 92.2% | 181 |\n| > 1 | 135,307 | 90.9% | 36 |\n| > 5 | 131,201 | 88.1% | 8 |\n| > 10 | 127,651 | 85.7% | 3 |\n| > 25 | 118,989 | 79.9% | 1 |\n| > 50 | 108,432 | 72.8% | 0 |\n| > 100 | 92,801 | 62.3% | 0 |\n\nThe recommended filter: > 10 reviews.\n\nAt > 10 reviews, you keep 85.7% of listings and eliminate 99.9% of Belvilla spam (from 5,047 to 3). Going higher (> 50, > 100) starts cutting legitimate small hotels. In our data, the > 10 threshold offers the best precision-recall tradeoff for hotel data cleaning.\n\n## Country-by-Country Comparison\n\nData quality varies dramatically by country. Belgium and the Netherlands lose over 50% of their listings after cleaning, and Belvilla accounts for much of that loss: roughly 1,755 of the 2,524 listings dropped in Belgium and 1,379 of the 3,629 dropped in the Netherlands. Greece (pre-filtered) and the USA have the lowest drop rates; Italy is the cleanest of the unfiltered European markets.\n\nGoogle Maps Hotel Listings: Raw vs Clean by Country\n\n| Country | Raw | After Cleaning | Drop % | Notes |\n|---|---:|---:|---:|---|\n| France (FR) | 28,890 | 23,310 | 19.3% | 170% of official hotel count |\n| Italy (IT) | 37,861 | 34,194 | 9.7% | Lowest drop rate in Europe apart from pre-filtered Greece |\n| Germany (DE) | 33,360 | 26,944 | 19.2% | 3,120 Traum-Ferienwohnungen listings inflate the count |\n| Spain (ES) | 23,953 | 20,906 | 12.7% | |\n| USA (US) | 15,499 | 14,222 | 8.2% | Motel / extended-stay noise |\n| Greece (GR) | 11,340 | 11,160 | 1.6% | Pre-filtered export; cleanest |\n| Austria (AT) | 9,431 | 7,731 | 18.0% | 516 Belvilla listings |\n| Netherlands (NL) | 7,245 | 3,616 | 50.1% | 1,379 Belvilla = 19% of total |\n| Switzerland (CH) | 5,385 | 4,412 | 18.1% | |\n| Belgium (BE) | 4,713 | 2,189 | 53.6% | 1,755 Belvilla = 37% of total |\n\n### Worst: Belgium (53.6% drop)\n\n1,755 Belvilla listings make up 37.3% of all Belgian \"hotels.\" After cleaning, Belgium goes from 4,713 to 2,189 listings. 
More than half of Belgium's \"hotel\" listings do not survive cleaning.\n\n### Netherlands (50.1% drop)\n\n1,379 Belvilla listings (19% of total). The count drops from 7,245 to 3,616. Belvilla explains a large part of the drop, but not all of it: even without Belvilla, about 31% of the original Dutch listings would still be removed.\n\n### France (19.3% drop)\n\nFrance has 170% of its official hotel count on Google Maps (28,890 vs ~17,000 real hotels). The excess is a mix of Belvilla listings, accor.com domain listings, and other vacation rentals.\n\n### Best: Greece (1.6% drop)\n\nGreece has the cleanest data in our sample: only 1.6% of listings fail QA. This is partly because Greece was exported pre-filtered from our scraping setup, so its figure is not directly comparable with the other countries.\n\nWithout Belvilla, data quality improves dramatically.\n\nSet Belvilla aside and the other filters remove only about 16% of Belgium's original listings (roughly 769 of 4,713), compared with 53.6% including Belvilla. In the Netherlands the figure falls from 50.1% to about 31% (roughly 2,250 of 7,245). The pollution is concentrated rather than evenly spread, which means it's fixable. By validating one brand, Google could remove most of Belgium's problem and a large part of the Netherlands'.\n\n## Chain Hotels vs Independent Hotels\n\nChain hotel listings are inherently cleaner data. Chains actively manage their Google Business Profiles, have dedicated digital teams, and rarely have zero reviews. Independent hotels are where the noise concentrates.\n\n| Segment | Listings | Share of dataset | Avg reviews | Zero reviews |\n|---|---:|---:|---:|---:|\n| Chain hotels | 10,897 | 7.3% | 1,210 | 1.0% |\n| Independent hotels | 138,026 | 92.7% | 395 | 8.4% |\n\nTop Hotel Chains by Listing Count\n\n| Chain | Parent Company | Listings |\n|---|---|---:|\n| Marriott | Marriott International | 1,858 |\n| Hilton | Hilton Worldwide | 1,496 |\n| Wyndham | Wyndham Hotels & Resorts | 1,074 |\n| IHG | InterContinental Hotels Group | 1,006 |\n| Best Western | Best Western International | 824 |\n| Accor | Accor SA | 750 |\n| Choice | Choice Hotels International | 734 |\n| B&B Hotels | Goldman Sachs / B&B Hotels | 466 |\n| Motel 6 | G6 Hospitality (Blackstone) | 286 |\n| Hyatt | Hyatt Hotels Corporation | 285 |\n\nChain hotels average 3x as many reviews and are 8x less likely to have none.\n\n1,210 avg reviews vs 395. 1.0% zero reviews vs 8.4%. 
Chain data is inherently more reliable because chains actively manage their profiles. For AI systems, weighting chain data more heavily is a reasonable quality heuristic, but it disadvantages legitimate independent hotels that simply don't manage their Google profiles.\n\n### Fun fact: if Belvilla were a hotel chain...\n\nWith 5,047 listings on Google Maps, Belvilla would be **the largest \"hotel chain\" in our dataset** by far. Marriott, the largest real chain, has 1,858; Accor has 750. OYO's combined 8,167 vacation rental listings dwarf every actual hotel group: two vacation rental brands that few in hospitality take seriously have more Google Maps \"hotel\" listings than Marriott, Hilton, IHG, and Accor combined (5,110).\n\n## Why This Matters for AI\n\nGoogle Maps is not just a consumer tool; it is the **foundational data layer** for AI hotel recommendations. ChatGPT uses Google Places for 88.8% of its hotel entity cards. Gemini uses Google Maps directly. Perplexity queries Google. When the source data is dirty, everything downstream inherits the noise.\n\n### AI recommends properties that aren't hotels\n\nA zero-review Belvilla \"hotel\" in rural Belgium can appear in ChatGPT results. Based on Google Maps data alone, the AI has no way to distinguish it from a real hotel.\n\n### AI inflates hotel counts\n\nGoogle Maps lists 4,710 \"hotels\" in Belgium, and 1,755 of them are Belvilla holiday homes. An AI asked \"how many hotels are there in Belgium?\" that relies on this data will overstate the number substantially.\n\n### AI uses reviews from fake listings\n\nWhen computing average ratings or running competitive analysis, fake listings with 0-1 reviews skew the statistics that AI uses to rank hotels.\n\n### Anyone can influence AI output\n\nCreating a Google Business Profile as a \"Hotel\" is free and barely verified. This is an open vector for manipulating AI hotel recommendations.\n\nThe fix is simple, yet no one appears to be applying it.\n\nA > 10 review filter eliminates 99.9% of Belvilla listings while keeping 85.7% of all listings. 
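\n\nFor a downstream consumer, the cumulative filters from the Cleaning the Data section can collapse into a single predicate applied before ranking. A minimal Python sketch, assuming a hypothetical `Listing` record whose fields would have to be mapped from the scraper or API response; the excluded-domain set is an illustrative subset of the domains named in this article:\n\n
```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Illustrative subset of the non-hotel domains named in this article.
EXCLUDED_DOMAINS = {
    'belvilla.nl', 'belvilla.de', 'belvilla.es', 'belvilla.com',
    'traum-ferienwohnungen.de', 'booking.com', 'facebook.com',
    'instagram.com', 'wixsite.com', 'wordpress.com', 'google.com',
}


@dataclass
class Listing:
    # Hypothetical record; map these fields from the scraper or API output.
    name: str
    website: Optional[str]
    reviews: int
    street: Optional[str]


def has_excluded_domain(url: str) -> bool:
    # Suffix match on the hostname, so www.booking.com and user.wixsite.com both match,
    # while a lookalike such as notbooking.com does not.
    if '://' not in url:
        url = 'http://' + url
    host = urlsplit(url).hostname or ''
    return any(host == d or host.endswith('.' + d) for d in EXCLUDED_DOMAINS)


def passes_qa(listing: Listing, min_reviews: int = 10) -> bool:
    # Same order as the cumulative pipeline: website, reviews, address, name, domain.
    return (bool(listing.website)
            and listing.reviews > min_reviews
            and bool(listing.street)
            and len(listing.name) > 7
            and not has_excluded_domain(listing.website))


candidates = [
    Listing('Hotel Example Brussels', 'https://www.hotel-example.be', 250, 'Rue Example 1'),
    Listing('Modern vakantiehuis in Senzeille met tuin - Belvilla by Oyo',
            'https://www.belvilla.nl/example', 0, 'Senzeille'),
]
print([c.name for c in candidates if passes_qa(c)])
```
\n\nThe domain check uses a hostname suffix match instead of root-domain extraction, so it needs no suffix list but only catches domains that are listed explicitly. The > 10 review test does most of the work, as the pipeline table shows.\n\n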
OpenAI, Google, and other AI providers could dramatically improve hotel recommendation quality with a single threshold. That they apparently don't suggests they prioritize coverage over accuracy, or simply haven't audited the data.\n\n## Methodology\n\n### Data Collection\n\n178,647 Google Maps listings categorized as hotels, scraped with the Apify Google Maps scraper across 10 European countries + USA. After deduplication on Google Place ID: 148,923 unique listings.\n\n### Domain Extraction\n\nWebsite URLs were parsed with `tldextract` to identify root domains accurately. This makes it possible to match vacation rental platforms (belvilla.nl, belvilla.de, etc.) and to detect non-hotel websites (booking.com, facebook.com).\n\n### Chain Detection\n\nChain hotels were identified via domain matching (marriott.com, hilton.com, etc.) and name pattern matching. 10,897 chain hotels detected (7.3% of dataset). Unmatched listings were classified as independent.\n\n### Quality Filters\n\nProgressive filters applied: website presence, review count > 10, street address present, name > 7 characters, exclusion of known non-hotel domains. Final clean dataset: 124,537 (83.6%).\n\n### Limitations\n\nScraping coverage varies by country (Greece was pre-filtered). Official hotel counts are estimates. Chain detection misses some brands. The \"clean\" dataset still contains some noise: our filters favor recall (keeping real hotels) over precision (excluding every fake).\n\n### Related Research\n\nSee our [ChatGPT Map Providers Study](/research/tripadvisor-chatgpt-hotels-study-2026) for how this data flows into AI recommendations, and our [Anatomy of ChatGPT Hotel Search](/research/anatomy-chatgpt-hotel-search-2026) for the full technical architecture.\n\n## Want the Full Picture?\n\nDirty data is just one piece of the AI hotel visibility puzzle. 
Read our flagship study covering how AI is reshaping hotel discovery.\n\n[Read AI Hotel Landscape 2026](/research/ai-hotel-landscape-2026)","author":{"@type":"Person","name":"Nicolas Sitter","url":"https://nicolassitter.com/about","sameAs":["https://www.linkedin.com/in/nicolassitternolleau/","https://github.com/Nicositter88","https://hotelrank.ai"]},"publisher":{"@type":"Person","name":"Nicolas Sitter","url":"https://nicolassitter.com"},"image":"https://nicolassitter.com/api/og/google-maps-hotel-data-quality-2026","mainEntityOfPage":{"@type":"WebPage","@id":"https://nicolassitter.com/research/google-maps-hotel-data-quality-2026"},"tags":["Google Maps","Data Quality","Hotel Listings","ChatGPT"],"sameAs":["https://hotelrank.ai/research/google-maps-hotel-data-quality-2026"],"alternateFormat":{"html":"https://nicolassitter.com/research/google-maps-hotel-data-quality-2026","json":"https://nicolassitter.com/api/post/google-maps-hotel-data-quality-2026","rss":"https://nicolassitter.com/rss.xml"},"datasets":[{"name":"summary","contentUrl":"https://nicolassitter.com/data/google-maps-hotel-data-quality-2026/summary.csv","encodingFormat":"text/csv"}]}