November 7, 2025
Senthor Team

How to Analyze LLM Traffic in Google Analytics 4

Learn how to identify and track visits from ChatGPT, Gemini, Copilot and other LLMs in Google Analytics 4 using custom regex filters.

With the rise of language models like ChatGPT, Gemini, Copilot and Perplexity, a new traffic source is appearing in your web statistics. These AI assistants now recommend websites to their users, generating visits that you can (and should) track in Google Analytics 4.

But be careful: this traffic has nothing to do with scraping. These are actual human visitors clicking on links provided by LLMs, not bots crawling your pages. Let's see how to identify them effectively in GA4.

What is LLM Traffic in Google Analytics?

When a user asks ChatGPT, Gemini or Copilot a question, the AI can recommend your site as a source of information. If the user clicks on that link, they arrive on your site with a specific referrer (like chatgpt.com or gemini.google.com).

Key points to understand:

  • These are real human visitors, not bots or crawlers
  • They come via a click on an AI recommendation
  • The referrer is usually identifiable in your analytics tools
  • This traffic generates page views, time on site, etc.

The Fundamental Difference with AI Scraping

It's crucial to distinguish two very different phenomena:

1. LLM Traffic (visible in GA4)

  • Humans visit your site after an AI recommendation
  • Visible in Google Analytics because these are real user sessions
  • Generates conversions, engagement, potential ad revenue
  • Similar to traffic from social media or a recommendation site

2. AI Scraping (invisible in GA4)

  • Bots crawl your pages to train AI models
  • Generally does NOT appear in Google Analytics (no JavaScript execution)
  • Generates no conversion, no revenue for you
  • Can overload your servers and steal your content

To detect and control AI scraping, you need a dedicated solution like Senthor, which analyzes server logs, identifies even masked bots, and allows you to block or monetize their access. Google Analytics can't do that.

How to Identify LLM Traffic in GA4

To effectively track visits from LLMs, you need to create a custom segment in Google Analytics 4 based on a regular expression (regex) that captures the main AI referrers.

Step 1: Create a Custom Segment

  1. Log in to your Google Analytics 4 account
  2. Go to Explore
  3. Create a new exploration or open an existing one
  4. In the Variables section, click the + next to "Segments"
  5. Select "Create a custom segment"

Step 2: Configure the Regex Filter

In the segment conditions, configure:

  • Dimension: Session source/medium (or Session source)
  • Match type: "matches regex"
  • Value: use the regex below

The Regex to Detect LLMs

Here's the complete regular expression that captures the main LLMs:

^.*\.openai.*|.*copilot.*|.*chatgpt.*|.*gemini.*|.*gpt.*|.*neeva.*|.*writesonic.*|.*nimble.*|.*perplexity.*|.*google.*bard.*|.*bard.*google.*|.*bard.*|.*edgeservices.*|.*bnngpt.*|.*gemini.*google.*$

What This Regex Captures

  • openai and chatgpt: ChatGPT and OpenAI services
  • copilot: Microsoft Copilot
  • gemini and bard: Google Gemini (formerly Bard)
  • perplexity: Perplexity AI
  • writesonic, neeva, nimble: other AI assistants
  • edgeservices: Edge services with integrated AI
  • bnngpt: GPT variants
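Before pasting the regex into GA4, you can check locally which source values it actually catches. The Python sketch below assumes that GA4's "matches regex" condition requires the whole value to match (which is why every alternative is padded with .*), so it uses re.fullmatch. The sample source/medium values are illustrative, not real data.

```python
import re

# The regex from this article, copied verbatim (split across lines for readability).
LLM_REGEX = (
    r"^.*\.openai.*|.*copilot.*|.*chatgpt.*|.*gemini.*|.*gpt.*|.*neeva.*"
    r"|.*writesonic.*|.*nimble.*|.*perplexity.*|.*google.*bard.*"
    r"|.*bard.*google.*|.*bard.*|.*edgeservices.*|.*bnngpt.*|.*gemini.*google.*$"
)
PATTERN = re.compile(LLM_REGEX)


def is_llm_source(source: str) -> bool:
    """Mimic GA4's "matches regex" condition, assumed here to be a full match."""
    return PATTERN.fullmatch(source) is not None


# Illustrative "Session source / medium" values, not real data.
samples = [
    "chatgpt.com / referral",
    "chat.openai.com / referral",
    "perplexity.ai / referral",
    "gemini.google.com / referral",
    "copilot.microsoft.com / referral",
    "edgeservices.bing.com / referral",
    "google / organic",
    "openai.com / referral",
    "(direct) / (none)",
]

for s in samples:
    label = "LLM" if is_llm_source(s) else "other"
    print(f"{s:40} -> {label}")
```

Running it shows that openai.com / referral is classified as "other": the first alternative requires a dot before "openai", so only subdomains such as chat.openai.com are matched.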

Step 3: Analyze the Data

Once your segment is created, you can:

  • See traffic volume generated by LLMs
  • Compare behavior: bounce rate, session duration, page views
  • Identify landing pages most visited from LLMs
  • Measure conversions generated by this channel
  • Track evolution over time
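If you prefer to work on an exported report (for example a Session source table saved as CSV) rather than in Explore, the same segment can be reproduced offline to compute the share of sessions and conversions coming from LLMs. This is a minimal sketch: the column names and numbers below are hypothetical, and matching again assumes GA4's full-match behavior.

```python
import csv
import io
import re

# Same regex as in this article; fullmatch mimics GA4's "matches regex" (assumed full match).
LLM_PATTERN = re.compile(
    r"^.*\.openai.*|.*copilot.*|.*chatgpt.*|.*gemini.*|.*gpt.*|.*neeva.*"
    r"|.*writesonic.*|.*nimble.*|.*perplexity.*|.*google.*bard.*"
    r"|.*bard.*google.*|.*bard.*|.*edgeservices.*|.*bnngpt.*|.*gemini.*google.*$"
)

# Hypothetical export: column names and figures are made up for illustration.
EXPORT = """Session source,Sessions,Conversions
google,5200,130
chatgpt.com,180,9
perplexity.ai,60,2
(direct),2100,40
gemini.google.com,25,1
"""


def llm_summary(csv_text: str) -> dict:
    """Total sessions, LLM sessions, LLM conversions and the LLM share of sessions."""
    totals = {"llm_sessions": 0, "all_sessions": 0, "llm_conversions": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        sessions = int(row["Sessions"])
        totals["all_sessions"] += sessions
        if LLM_PATTERN.fullmatch(row["Session source"]):
            totals["llm_sessions"] += sessions
            totals["llm_conversions"] += int(row["Conversions"])
    totals["llm_share"] = totals["llm_sessions"] / totals["all_sessions"]
    return totals


summary = llm_summary(EXPORT)
print(summary)
```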

Create a Custom Channel for LLMs

For even easier analysis, you can create a dedicated traffic channel:

  1. In GA4, go to Admin
  2. Under "Data display", click on "Channel groups"
  3. Create a new channel group (the Default channel group can't be edited, but you can copy it and edit the copy)
  4. Add a new channel rule named "LLM / AI Traffic"
  5. Configure the condition: Session source matches regex (use the regex above)
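  6. Place this rule above "Referral" to give it priority: rules are evaluated from top to bottom, and LLM referrers would otherwise be classified as Referral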
  7. Save and publish

Now, when you select this channel group in your acquisition reports, you'll see a distinct channel for LLM traffic!
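Channel rules are evaluated from top to bottom and the first matching rule assigns the channel, so the LLM rule has to sit above whichever rule would otherwise catch these sessions. The sketch below illustrates that ordering effect with three simplified, hypothetical rules; they are not GA4's actual default channel definitions, and the LLM pattern is shortened for brevity.

```python
import re
from typing import Callable

# A rule is a channel name plus a predicate on (source, medium).
Rule = tuple[str, Callable[[str, str], bool]]

# Shortened LLM pattern for this illustration; use the article's full regex in practice.
LLM_PATTERN = re.compile(r".*(chatgpt|openai|copilot|gemini|perplexity).*")


def classify(source: str, medium: str, rules: list[Rule]) -> str:
    """Return the first channel whose rule matches (top-down, first match wins)."""
    for channel, matches in rules:
        if matches(source, medium):
            return channel
    return "Unassigned"


# Simplified, hypothetical channel rules.
direct: Rule = ("Direct", lambda s, m: s == "(direct)" and m in ("(none)", "(not set)"))
referral: Rule = ("Referral", lambda s, m: m == "referral")
llm: Rule = ("LLM / AI Traffic", lambda s, m: LLM_PATTERN.fullmatch(s) is not None)

llm_too_late = [direct, referral, llm]  # Referral catches the session first
llm_first = [direct, llm, referral]     # LLM rule takes priority

print(classify("chatgpt.com", "referral", llm_too_late))
print(classify("chatgpt.com", "referral", llm_first))
```

In this simplified model, a chatgpt.com referral lands in "Referral" with the first ordering and in "LLM / AI Traffic" with the second.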

Limitations of This Approach

While this method is effective for tracking human visits via LLMs, it has several limitations:

1. Not All Clicks Are Tracked

  • Some LLMs may not transmit a referrer
  • Users with tracking blockers won't be counted
  • New AI platforms aren't yet in the regex

2. No Visibility on Scraping

This is the most important point: Google Analytics will NEVER tell you:

  • Which AI bots are scraping your pages
  • What scraping frequency you're experiencing
  • Which pages are being crawled to train models
  • How much bandwidth is consumed by AI crawlers

3. No Control or Monetization

With GA4, you're just observing. You can't:

  • Selectively block certain AI bots
  • Allow access in exchange for payment
  • Protect your content from massive scraping
  • Distinguish legitimate bots from aggressive ones

Why You Need Senthor as a Complement

Google Analytics and Senthor are complementary, not competitors:

| Feature | Google Analytics | Senthor |
|---|---|---|
| Human traffic via LLMs | ✅ Yes | ✅ Yes |
| AI scraping detection | ❌ No | ✅ Yes |
| Masked bot identification | ❌ No | ✅ Yes |
| Selective blocking | ❌ No | ✅ Yes |
| Content monetization | ❌ No | ✅ Yes (coming soon) |
| SEO protection | ❌ No | ✅ Yes |
| Behavioral analysis | ✅ Humans only | ✅ Bots and patterns |

Real Use Cases

Scenario 1: Content Publisher

You publish technical tutorials. With GA4, you see that 5% of your traffic comes from ChatGPT and Perplexity. That's encouraging! But meanwhile, bots are scraping your articles to feed other AIs without citing you.

Solution: GA4 to measure human traffic + Senthor to detect and block unauthorized scraping.

Scenario 2: E-commerce Site

Users arrive via Copilot after asking for product recommendations. GA4 shows you these sessions and associated conversions. But you also notice abnormal server load: bots are crawling your product pages.

Solution: GA4 for marketing attribution + Senthor to protect your product catalog.

Scenario 3: News Media

Gemini recommends your news articles. Great for visibility! But GPTBot and other crawlers copy all your premium content to train their models.

Solution: GA4 to measure audience + Senthor to block AI bots' access to premium content.

How to Set Up Both Solutions

1. Configure GA4 (now)

  • Implement the regex presented in this article
  • Create your "LLM Traffic" channel
  • Set up reports to track evolution
  • Analyze this traffic's behavior compared to other sources

2. Install Senthor (if you're concerned about scraping)

  • Evaluate if your site is being scraped (server logs, abnormal load)
  • Install the WordPress plugin or Vercel integration
  • Configure blocking/authorization rules
  • Monitor scraping attempts in real-time
  • Prepare for future content monetization
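For the first step, evaluating whether your site is being scraped, a rough check can be run on a web server access log. This sketch assumes the common "combined" log format, where the user agent is the last double-quoted field, and matches a short, non-exhaustive list of self-declared AI crawler tokens (GPTBot, ClaudeBot, CCBot, PerplexityBot). The log lines are simplified, hypothetical examples. Bots that disguise their user agent will not be caught this way, which is the gap that a dedicated solution like Senthor aims to fill.

```python
import re
from collections import Counter
from typing import Iterable

# Self-declared AI crawler tokens (non-exhaustive; masked bots will not match).
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "PerplexityBot")

# Combined log format: the user agent is the last double-quoted field on the line.
UA_RE = re.compile(r'"([^"]*)"\s*$')


def count_ai_crawlers(lines: Iterable[str]) -> Counter:
    """Count requests per AI crawler token found in the user-agent field."""
    counts: Counter = Counter()
    for line in lines:
        match = UA_RE.search(line)
        if not match:
            continue  # skip lines that don't end with a quoted user agent
        user_agent = match.group(1)
        for token in AI_CRAWLER_TOKENS:
            if token in user_agent:
                counts[token] += 1
                break
    return counts


# Simplified, hypothetical log lines (documentation IP ranges).
LOG = [
    '203.0.113.5 - - [07/Nov/2025:10:00:00 +0000] "GET /post-1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [07/Nov/2025:10:00:01 +0000] "GET /post-2 HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [07/Nov/2025:10:00:02 +0000] "GET /post-1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    # A human visitor arriving from ChatGPT: real browser UA, so it is not counted.
    '192.0.2.10 - - [07/Nov/2025:10:00:03 +0000] "GET / HTTP/1.1" 200 2048 "https://chatgpt.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

print(count_ai_crawlers(LOG))
```

On a real server you would pass an open file instead, e.g. count_ai_crawlers(open("access.log")), where the path is a placeholder for your own log location.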

Conclusion: Two Tools, Two Missions

Measuring LLM traffic in Google Analytics 4 is simple, free and revealing. This new visitor source can become significant, and the regex we provide lets you track it right now.

But don't forget: what you see in GA4 is humans clicking. What you DON'T see are the bots copying. And that's where solutions like Senthor become indispensable.

The generative AI era is transforming the web. The publishers who will survive are those who can both welcome the human traffic generated by LLMs AND protect their content from massive scraping.

The two aren't opposed. They're complementary.

