SEO – Pietro Mingotti (https://pietromingotti.com): Technical SEO, Advanced CPC and Digital Analytics Case Studies

How LLMs extract and quote snippets
https://pietromingotti.com/how-llms-extract-and-quote-snippets/ – Tue, 19 Aug 2025

TL;DR

  • Browsing models fan out multiple short queries, fetch top results, skim titles and intros, and compose a synthetic answer. Citations are added only when the system is confident about attribution.
  • The most reused fragments: page title, the first ~500–1000 characters, and any definition/answer block directly under a heading. Meta descriptions (your SERP snippet) matter more than you think.
  • Links are probabilistic. Clear structure, named entities, and “answer-first” copy raise your odds; blended sources and marketing fluff lower them.
  • Technical SEO still matters: fast HTML-first rendering, schema, SSR/static output. If retrievers can’t parse you quickly, you’re invisible.

If you’re still treating AI answers like “blue links with extra steps,” you’re going to miss where visibility actually happens. LLMs generate answers; they don’t rank or index anything. So how do LLMs extract content, and when do they quote and link it?

In browsing-enabled modes (ChatGPT w/ Bing, Bing Copilot, SGE, Perplexity, Claude), models don’t read your whole page like a human. They assemble answers from tiny, extractable fragments, and only sometimes attach a link.

Below I’ll show the pipeline, what gets lifted, when links appear, and how to format pages so they’re quote-friendly.


This article is an extract from the full 100-page independent research paper I wrote for Fuel LAB® Research, based on two years of analysis, LLM study, and data collection.

What happens during RAG

I’ve written a dedicated article extracted from the research paper on how LLMs work under the hood; however, what you need to know here is that when a user asks a complex question, the assistant:

  1. Rewrites the prompt into several short sub-queries (≈3–5 words).
  2. Calls search (usually Bing/Google) and gets back titles, URLs, and snippets.
  3. Scrapes partial content from a handful of top results (intros, definition blocks, sometimes FAQs).
  4. Composes an answer; may attach citations if a source fragment is used verbatim or near-verbatim and attribution confidence is high.

This aligns with how ChatGPT’s browsing mode and similar tools are described publicly: search first, skim, then synthesize; links appear depending on product heuristics.
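The four steps above can be sketched in a few lines of Python. Everything here is illustrative: real assistants use the model itself to rewrite queries and a search API to retrieve results; `fan_out` fakes the rewriter with a toy keyword heuristic, and the stop-word list and window size are my assumptions, not any vendor's actual values.

```python
# Toy sketch of the fan-out + skim stages of a browsing-style pipeline.

def fan_out(prompt, max_queries=4):
    """Fake query rewriter: reduce the prompt to short sub-queries.
    Real systems use the LLM itself for this step."""
    words = [w.strip("?.,").lower() for w in prompt.split()]
    stop = {"how", "do", "i", "my", "to", "be", "by", "when", "a", "the",
            "for", "about", "is", "and", "of"}
    keywords = [w for w in words if w not in stop]
    # Group the remaining keywords into ~3-word sub-queries.
    return [" ".join(keywords[i:i + 3])
            for i in range(0, min(len(keywords), 3 * max_queries), 3)]

def skim(page_text, window=1000):
    """Retrievers typically only read the first ~500-1000 characters."""
    return page_text[:window]

prompt = "How do I optimize my WordPress site to be recommended by ChatGPT?"
queries = fan_out(prompt)
print(queries)  # → ['optimize wordpress site', 'recommended chatgpt']
```

The compose-and-cite step is deliberately left out: that part happens inside the model and is exactly the part you cannot script.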

What LLMs extract (and what they don’t)

Models are on a tight budget: limited fetches, timeouts, and small “content windows” per page. That means they’ll often only lift:

  • Title (and H1 if distinct)
  • The first ~500–1000 characters of body copy
  • A tight definition or answer block immediately below a heading
  • FAQ/HowTo fragments (if clearly marked and near the top)

Practical consequences:

  • Front-load the definition or direct answer.
  • Keep early paragraphs short, declarative, and standalone.
  • Treat meta title + meta description like ad copy: these are sometimes the only words the model sees before deciding whether to fetch.

Think in “info windows”: one heading + 1–2 concise paragraphs + a bulleted list. This maps to how multi-vector retrieval compresses and ranks cohesive segments.
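One way to picture those “info windows” is to segment a page into heading-anchored chunks, roughly as a retriever might before embedding. A minimal sketch, assuming markdown-style headings; the splitting heuristic is mine, not any retriever's actual code:

```python
import re

def info_windows(markdown_text):
    """Split a document into heading-anchored windows: each window is
    a heading plus the copy directly beneath it, kept as one segment."""
    parts = re.split(r"(?m)^(#{1,3} .+)$", markdown_text)
    # re.split keeps the captured heading lines at odd indices.
    windows = []
    for i in range(1, len(parts), 2):
        heading = parts[i].lstrip("# ").strip()
        body = parts[i + 1].strip() if i + 1 < len(parts) else ""
        windows.append({"heading": heading, "body": body})
    return windows

doc = """# What is RAG?
RAG stands for retrieval-augmented generation.

## Why it matters
Answers are composed from retrieved fragments."""

print([w["heading"] for w in info_windows(doc)])
```

If a section only makes sense with its heading attached, this kind of chunking is exactly why: the heading travels with the paragraph, or not at all.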

When links appear

Linking is not the default; it’s an emergent behavior, triggered when internal rules agree that the source is relevant, extractable, and safely attributable:

More likely to link when

  • You provided a direct quote/definition the answer depends on
  • The domain is official/high-trust (gov, edu, Wikipedia, major trade sources)
  • The page shows clear authorship, date, and clean structure

Less likely when

  • The model blended multiple sources into one sentence
  • Your layout is messy, interactive, or slow to render
  • The text reads like “general knowledge” rather than a specific, attributable fact block

Observed platform patterns (abridged):

  • ChatGPT (Browsing): sometimes cites 1–3 sources; paraphrases heavily.
  • Bing Copilot: more visible links; favors clean lists/definitions.
  • SGE: mixes sources; often drops links in the primary summary.
  • Perplexity: aggressive inline citations; excellent for long-form attribution.
  • Claude: cites when docs are provided or web context is enabled.

Why structure beats style (every time)

Answer engines reward extractability, not flourish. To raise your quote probability:

  1. Answer first. Put the definition/conclusion in the first 2–3 sentences under each H2.
  2. Keep blocks self-contained. Each section should make sense if lifted in isolation.
  3. Prefer lists and tables. Step-by-steps and comparisons are regularly mirrored in AI output.
  4. Use schema. FAQPage/HowTo/Article/Organization raise machine legibility and attribution confidence.
  5. Brand early. Name, entity, and author metadata near the top helps the model name-drop correctly when it does cite.
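To make point 4 concrete, FAQPage markup can be generated from your Q&A pairs. This is a sketch of the schema.org shape by hand; on a real site the CMS or an SEO plugin would usually emit it for you:

```python
import json

def faq_jsonld(pairs):
    """Build schema.org FAQPage JSON-LD from (question, answer) pairs."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in pairs
        ],
    }, indent=2)

snippet = faq_jsonld([
    ("Do LLMs always link when they quote me?",
     "No. Links are probabilistic, not guaranteed."),
])
print(snippet)  # paste inside a <script type="application/ld+json"> tag
```

The markup should mirror Q&A text that is also visible on the page; schema that describes content the HTML doesn’t contain helps no one.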

The tech behind what LLMs extract and quote

You can’t be quoted if you can’t be fetched:

  • Robots & llms.txt. Allow GPTBot/ClaudeBot/Gemini/Perplexity unless you intend to be excluded from future retrieval/training.
  • HTML-first delivery. Avoid JS-gated copy, heavy modals, and client-side redirects.
  • SSR / static export. Guarantee retrievers get real text on first paint.
  • Speed + simplicity. Timeouts and fragile hydration mean skipped content.

A quick AI extractability and citation potential checklist

  • Every H2 opens with a one-sentence definition or answer
  • First 500–1000 chars read like a standalone snippet
  • FAQ/HowTo blocks exist and are marked up
  • Meta title/description state the answer, not just tease it
  • Tables for comparisons; lists for steps/principles
  • Org/Author/Article schema + clear dates/ownership
  • SSR/static build; no content behind modals/cookie walls
  • Robots.txt/llms.txt allow AI crawlers you want to influence

Conclusions

You don’t “rank” in an answer engine; you get selected in tiny pieces. Build pages as a series of clean, attributable information windows, and you’ll see your words show up where users actually read: inside the answer itself.

How do LLMs decide which snippet to use?

They fan the user prompt out into several short sub-queries, fetch top results, skim titles/intros/FAQs, then synthesize an answer using the most extractable fragments (definition-first, lists, short paragraphs). This is generation, not ranking. For a deeper primer on generation vs. retrieval, see my technical overview. → How LLMs Work – Deep Technical Overview.

Do LLMs always include a link when they quote me?

No. Links are not guaranteed. Even when your text influences the answer, the model may paraphrase without attribution, especially on zero-click surfaces. For context on why “ranking” expectations don’t apply, see “Why You Can’t Rank on ChatGPT”.

What parts of a page get extracted most often?

Page title, meta snippet, and the first ~500–1,000 characters, plus any clearly marked FAQ/definition blocks. Put the answer first. For the macro shift to “answer engines,” see “How LLMs are Disrupting Search Marketing.”

Does traditional SEO still matter for citations?

Yes, because retrieval-enabled LLMs pull from search indexes. If you don’t surface in Bing/Google for the fan-out queries, you’re effectively invisible at retrieval time. → See: “How LLMs are Disrupting Search Marketing.”

What page structures increase AI citation likelihood?

Definition-first paragraphs (“X is…”), bullet lists, short steps, Q&A sections, and clean semantic HTML. This aligns with how transformers attend to local structure and how retrieval pipelines skim. → See: “Understanding Transformer Architecture – A Guide for Marketers.”

Do backlinks make content more quotable on AI?

Indirectly at best. They may help you rank in SERPs (thus be seen by the retriever), but the selection is driven by clarity, extractability, and answer fit, not PageRank. → See: “Why You Can’t Rank on ChatGPT.”

SEO for AI: Optimizing Your Website Content for Generative AI (ChatGPT & Co.)
https://pietromingotti.com/seo-for-ai-optimizing-content-for-chatgpt/ – Sat, 21 Jun 2025

In this research, I’ll address the SEO-for-AI topic by explaining how AI models find and select web content, and how you can optimize your site to become the source that AI references.

Generative AI models and LLMs like ChatGPT are becoming a new layer in content discovery, and have already disrupted parts of the SEO market (publisher traffic, for example): they answer user questions directly, often extinguishing the search intent and producing a zero-click search. That is having a significant impact on organic-search efforts and goals for companies worldwide. I should know; all of our clients at Fuel LAB® have been asking about this… and we’ve been researching it for years.

These AI-driven results (be they from Google Gemini, ChatGPT, Claude, Perplexity…) often cite and link to sources. Many businesses are asking: “How can we get our site cited or recommended by AI models?”

While clear-cut rules are still evolving (we can’t give a science-based framework for something that is, indeed, generative), early evidence suggests that, once again, SEO is not dead: technical optimization and content quality remain key.

How AI Models Find and Cite Web Content

Before optimizing, it helps to know how ChatGPT and similar AI systems fetch information. Modern generative models typically don’t have your website “memorized” unless it was in their training data; instead, for some time now, they have used a real-time search and retrieval process.

For example, ChatGPT’s browsing feature relies on web crawlers and search results fetched from Bing (while Google Gemini, conversely, uses Google Search):

  • Search integration: ChatGPT (with browsing enabled) formulates search queries and retrieves top results via Bing’s search index. In essence, ChatGPT conducts a search behind the scenes – mostly long-tail queries – and then reads the content of the pages it finds. If your site isn’t showing up in those search results, the AI likely won’t see your content. I would argue that if your site isn’t ranking in the top 3 results for those queries, the AI won’t see your content. We know this for certain now thanks to the OnCrawl efforts delivered in this PDF, and also to Jérôme Salomon’s brilliant and generous public posts on LinkedIn in early June ’25.
  • OpenAI’s web crawlers: OpenAI uses three primary crawlers (user agents) to access web content:
    1) ChatGPT-User: a real-time crawler that fetches a page when a user’s prompt triggers it (i.e. ChatGPT “consults” your page to answer a question).
    2) OAI-SearchBot: an indexing bot that asynchronously crawls the web to build an index for ChatGPT’s search functionality.
    3) GPTBot: a crawler that collects content for training AI models (broad content ingestion).

    Insight: The ChatGPT-User bot is the most exciting to see in your analytics – it means an end-user’s prompt caused ChatGPT to visit your page as a source. OAI-SearchBot’s activity indicates your site is being indexed in OpenAI’s “knowledge base” for answering questions, and GPTBot simply means your content may be used in model training (you can actually choose to allow or block this without affecting real-time answers, as discussed later).
  • Citation and answer construction: Once the AI has gathered relevant pages, the part of the process you can try to influence is over; the model takes over and composes an answer. The model selects facts or text from those pages and cites the sources in the response. Early research indicates that content relevance to the query is the top “ranking factor” for which sources get cited.

    In practice, ChatGPT will cite the pages that best answer the question or provide the clearest info, rather than basing it on traditional link-based page rank. This means even a lesser-known site can be cited if it precisely addresses the query, though being indexed and visible in search results is a prerequisite.
  • No JavaScript rendering: A crucial technical note – OpenAI’s bots do not execute JavaScript when crawling. They fetch the raw HTML. Again, good ol’ on-page SEO best practices are still relevant. So any content that relies on client-side scripts (SPA content, lazy-loaded text, etc.) may be invisible to ChatGPT.

    In other words, if it’s not in the static HTML, ChatGPT won’t see it. Ensuring your important text is server-rendered (or at least available in the initial HTML) is essential for AI and SEO bots alike.
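A quick self-test follows from this: compare the unrendered HTML against the phrases you expect to be quoted. The helper and sample page below are mine for illustration; in practice you would fetch your real URL with any HTTP client (without executing scripts) and run the same check:

```python
def visible_without_js(raw_html, phrases):
    """Report which key phrases appear in the static HTML -- i.e.,
    what a non-JS crawler like GPTBot would actually see."""
    return {p: (p in raw_html) for p in phrases}

# A page whose definition is server-rendered, but whose FAQ is JS-injected.
raw_html = """<html><body>
<h1>What is AEO?</h1>
<p>Answer Engine Optimization (AEO) is SEO for AI answers.</p>
<div id="faq"></div><script src="faq-widget.js"></script>
</body></html>"""

report = visible_without_js(raw_html, [
    "Answer Engine Optimization (AEO) is SEO for AI answers.",
    "How do I get cited by ChatGPT?",   # lives only inside faq-widget.js
])
print(report)
```

Any phrase that comes back `False` here is text ChatGPT’s crawlers will never read, no matter how it renders in a browser.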
Bottom line: To be cited, your content must first be found and understood by the AI. That means it should rank in the search results the AI consults, be accessible to crawlers, and be easy for the model to parse.

Technical SEO is fundamental for AI Visibility

From what we’ve observed in Fuel LAB®, technical SEO and site reputation play a foundational role in AI content selection. Many principles of traditional SEO (Search Engine Optimization) carry over into what some are calling “AEO” – Answer Engine Optimization – or LLM SEO.

We started calling this OSE (Organic Search Engineering) a while ago; it was already clear that while Technical and Semantic SEO are still the foundation, many other techniques and tools are required for a successful strategy. 

Here are the must-do technical steps to help AI models find and favor your site:

  • Get indexed (special attention on Bing): ChatGPT’s search capability leans heavily on Bing’s index. Thus, ensuring your pages are indexed on Bing (and ranking well for relevant queries) is step one.

    Use Bing Webmaster Tools to submit your sitemap and monitor indexation. Leverage the IndexNow protocol (supported by Bing) if your CMS offers it, to push new content to search instantly.

    Fact: Without Bing indexation, your content might as well be invisible to ChatGPT.
  • Allow OpenAI’s crawlers: Make sure you’re not blocking OpenAI’s user agents (ChatGPT-User, OAI-SearchBot, GPTBot) in your robots.txt or firewall. If you’re part of a large enterprise, this is especially relevant: many times, “security” professionals will block everything they don’t understand, without your knowledge.

    In fact, including your XML sitemap in robots.txt is recommended, because ChatGPT’s indexing bot will crawl sitemaps if it finds them listed. This can accelerate discovery of all your important pages.

    Note: If for some reason you want to opt out of training but still allow being cited in answers, you can disallow GPTBot while allowing OAI-SearchBot, since ChatGPT can still use your content via the search index even if it’s not in training data. But then it’s kind of pointless to hope to be cited. You gotta kill a cow to make a burger.
  • Ensure crawl accessibility: Treat ChatGPT’s bots like traditional search engine crawlers; they need to fetch content easily. That means fixing broken links (404s), avoiding fragile client-side rendering, and making sure your site doesn’t require special logins or cookies for core content. If certain pages are frequently crawled by ChatGPT-User or OAI-SearchBot (visible in your server logs), but returning errors, fix those issues promptly.

    Tip: Monitor your log files for those user agents to see which pages are getting attention; these are likely candidates for appearing in AI answers.
  • Page speed and formatting: While we don’t have direct evidence that page load speed affects ChatGPT’s choices (as it would for human UX), it’s wise to ensure your pages are fast and lightweight for crawlers. More importantly, ensure the textual content is easily extractable – for example, avoid burying key info in images or complex HTML that might confuse parsers. A clean, semantic HTML structure (with proper headings, paragraphs, lists) helps AI models quickly identify the main points of your content.
  • No heavy client-side antics: As already said, don’t hide content behind JavaScript. If you use modern web frameworks, implement server-side rendering or hydration that outputs meaningful HTML.

    For instance, if you have an FAQ accordion written in React, make sure the FAQ text is present in the HTML (even if initially hidden via CSS) so crawlers can read it. Treat OpenAI bots similar to Googlebot in this regard – except even more restricted, since they never run scripts.
  • Schema Markup is fundamental, but not the way you think: Schema.org markup helps traditional search engines (like Google and Bing) understand the structure and context of your content. Since many AI models, including ChatGPT with browsing, rely on search engine indexes to find content, schema will indirectly help your content be found by improving how it ranks or gets featured in search results.
    • Use FAQ schema (FAQPage) when possible — this helps in both search engine results and makes your content more likely to match question-answer prompts that LLMs handle.
    • Use Article, Product, Service, and Organization schema to define what your pages are about and tie them to known semantic entities.
    • Mark up authorship and dates to reinforce content freshness and attribution.
    • Don’t rely on schema as a replacement for well-structured visible content: LLMs prioritize what’s in plain HTML.
    • Don’t assume that adding schema will directly make ChatGPT cite you; it’s an indirect factor.
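To make the crawler-access points above concrete, a robots.txt along these lines allows OpenAI’s three published user agents and lists the sitemap. The user-agent tokens are the ones OpenAI documents; the sitemap URL is a placeholder for your own:

```txt
# Allow OpenAI's three crawlers (real-time fetch, search index, training).
User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Change this block to "Disallow: /" if you opt out of training
# but still want to appear in ChatGPT's search index.
User-agent: GPTBot
Allow: /

Sitemap: https://example.com/sitemap.xml
```

Listing the sitemap here matters because, as noted above, ChatGPT’s indexing bot will crawl sitemaps it finds referenced in robots.txt.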

Ensuring the above will get your site into the AI’s “consideration set”. Think of it as indexability and crawlability for AI. Now, let’s talk about how to stand out among the considered sources.

Good SEO for AI: AEO (Answer Engine Optimization) is just a new name

Once your site can be seen by AI models, the next challenge is to be selected and cited. That’s where the C-suite and technical marketers start fighting. The thing is, no matter how much they dislike the answer, a Large Language Model doesn’t rank anything. As explained (and proved), it uses search engines, relies on their ranking, and then elaborates the answer.

So here, good old Organic Search Engineering is still key. Content quality, relevance, and structure become the battleground. Generative AI doesn’t “rank” pages by classic SEO metrics like backlinks; it’s trying to find the best answer for the user. So how can you craft content that an AI will judge as the best answer?

Here are strategies:

Cover the topic deeply and semantically: this means your content should thoroughly cover the user’s query, answer the main question and related follow-up questions, define important terms, and provide context.

LLMs are drawn to content that provides comprehensive, in-depth explanations because it gives them more to work with. For example, if the question is “How to optimize a site for ChatGPT citations?”, a shallow 200-word answer on your blog likely won’t be as useful to the AI as a 2000-word guide covering multiple angles (technical steps, content tips, examples, pitfalls).

But will this rank and be found by the model? That depends, as you know. Are you working for a huge brand with a ton of domain authority, or with content that struggles to rank anyway? You know the answer. Often, even the best practices won’t deliver the expected results if several of the hundreds of ranking factors aren’t matched.

Technical explanation:

  • Mechanistic interpretability of relevance: A recent study shows that LLMs use a multi-stage process: first extracting query/document info, then assessing relevance in layers, and finally using attention heads to rank documents for citation or response generation. This supports the idea that detailed and semantically rich content is more likely to be identified and used by LLMs.
  • Structured relevance assessment: Another publication comparing LLM relevance approaches found that models align closely with human judgments (Kendall correlation), indicating that they can accurately evaluate content when it’s structured and covers the query comprehensively.

Provide clear, immediate answers: Large language models tend to favor the first or clearest explanation of a concept on a page. So don’t bury your answers in complex layouts, far from the ATF (above the fold).

We have observed good results using an inverted pyramid approach: answer the core question in the opening paragraph or two, as clearly as possible, then elaborate further in the subsequent sections. This mirrors Google’s featured snippet optimization, but here it’s about giving the AI a quick grasp of your page’s relevance.

If your page has a succinct definition or answer right up front, ChatGPT might choose to latch onto that and cite you as a source of a clear definition.


Structure your content for AI comprehension: Proper structure isn’t just for human readers; it also helps AI models understand what your content is and when to surface it. Realize that (again, just like with SEO) the time spent finding, crawling, and processing your content is all a cost for the technology operating on it; they will always count that as a factor, much like crawl budget. Use a clean semantic HTML structure with descriptive headings (H2s, H3s) that outline the questions or subtopics you address. Use bullet points or numbered steps for procedural or list-based information.
A well-structured page allows the AI to navigate and extract the exact piece of information it needs. For instance, if you have a section titled “Technical SEO Tips for AI” and the user’s question is about AI crawling, the model can jump to that section. In contrast, a wall of unorganized text is harder for the AI to parse and might be overlooked in favor of a clearly organized competitor’s page.

Technical explanation:

  • Retrieval-augmented generation (RAG): RAG pipelines emphasize feeding relevant document passages into the LLM before answer generation; this means clarity in early answers improves the quality of AI output.
  • Citation accuracy challenges: Even with RAG, LLMs sometimes hallucinate or wrongly cite sources. Several studies show up to 50% of claims aren’t fully supported by the cited sources. Clear, upfront content reduces misalignment and promotes citation accuracy.

Technical explanation:

LLMs evaluate relevance in structured layers (query-document representation, instruction processing, attention heads). That means having clearly marked sections (e.g., H2/H3 for sub-& follow-up topics) aligns well with how LLMs “read” and prioritize text.

Demonstrate expertise and authority: While LLMs don’t directly measure E-A-T (Expertise, Authoritativeness, Trustworthiness) like Google might, they do analyze the content’s language and detail. Content that is persuasive and authoritative tends to “win” in AI answers.

This means writing with confidence, citing facts or data (yes, the AI can see if you reference statistics or reputable sources in your text), and providing insightful, original perspectives – not just generic fluff. Original research, unique case studies, or specific expert quotes on your pages can make them stand out to an AI looking for trustworthy information to share.

Technical explanation:

Citation practices in science: Research on how LLMs recommend academic citations reveals they display a bias toward highly cited, authoritative sources—indicating that perceived authority influences what gets referenced.

Enhancing transparency: A position paper advocates integrating citations into LLM output to bolster trust and accountability, suggesting that explicitly authoritative, well-sourced content could be more likely to be used.

Original and human-friendly content: “Built for both human searchers and the models guiding them” is a mantra to follow.

In practice: write for humans, but keep machines in mind. Once again, good old SEO rules apply. An engaging, well-explained article will naturally contain the elements an AI values (clarity, depth). Avoid overt “keyword stuffing” or awkward AI-targeted language; all that stuff is dead. Both ranking algorithms and LLMs use neural processing; in other words, these technologies are built to think as a human would.

Instead, focus on answering likely questions thoroughly. Remember that if a human finds your content valuable, there’s a good chance an AI model will find and use it as well, since human value often correlates with relevance and clarity.

Technical explanation:

Even models using RAG can hallucinate or misinterpret nuanced phrasing, reinforcing that plain-language clarity and human-centric writing reduce errors, improve model citations, and make the output more factual.

I could add a dozen other recommendations, but they would all qualify as SEO optimizations. Although this is a research article and not just a blog post, for the sake of readability, let’s stick with practices that directly impact LLMs. But first, one last very interesting topic.

Long Tail Keywords, or short nGrams?

I’ve been studying for a long time how ChatGPT and similar models form search queries especially when they use web access to find sources.

While some claim that LLMs favor long-tail queries, evidence suggests that (as always in science) the truth is more nuanced: both long-tail and short-phrase (n‑gram) queries play a role, depending on how the model processes the user prompt.

AI Search Queries: Long but Compressed

When a user asks ChatGPT a complex, multi-part question — like:

“How do I optimize my WordPress site to be recommended by ChatGPT when someone asks about privacy-friendly CRMs?”

The model doesn’t pass this full prompt to a search engine. Instead, it analyzes the intent, identifies core topics, and typically generates multiple shorter subqueries behind the scenes. For instance:

  • optimize site for ChatGPT
  • how ChatGPT recommend websites
  • privacy-friendly CRM wordpress

These are often 4–5 words long, which technically qualify as long-tail keywords, but are still condensed compared to the original prompt. This process is supported by emerging data.

So What Should You Optimize For?

Both short and long queries matter, but in different ways:

Query type: Short n-gram queries (2–3 words)
  Role in LLM reasoning: represent atomic subtopics; often used for direct retrieval or indexing.
  How to optimize: ensure that H2s, image alts, meta titles, and URLs include clean, high-volume search terms (e.g. chatgpt seo, llm optimization).

Query type: Mid/long-tail phrases (4–6+ words)
  Role in LLM reasoning: match more specific intents; often reflected in how the model frames questions internally.
  How to optimize: include conversational headers (e.g. “How to get my site cited by AI?”) and phrase-level variations in your paragraph content, FAQs, and intro text.

LLMs break down complex prompts into multiple focused queries, often in the 3–5 word range; technically long-tail, but still distilled. That means your content should:

  • Include short, high-signal phrases for relevance scoring,
  • Provide deep, semantically complete answers for topic coverage,
  • Reflect both intent-specific and broad anchor terms in structure and phrasing.

In other words, don’t pick between long-tail and short-tail. Stop thinking in terms of keywords; it’s like trying to build a car while focusing obsessively on screws and bolts. You need to understand and mirror the model’s dual logic: compress intent, then expand coverage.

Useful Tools and Standards for AI SEO

Because AI-driven search is so new, we’re also (finally) seeing new tools and standards designed to help site owners adapt. I wish these had been available years ago when we started researching.

Here are a few I like and personally use, worth knowing:

  • LLM analytics & tracking: Since traditional SEO tools don’t tell you when you’ve been cited by an AI, specialized solutions are popping up. For example, Peec AI and similar platforms let you track prompts and see which sources are appearing in AI-generated answers.

    Ahrefs has even added an “AI Overview” share-of-voice in their suite to see if your brand is mentioned in Google’s AI answers.

    If you’re serious about AI optimization, consider using these to measure your progress; they can reveal, for instance, that a niche forum or competitor is being cited often for topics where you have content gaps.
  • Log analysis for AI bots: As mentioned, checking server logs is a more technical but effective way to gauge your visibility.

    If you use a log analysis tool (like Oncrawl’s log analyzer), you can filter for user agents like ChatGPT-User or OAI-SearchBot to see how often they hit your pages, which pages, and when.

    A spike in ChatGPT-User hits might correlate with trending questions in your space that your site is helping answer. You can treat those pages as high priority for maintenance and improvement.


What AI Bot Log Analysis Reveals

  1. Bot Visits (“Impressions”)
    Logs help identify which pages AI bots crawl, a form of impression that’s invisible to standard analytics.
  2. Referral Monitoring (“Clicks”)
    You can see if users clicked through from ChatGPT‑cited links; useful since GA4 often fails to track these properly.
  3. Identify Crawl Patterns & Friction
    Analyze crawl frequency, bot hit distribution, error codes (4xx/5xx), and redirect chains. AI bots don’t behave like Googlebot; they may skip JavaScript-heavy or error-filled pages.
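The log-filtering step above can be approximated in a few lines of Python over your combined access log. The log lines here are made-up samples in common-log style; real logs vary by server, but the user-agent tokens are the real ones to grep for:

```python
from collections import Counter

AI_AGENTS = ("ChatGPT-User", "OAI-SearchBot", "GPTBot")

def ai_bot_hits(log_lines):
    """Count hits per (bot, path) for OpenAI user agents in an access log."""
    hits = Counter()
    for line in log_lines:
        for agent in AI_AGENTS:
            if agent in line:
                # Crude path grab: the token after the HTTP method.
                path = line.split('"')[1].split()[1]
                hits[(agent, path)] += 1
                break
    return hits

sample = [
    '1.2.3.4 - - [19/Aug/2025] "GET /seo-for-ai/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 ChatGPT-User/1.0"',
    '1.2.3.5 - - [19/Aug/2025] "GET /seo-for-ai/ HTTP/1.1" 200 5120 "-" "OAI-SearchBot/1.0"',
    '1.2.3.6 - - [19/Aug/2025] "GET /about/ HTTP/1.1" 404 0 "-" "GPTBot/1.1"',
]
print(ai_bot_hits(sample))
```

Pages that accumulate ChatGPT-User hits are the ones being consulted to answer live prompts; a 404 under GPTBot (as in the third sample line) is exactly the kind of friction worth fixing first.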
  • LLMs.txt (proposed standard, not adopted as of June ’25): You might have heard about llms.txt, a proposed text file standard similar to robots.txt, where site owners could list important content for AI to crawl. The idea is to provide a roadmap of your site’s best “AI-friendly” content (like documentation, product info, FAQs) in a simple format.

    However – no major AI services currently use llms.txt. OpenAI, Google, Anthropic, etc. do not yet support it, so adding an llms.txt file today likely has no effect on your visibility. It’s a speculative idea at this stage, much like having a meta tag that no search engine recognizes.

    Google’s John Mueller has dismissed llms.txt as ineffective and unused by AI bots so far. This is actually an indication that llms.txt is going to be useful, since John seems to always speak out to debunk myths that aren’t actually such.

    That said, some companies (e.g. Anthropic) have published an llms.txt on their site as a forward-looking measure, and free generators exist if you want to create one. Our advice: don’t rely on llms.txt for now; focus on proven fundamentals, but keep an eye on this space. If adoption grows, it could become another tool in the AI SEO toolbox.
  • GPT extensions for keyword research: gotta love these; actually, gotta love the developers who make these little bookmarklets and extensions available for everyone to enjoy.
  • Companies presenting themselves as A.I. Analytics: I will update this research when I have actually used and tested all of these, but some interesting proposals that I can’t skip seem to be:

Conclusion and Key Takeaways

Optimizing your website for generative AI models is an emerging discipline more than an emerging science, but the early lessons are already clear: a technically sound site + high-quality, well-structured content = the best chance of being cited by AI.

In practice, that means making your content easily discoverable (indexed on Bing, accessible to OpenAI’s crawlers) and making it genuinely useful (relevant, comprehensive, and clearly presented).

Let’s summarize the key points:

  • Indexing & Accessibility: If you’re not indexed in search (especially Bing, in the case of ChatGPT) or if your site blocks AI crawlers, you won’t even enter the race. Make your content visible and crawler-friendly (no heavy JS, no login walls).
  • Relevance is King: In AI answer selection, relevance and depth trump fame. A lesser-known site that thoroughly answers a niche query can be cited over a top brand with thin content (and here, long-tail keyword optimization is still a winning strategy). Focus on answering questions completely and clearly; the AI will recognize that.
  • Content Structure Matters: Organize your content with headings, lists, and logical flow. A well-structured page is easier for an AI to digest and use. Think about the questions a user (or AI) might have and make those answers stand out in your text.
  • Keep it Human: Write in an authoritative but approachable tone, as you would for a savvy reader. Engaging, original content not only appeals to human readers (who ultimately are your customers), but it also tends to contain the nuance and detail that AI systems find valuable.
  • Monitor and Adapt: Since this field is new, continuously monitor how and when your content is appearing in AI responses. Use log analysis or AI SEO tools to get feedback. If you discover, for example, that ChatGPT is citing a competitor’s article on a topic you haven’t covered, that’s a golden opportunity to create new content and fill the gap.

    Likewise, if you see ChatGPT citing you but paraphrasing incorrectly, you might need to clarify that section in your content.
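On the crawler-accessibility point above, the first concrete check is robots.txt. A sketch (GPTBot, OAI-SearchBot, ClaudeBot and PerplexityBot are real AI user agents at the time of writing; the disallowed path is a hypothetical example of what you might keep out):

```
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /checkout/
```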
Finally, a mindset note: 

We are in the early days of AI-driven search. Best practices will evolve. Treat your optimization efforts as experiments.

What works for getting cited by ChatGPT today might shift as the models and algorithms improve. Stay informed with the latest research and community findings.
How will SGE impact SEO for businesses and agencies https://pietromingotti.com/how-will-sge-impact-seo-for-businesses/ Wed, 15 May 2024

On May 14th, 2024, Google finally dropped on the public the news we industry specialists had seen coming for a couple of years: the arrival of SGE (Search Generative Experience). In a nutshell, this means that the above-the-fold of most searches will, in the future, be served with AI-generated answers rich in context, media, videos, explainers, carousels, maps, charts, and more.

Understandably, this will change the future of search forever, and of SEO too. Or not? Here are my predictions based on what we know so far.

What is Google SGE / AIO / AIM

Google has taken a significant leap forward with its Search capabilities, introducing the new Gemini model, tailored specifically for AI-powered search. The Gemini model merges advanced functionalities like multi-step reasoning and multimodality with Google’s robust search systems.

Now, AI overviews are being integrated into general search results for U.S. users, with the rollout expected to reach over 1 billion users by the end of the year. These overviews can be fine-tuned in terms of language and detail, making them more user-friendly and personalized.

Additionally, the Gemini model excels at managing complex queries. For instance, you can ask for the best restaurants in Italy, including their specialty dishes and proximity to major landmarks, and get a detailed response. It also provides practical planning assistance for everyday needs, such as generating customized meal plans with recipes sourced from the web.

Google is also launching AI-organized results pages, which group useful information under unique, AI-generated headings, offering diverse perspectives and content types. Initially, this will cover dining and recipes, with plans to expand to other categories like movies, books, and shopping. Moreover, Google’s new visual search feature allows users to utilize video content for their queries, streamlining the search process and saving time.

For more details on Google’s innovative search features and the Gemini model, check out Google’s official blog here.

Video courtesy of the Google Blog

What kind of searches are served through SGE?

As Google’s Search Generative Experience (SGE) evolves, its impact will become increasingly apparent across various business verticals, with significant implications for SEO strategies. According to studies and insights gathered from various sources, industries such as healthcare, ecommerce, B2B tech, and education are seeing varied levels of AI integration in search results, which significantly influences organic traffic patterns​ (Search Engine Land)​.

Interestingly, SGE’s effect is not uniform across all sectors. For instance, while healthcare queries show a high percentage of AI-generated answers, the finance sector sees much lower integration, suggesting a cautious approach in areas dealing with sensitive information​ (Search Engine Land)​. This makes me giggle a bit (and worry a lot) when you consider what Lily Ray shared on her LinkedIn:

All in all, from the data we have today, the introduction of Search Generative Experience (SGE) disrupted the ecommerce, electronics, and fashion sectors most notably, although it affected all business verticals to some extent.

Which businesses are going to be affected worse?

So, in my opinion we shouldn’t focus too much on how Google is going to show the results, but on how people search. This has always been the winning strategy. Focus on people, and how to adapt your inbound strategy to get the right people, at the right time, for the right search intent.

Media & Publishers

As an example, I believe that SGE is going to serve very well a 1 click query like:

“who won yesterday’s match between Napoli and Roma?”

“how do you calculate cost per click?”

“How to make a Piña Colada?”

You see, to better serve user experience (and Google is all about that), you want to give an immediate answer to that search, without any click-through to sites full of ads and whatnot. That’s why I think the first and foremost vertical to really take a deep hit is going to be publishing and media (magazines, news websites, and so on).

These kinds of businesses rely entirely on programmatic advertising, and it’s surely a grim time for their existing content.
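To see why a model can satisfy the second example query above without a single click, note that the entire “answer” is a one-line formula:

```python
def cost_per_click(total_spend: float, clicks: int) -> float:
    """CPC = total ad spend / total clicks."""
    if clicks == 0:
        raise ValueError("CPC is undefined with zero clicks")
    return total_spend / clicks

print(cost_per_click(250.0, 500))  # 0.5
```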

Part of E-Commerce, Wellness and Fitness websites

Other queries that SGE will serve well are ones on fitness, wellness, and some online shopping, but not all of it (meaning discovery search, not bottom-of-the-funnel queries):

“What’s the best workout routine for a 6 months period to develop calves? Add some nutrition tips too”

“I want to eliminate meat from my diet, but I don’t know how to respect all of my nutrition macros”

“What are the best brands for natural non toxic skincare?”

“Best Christmas Gifts ideas for 2024”

So yeah, understandably these kinds of queries can only benefit from SGE. Bravo, Google. So discovery for e-commerce, and likely fitness and wellness sites, are going to be impacted significantly.

Online free tools

I don’t know about you, but I love online tools. Calculators, chart generators, image-video-audio converters, you name it. These nifty utilities are there at our disposal to help throughout the day with our job and tasks. In exchange, they render advertising (that we all ignore). Well, this is going to change.

In fact, SGE is perfectly capable of serving all of these results straight in the above-the-fold of the SERP, and it can do so very quickly (and it will only get quicker in the future).

Will Google entirely replace Search Engine Results with SGE content?

No, it won’t for 2 reasons:

  1. As I mentioned earlier, when we look at large numbers of queries, SGE serves mainly one-click queries. This is informational intent that wants to be satisfied in a split second. No, in my opinion it won’t take away your conversions in terms of lead generation or e-commerce revenue. Other things might, though, like the rearranging of rankings in the SERP that we recently saw. It may also cause the number of results pages to decrease significantly.

    Just think, as an example, of this website. If you needed a hand configuring and testing Consent Mode v2 in GTM, you couldn’t get that from SGE. You’d still click the generated result to inspect the article and go through it step by step.
  2. The other reason is that Google bases a lot of its revenue on search ads. Self-explanatory (sorry, I’m trying to keep this short).

Not all queries can be answered in a second; especially when a user is actively looking for a product or a service, they won’t just be satisfied with pretty graphs and charts and some pictures and cards. No, they want to buy the product, or they want to get the service that solves their issue.

That kind of traffic will still exist, and it’s the traffic that performance marketers and SEO’s are, after all, after.

Will this disrupt search? Yes, for sure.

Will it disrupt many online businesses, projects and things that were working great before? A good percentage, yes.

Is it ethical towards small and medium businesses? No, not a bit.

Will technical marketing survive this? It’s actually an age of opportunity, in my opinion, but there’s going to be a lot of turmoil.

What were the clues that this was about to happen

Well, the clues were all there, both from a technology-industry standpoint and from the themes of Google’s core updates.

Obviously AI has been at the core of research in the past years, but it’s been an ongoing field of research since 1956. It was just a matter of time until it reached Google’s B2C products. The same applies to cookies, by the way.

Then, when OpenAI released its ChatGPT model to the public, the race began, and the push to show Gemini was very strong.

Look at Google’s Core System Updates rollout, and think about them in the context of what Generative Search needs in order to work properly (and the environment it creates, with it).

To me, it’s clear there has been an active demotion of websites that exist to generate cheap traffic to render ads, and an uplift of websites that present fresh content only human experts can produce. And that’s what AI needs: human expertise. It needs our fresh ideas, ideas that come from real life, a life AI doesn’t have, both to train the models and to serve humans contextual content based on aspects AI just can’t make out quite yet.

Will blue links disappear?

I honestly don’t think so. Everyone who knows me knows how much I am “against” AI, in terms of what it’ll do to our youth when it comes to the very idea of what “thinking” means. I am not a fan of machines when we are not completely in control, and when those in control are unreliable.

Having said that, I think AI will need to keep feeding on data, to gain fresh insight, and keep its answers top-notch. How can it do so, if it stops to sample from direct experiences of authoritative humans? Also, the other reason why I don’t think blue links are going to disappear, is that many people want to find information and pieces of information which are contextual to their need of the moment. There’s no way AI can address all of this; it’s science, not magic.

This is true at least as we don’t all have a chip in our brain. Then, I think SEO will have bigger problems 😉

Case Study: E-commerce Organic Search Engineering https://pietromingotti.com/case-study-e-commerce-organic-search-engineering/ Thu, 27 Apr 2023

This Case Study focuses on the results obtained with Organic Search Engineering over two years, in terms of Search Market Penetration, Organic Traffic Value, and volume.

Abstract

The client is an emerging entity in the Digital Transaction Management and Technology industry. They covered both b2c (digital signature, certified communication…) and b2b (banking, insurance, and digitalization of secured processes) services. Fuel LAB was assigned the goal of scaling up e-commerce revenue exponentially, both with performance advertising and with the sustainability of organic search.

Two years later, the client is today recognized as a “Large Tech Provider” by the Forrester Report, won the Aragon Research Innovation Award in 2022, is nominated for the same award in 2023, and is recognized for business Excellence by Forbes.

This Case Study focuses on our approach and business impact in scaling up E-commerce Revenue and Value Traffic acquisition through Organic Search.

When we started getting involved in late 2020, the Client was facing several challenges in meeting their goals, such as:

  • Existing strong competitors with full Market Saturation (InfoCert, Poste Italiane, Aruba…)
  • Virtual products with no visual or physical appeal, of which the target audience has little to no knowledge in terms of use cases and functionality
  • Little to no pre-existing Performance Marketing experience or Search Traffic strategy, with over 20 websites to optimize and monetize.

Here’s how, at Fuel LAB, we have driven the scale-up of e-commerce digital sales and value traffic acquisition.

Why Organic Search Engineering (O.S.E.)

Organic Search Engineering is the name of the practice designed by Pietro at Fuel LAB for developing Organic Traffic projects on a large scale. Click here for more information on Organic Search Engineering.

The power of Organic Conversion Engineering and strategic volume growth through Technical SEO projects has met and exceeded the goals set by the Client.

While at Fuel LAB we have entirely designed, organized, and managed the Performance Search Engine Advertising (largely responsible for the whole e-commerce scale-up, by generating significant search demand), focusing on Organic Search is a pivotal point of digital strategy for several reasons, including but not limited to:

  • Strategic SEO and Technical SEO contribute to driving a consistent volume of highly converting traffic to the website, instead of merely focusing on volume, leading to a significant revenue impact.
  • Topical Authority and Top and Middle of the funnel search volume help to boost brand awareness, recognition, retention and coverage.
  • Increased search share saturation increases users’ confidence and trust in the brand, scaling Conversion Rate for other Traffic Acquisition Channels as well.

Our Organic Search acquisition strategy focused on acquiring a constantly increasing volume of highly converting traffic to the website, instead of merely focusing on impression volume.

This approach led to a yearly increasing volume of traffic from Organic Search with a significantly high revenue impact, while boosting the Brand Awareness and coverage thanks to Top of the funnel and Middle of the funnel queries.

Business Impact

Case Study data: February 2021 to February 2023.

KPI | Starting point | Goal | Result
Organic Search Saturation | 384.000/mo | 700.000/mo | 1.410.000/mo
Organic Search Volume | 40.410/mo | 100.000/mo | 133.000/mo
Organic Search Value | €24.314/mo | €50.000/mo | €79.350/mo
Organic Share of Voice | 2% | >5% | 7%
Top 3 positions for non-branded queries | 20 | 50 | 988

Through Organic Search Engineering, the client has become a top competitor on Search Networks, scaling performance as follows:

  • Organic traffic acquisition +241.4%
  • Organic Search Value (e-commerce Revenue) +226.36 %
  • Search market saturation +297.2 %
  • First position keywords +4840.00 %

The client met and exceeded the goals set for Organic Search as a performance channel.

Furthermore, in the last two years it moved from a starting point of 78 Organic Keywords to over 6.000 Organic Keywords. Moreover, the client moved from ranking in the top 3 positions of page 1 for 20 search terms, to ranking in the top 3 positions for 988 search terms.

Worth noticing is the fact that the highest value and volume from Organic Traffic is almost entirely based on generic queries, not branded queries. This is of extreme strategic value.

Organic Traffic Volume growth: +231.4%

Organic Search Saturation and Volume

Metric | Start point | Today
Daily Search Impressions | 7.701 | 59.799
Daily Traffic Volume | 1.285 | 6.486

[Chart: organic traffic growth over 16 months]

Organic Traffic Value growth & Ecommerce Impact: +226.36%

Organic Search Traffic e-commerce Value

Metric | Start point | Today
Organic e-commerce Revenue/mo | €24.314/mo | €79.350/mo

[Chart: organic e-commerce traffic value compared]

First page, first positions ranking Keywords +4840.00%

Ranking of pages for number of indexed keywords / search queries

Ranking bracket | Start point | Today
Ranking 1-3 | 20 | 988
Ranking 4-10 | 20 | 2.472
Ranking 11-20 | 38 | 2.619

[Charts: organic 1st-position comparison, 2021 vs 2023]

Share of Voice and Search Market Share through Technical SEO

Moved to top ranking entity, excluding Poste Italiane & Aruba

Metric | Start point | Today
Share of Voice | 0,3% | 7,3%
Traffic Share | 0,6% | 13%

In the last 2 years, isolating the share of voice and the traffic share for the 9 top competitors, the Client has achieved the largest scale-up in Average Position, growing its Search Share of Voice to 7,30% and its Traffic Share to 13%. This positions the client above all realistic competition, where the top 2 positions are occupied by “Poste” (the Italian governmental entity) and “pec.it”, Aruba’s main domain (a leading, solid player with the highest investment in the market for over 10 years).

Here we can see how the Client has scaled its Share of Voice over all the competitors, excluding the brand giants “Poste Italiane” and “Aruba” (pec.it).

Share of Voice amongst competitors

[Charts: Share of Voice growth summary; Avg Position and Share of Voice growth over competition]

Traffic Share amongst competitors

[Chart: client search traffic share over competitors]

Strategy and Approach

The first step in our workflow was to take a census of the active websites, understanding their role in the Company’s online presence and ecosystem, and segmenting the monetization websites from the informative and lead-generation ones (b2b and b2c).

The second step was the analysis of competitors’ websites and strategies, crawling, and identification of content gaps as well as content overlap. This gave us a clear map of what the competition looked like, and where the opportunities lay to rapidly scale the client’s key metrics.

This allowed us to identify which competitors we could outrank within 2 years (InfoCert, Register, Lepida, Sielte), versus which competitors were positioned beyond our historical brand positioning and would require large investments in mass media and more time to outrank (Poste Italiane and Aruba).

This helped us identify the major pain points and weak links in the chain of the competition’s tactics, such as:

  • Average Technical SEO implementation
  • Poor investment in internal and external linking
  • Weak tagging and technical analytics implementation
  • Slow and inconsistent content updates and Sitemap updates

While the client has been briefed and informed on the need to invest in significant Entity-based Topical Mapping and Content Clustering (leveraging Semantic Search Engine Content Engineering), a solid technical SEO plan, joined with a redesign of semantic page content and better linking logic, has in 2 years significantly scaled the company’s Organic Search results, as the numbers show.

In depth Data Documentation

We focused on a Technical SEO approach to offer maximum Search Engine crawlability, optimize Crawl Budget, and redesign the entire internal and external linking policies.

Once the policies were in place, we selectively optimized all the pages of the e-commerce site, including product pages, category pages, and dedicated Schema Markups and local+xml sitemaps.
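As an illustration of the kind of markup involved, a Product schema in JSON-LD looks like this (product name, description, and price are invented for illustration, not the client’s actual data):

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Certified Email Mailbox (1 year)",
  "description": "Legally valid certified email (PEC) mailbox for business use.",
  "offers": {
    "@type": "Offer",
    "price": "25.00",
    "priceCurrency": "EUR",
    "availability": "https://schema.org/InStock"
  }
}
```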

Product Pages

1.72M impressions, 53K sessions

Metric | Start point | Today
Daily Impressions | 1.768 | 16.700
Daily Sessions | 165 | 786

Thanks to Semantic Optimization, technical on-page SEO (primarily a rich linking structure, link-title best practices, proper heading-tag policies, and Meta Tags) and Schema Markup, product pages have scaled to the following results:

[Chart: product pages impressions and clicks over 16 months]

Category Pages

381K impressions, 24.3K sessions

Metric | Start point | Today
Daily Impressions | 95 | 4.119
Daily Sessions | 7 | 297

The same approach has been run for Category pages, which offer the opportunity to showcase more related products, and direct traffic to better search-intent-matching pages.

[Chart: category pages impressions and clicks over 16 months]

Non Branded Searches

7.96M impressions, 190K sessions

Metric | Start point | Today
Daily Impressions | 421 | 38.685
Daily Sessions | 21 | 1.478

The vertical growth of the client’s share of search and organic traffic acquisition was scaled up especially for non-branded keywords, thus opening the gates to an enormous amount of new opportunities to turn visitors into new customers.

In this client’s specific business vertical, branded keywords rarely carry transactional purchase intent, and are more often oriented towards support needs.

[Chart: non-branded search terms over 16 months]

E-commerce Product Vertical Queries volume growth

Product Family: Digital Identity

13.7M impressions, 1.09M sessions

Metric | Start point | Today
Daily Impressions | 6.440 | 50.914
Daily Sessions | 1.153 | 5.866

The data shows the traffic volume over time for queries containing the word “SPID” and its misspellings, through the regex .(spid|speed).

[Chart: query matches for “spid”]

Product Family: Digital Signature

1.46M impressions, 67.8K sessions

Metric | Start point | Today
Daily Impressions | 446 | 5.465
Daily Sessions | 17 | 240

The data shows the traffic volume over time for queries containing the word “Firma”, through the regex .firma..

[Chart: query matches for “firma digitale”]

Product Family: Certified E Mail

4.81M impressions, 250K sessions

Metric | Start point | Today
Daily Impressions | 144 | 23.409
Daily Sessions | 0 | 1.636

The data shows the traffic volume over time for queries containing the word “PEC”, through the regex .(pec|posta|mail|pecmail)..

[Chart: query matches for “pec”]

Informational Queries growth

374K impressions, 6.35K sessions

Metric | Start point | Today
Daily Impressions | 53 | 2.373
Daily Sessions | 4 | 82

In growing the organic traffic volume, we scaled up the client’s presence and reliability for Top and Middle of the funnel queries; here is a breakdown of informational (top of the funnel) search queries, isolated by who|what|when|how|why.

[Charts: informational queries and informational query schema]

Commercial Queries growth

56.5K impressions, 1.89K sessions

Metric | Start point | Today
Daily Impressions | 22 | 211
Daily Sessions | 0 | 8

This chart shows the growth for commercial queries (Middle of the Funnel), isolated by best|top|vs|review

[Charts: commercial queries and commercial query schema]

Transactional Queries growth

1.86M impressions, 81.7K sessions

Metric | Start point | Today
Daily Impressions | 182 | 8.021
Daily Sessions | 19 | 686

This chart shows the growth for transactional queries (Bottom of the Funnel), isolated by buy|cheap|price|purchase|order

[Charts: transactional queries and transactional query schema]

Long Tail Keywords growth

362K impressions, 18.4K sessions

Metric | Start point | Today
Daily Impressions | 446 | 5.465
Daily Sessions | 17 | 240

This chart filters search queries containing more than 4 words, thanks to the following regex applied to Google Search Console: (\w+\s){4,}\w+

[Chart: 4+ word long-tail keywords]
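The intent-segmentation regexes used throughout this section can be reproduced locally on a Google Search Console query export; a sketch in Python (the function and bucket names are mine, the patterns are the ones quoted in the text):

```python
import re

# Intent buckets, using the patterns quoted in this case study.
BUCKETS = {
    "informational": re.compile(r"\b(who|what|when|how|why)\b"),
    "commercial":    re.compile(r"\b(best|top|vs|review)\b"),
    "transactional": re.compile(r"\b(buy|cheap|price|purchase|order)\b"),
    "long_tail":     re.compile(r"(\w+\s){4,}\w+"),  # more than 4 words
}

def classify(query: str) -> list[str]:
    """Return every intent bucket a query falls into."""
    q = query.lower()
    return [name for name, rx in BUCKETS.items() if rx.search(q)]

print(classify("how to buy a certified email box at the best price"))
# ['informational', 'commercial', 'transactional', 'long_tail']
```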


Total Domain Growth

53.7M impressions, 2.69M sessions

Metric | Start point | Today
Daily Impressions | 63.120 | 269.873
Daily Sessions | 2.822 | 15.987
While the rest of this Case Study focused on www.domain.it and only on the e-commerce website (while comparing against all subdomains for competitors), we have been working on several websites that are part of the client’s web ecosystem. Here’s the Search volume growth for the whole domain.
[Chart: total domain growth over 16 months]

Organic Traffic Value

€ 1.5M

Metric | Start point | Today
Monthly Organic Revenue | € 24.197 | € 68.776

The directly attributed Organic Traffic Value on the e-commerce site has exceeded expectations quickly, ramping up the average monthly organic traffic value from € 24.197 to € 68.776.

[Chart: monthly SEO traffic value compared, 2021 vs 2023]

This way, we brought Organic Traffic to represent 1/4 of the total E-commerce Revenue for the tracked period (2 years), almost matching the value of paid traffic and reaching a directly attributed value of € 1.000.718. When considering indirect conversions and mixed-channel paths, Organic Traffic has surpassed the revenue of Paid Advertising, proving SEO and Organic Conversion Engineering to be a primary and fundamental business asset for e-commerce websites.

Metric | 2 Years result
Paid Traffic Revenue | € 1.422.045
Organic Traffic Revenue | € 1.000.718
Organic Traffic Indirect Revenue | € 1.515.348

Direct Attribution (last click)

[Chart: impact of SEO on e-commerce revenue]

Organic Traffic and Paid Traffic Overlap in Conversion share

[Chart: last-click attribution and cross-channel overlap]

Assisted Conversions Organic Traffic added Value

[Chart: assisted conversions, organic traffic added value]

Conclusions

While Fuel LAB has been responsible for the client’s Digital, Technical, and Strategic Performance marketing (involving also Data Intelligence, Pay-per-Click Advertising, Social Media Advertising, and Conversion Rate Optimization), Organic Search has represented one of the most interesting challenges, proving that even a small player (in terms of digital presence) in an already established market can significantly scale over the competition and match paid traffic e-commerce revenue.

The main difference and value of Organic Search from an investment standpoint is that traffic and results keep growing over time, offering a far larger ROI when compared to Paid Traffic, even in the field of Performance Marketing.

While traditional SEO (Search Engine Optimization) is often insufficient from an organic traffic value standpoint in today’s AI-driven ranking and Search Engine evolution, the holistic approach of O.S.E. proved capable of reaching these results with no backlink or digital PR strategy.

Additional Key Metrics

KPI | Starting Point | Result | % Delta
Monthly Organic Search Value | € 24.197 | € 68.776 | 184.23%
Share of voice | 0,3 % | 7,3 % | 600.00%
Monthly Organic Search Saturation | 1,84 M | 6,04 M | 255,56%
Organic Product Pages Traffic | 3,15k | 12,8k | 306.35%
Monthly Organic Search Volume | 47k | 131k | 178.2%
Organic Non branded search | 14k | 73k | 371,23%

KPI | Starting Point | Result | % Delta
Daily Organic Search Value | € 305,16 | € 5.551 | 1720.00%
Share of voice | 0,3 % | 7,3 % | 600.00%
Daily Organic Search Saturation | 18,7k | 72,5k | 300.00%
Daily Organic Product Pages Traffic | 170 | 686 | 303.53%
Daily Organic Search Volume | 3.5k | 15,9k | 354.29%
Daily Organic Non branded search | 640 | 6.7k | 946.88%
Organic Search Engineering https://pietromingotti.com/organic-search-engineering/ Thu, 27 Apr 2023

Organic Search Engineering (OSE) is a comprehensive and innovative approach to Search Engine Marketing, developed at Fuel LAB under my guidance and design.

What is O.S.E. (Organic Search Engineering)

Organic Search Engineering (OSE) is an innovative approach to Search Engine Marketing I have refined and created at Fuel LAB®. The methodology combines technical SEO tactics and proprietary technologies, such as Conversion Engines, with a content strategy based on Semantic Entities and Topical Clustering, plus O.P.N. (Other People’s Network) Digital PR and traditional Digital PR for backlink generation. This approach enhances four key areas for the client:

Topical Authority

By leveraging Semantic Entities and Topical Clustering, OSE aims to establish the client’s website as an authoritative and reliable source of information on their topic, satisfying Google’s E-E-A-T guidelines and complying with the Helpful Content Update. Projects are developed following the I.A.D. protocol (Isolate, Amplify, Deepen), created at Fuel LAB® and focusing on these fundamental steps:

  • Isolate the website semantic entities
  • Amplify the topical authority with topical clustering
  • Deepen experience proofing with long form content + rich schema markups.
  • Powerful Social Proofing through verified reviews
[Table: Organic Search Engineering topical clustering and mapping, by Pietro Mingotti]
Part of an Organic Search Engineering Project (80 lines of content total) displaying the way we organize content clustering.

Search Share Saturation

OSE aims to increase the client’s visibility in their niche market through a content plan based on Topical Authority and whitehat SERP bombing through Conversion Engines (conversion engines are technical SEO products we have been using on SERP since 2014, which can sustain up to 4.999 indexed pages per website).

It’s important to consider not only money pages and single indexed articles or content, even if their metrics are outstanding; sooner or later, the project will lose its rank. Every experienced SEO knows there’s rarely such a thing as ranking first for a traffic-worthy keyword for more than a couple of quarters.

By amplifying the number of pages ranking for a wider coverage of main keywords and long-tail keywords, outranking competitors is easier, and the higher SERP saturation delivers a strong Brand Awareness effect. All of this contributes holistically to an increased Conversion Rate through Organic Search.

Search Market Penetration

OSE has proven to help clients expand into new markets by outranking well-established competitors with poor Topical Authority. Whether because the market is a niche one without many skilled competitors, or because of sleeping giants in the industry leveraging their brand authority while overlooking the nitty-gritty work on the SERP, OSE pushes the website’s authority beyond that of other well-positioned websites.

We have observed this in several cases, including the one in the Case Study on Organic Search Engineering we published in 2023. The Case Study featured the success of the new kid on the block in the world of Digital Transaction Management, who became an industry leader in 2 years, surpassing stakeholders’ expectations and predictions.

[Chart: traffic share and search market penetration results through OSE (Organic Search Engineering)]
Screenshot of our Organic Search Engineering Case Study, showing our project surpassing all of the competitors in top ranks, and most of the Share of Voice in search.

Increased Social Proofing and Trust

Trustpilot® is used as a CRO technique to strengthen user trust and confidence while improving Domain Rank through its Domain Authority. While Trustpilot®’s quality as a SaaS is undoubted and widely recognized, there are some technical aspects of it (SEO-wise) that not everybody is aware of.

While there are several Case Studies proving how Trustpilot® is able to enhance conversion rate with impressive numbers thanks to its wonderful widgets and other features, it often goes unmentioned that while Google reviews are not verified, Trustpilot®’s are, and Google knows it. In fact, Google officially recognizes Trustpilot’s Domain Rating, and displays rich snippets of the review stars directly in the SERP, both in organic search and in RSAs (Responsive Search Ads).

By introducing direct links and Trustpilot-owned widgets into the project, we have observed enhanced and faster ranking for commercial and transactional queries, as trust isn’t just increased in users (which is fundamental for CRO), but also in the ranking system.
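The review stars themselves travel as standard schema.org AggregateRating markup; one common shape looks like this (all values invented for illustration; Trustpilot’s widgets populate the real data for you):

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Product",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "1873",
    "bestRating": "5"
  }
}
```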

The A.P.A. Logic

Moreover, OSE is designed to help the client get more high-quality traffic to their website, resulting in increased conversions and organic revenue. The approach is a long-term strategy that has been shown to provide sustainable results over time, based on Fuel LAB's A.P.A. Logic:

  • Assessment: Conducting an assessment of the business’s capabilities, product marketability, competitor landscape, and budget.
  • Analysis: Analyzing historical data, competitor strategies, and conducting topical and keyword analysis.
  • Planning: Developing a Traffic Funnel Strategy, Semantic Entities and Topical Mapping, Editorial and Backlink Profile Planning, Budgeting and Resource Scouting, Conversion Engine LTK Project Planning, and Analytics and Rank Tracking Planning.
  • Production: Producing pages design and creation, schema markups design and creation, on-site and on-page technical SEO, digital PR for backlink profile, analytics and rank tracking production, and dashboarding and reporting production.
  • Analysis: Analyzing results, content nurturing, new content production, isolating new topics and niches to cover further, and isolating optimizations and CRO to report.
  • Assessment: Reporting results every 3 months, measuring KPIs every 3 months, and producing a case study within 2 years.

O.S.E. is a comprehensive and effective approach to Holistic SEO that we have refined over time to help businesses establish themselves as authoritative and reliable sources of information, increase their visibility in niche markets, and drive high-quality traffic to their website with a powerful, ROI-driven approach.

To learn more about Organic Search Engineering, please visit Fuel LAB or request a meeting.

Hiding H1 for SEO Homepage https://pietromingotti.com/hiding-h1-for-seo-homepage/ Fri, 11 Nov 2022 12:13:52 +0000

Case Study: improved Organic Traffic, with no ranking deficit or penalties, when hiding the H1 on a brand's homepage.

SEO Case Study: Hiding H1 for SEO Homepage

The client is a large multinational corporation working in software and technology services, operating in the B2C, B2B, and government markets. We are working on SEO for several web properties for this group through Fuel LAB.

Meeting SEO needs and design / branding needs is sometimes challenging, especially when it comes to the brand’s homepage.

This client's homepage had as its H1 a marketing-oriented sentence that said something like "All that you need for your day-to-day life, etc." I expressed the need to have the main keyword in the H1 (old SEO, I know, but… keep reading), and the best result I achieved was editing that sentence to "All that you need for your day-to-day life, thanks to [Brand] innovation". Something like that; don't take the example as an exact match to what we actually did.

The improvement in the homepage's ranking and impressions was visible, yet this new title made me even more uncomfortable. First of all, as an SEO you need to be able to do what you have to do without compromising user experience, while actually improving it. Secondly, I'm stubborn.

seo apple website h1
Check out Apple's website. Right now they do have the word "Apple" on the homepage, because it's Black Friday promotion time; but notice, that sentence is not the H1. It's actually an H2.

I mean, this is my point: let's assume the brand was "Apple". You wouldn't expect to head over to Apple's website and stumble upon a big hero image above the fold that states <h1>Apple</h1>, right?

It would be counterproductive from every standpoint, and surely a known Brand doesn't need to put its name on its homepage. Especially a Brand that is so popular, probably one of the most popular brands around. We would all agree that Apple doesn't need an H1 on their homepage stating "Apple", especially with that whopping Domain Authority and the general ecosystem and search volume Apple has.

Right?

Well, think again.

After noticing how widespread this technique was among the major brands online, we decided to implement it on two websites.

  1. Our Client's website, which we won't disclose for NDA reasons
  2. Fuel LAB's website, the company I founded, which is a much smaller, insignificant website in the SEO ecosystem.

H1 and main keyword

Any SEO in the world knows that, despite John Mueller's misleading words on the topic, Heading Tags (and especially the H1 tag) are a necessary and important part of page structure when it comes to telling the crawler what the content of a page is about. The H1, specifically, should directly state the main topic of the page.
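As a quick sketch of that principle on an internal page (the topic and subtopics here are made-up examples, not a real client page):

```html
<!-- The H1 states the page's main topic directly;
     H2s break it down into subtopics. -->
<h1>Wireless Keyboards for Video Conferencing</h1>
<h2>Best Compact Models</h2>
<h2>Connectivity and Compatibility</h2>
```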

The play is pretty straightforward when it comes to internal pages and subpages, but what about the Homepage? Normally the homepage has no topic other than being the "homepage" of, in these cases, a business.

So, how come not only Apple, but also Logitech, Zappos and many others don't show the brand keyword on the homepage? Where is that H1 that seems nowhere to be found?

The H1 Tag is there, but it’s hidden.

Hiding the H1 Tag

The H1 tag on all of these websites is visually hidden via CSS, in some cases in a very telling way (with a class named SEO_Title and such). This allows the brands to tell the Search Engine that this is indeed the homepage of that brand's website, while not showing it to the users.

This avoids design compromises and huge copywriting headaches for the homepage, which is often a hub of high-relevancy links that easily produce sitelink extensions.

See for yourself these hidden H1 tags in all of these websites.

<h1 class="visuallyhidden">Apple</h1>
.visuallyhidden {
    position: absolute;
    clip: rect(1px, 1px, 1px, 1px);
    -webkit-clip-path: inset(0px 0px 99.9% 99.9%);
    clip-path: inset(0px 0px 99.9% 99.9%);
    overflow: hidden;
    height: 1px;
    width: 1px;
    padding: 0;
    border: 0;
 }
<h1 class="seo-pagetitle">Logitech: Mouse, tastiere, cuffie con microfono wireless e accessori per videoconferenze</h1>
.seo-pagetitle {
    position: absolute;
    clip: rect(1px,1px,1px,1px);
    clip-path: inset(0px 0px 99.9% 99.9%);
    overflow: hidden;
    height: 1px;
    width: 1px;
    padding: 0;
    border: 0;}
<h1 class="rb-z">Zappos Homepage</h1>
.rb-z {
    clip: rect(1px 1px 1px 1px);
    clip: rect(1px,1px,1px,1px);
    height: 1px;
    overflow: hidden;
    position: absolute;
    width: 1px;
}

One might think to achieve this purpose by simply setting this rule:

  • h1 { display: none; }

I suspect this would trigger a cloaking signal. Instead, aside from the telling CSS class names, there is a clear trend: all of these sites share the same way of hiding the H1 tag:

  • position is set to absolute
  • clipping is implemented
  • the size of the element is set to 1px × 1px
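Putting those shared traits together, a generic version of the pattern looks like this. The class name and H1 text are placeholders, not any specific brand's actual stylesheet:

```html
<!-- Generic version of the shared hiding pattern. -->
<style>
  /* Hidden from users, still readable by crawlers:
     absolute positioning + clipping + a 1px × 1px box.
     Crucially, not display:none. */
  .visually-hidden {
    position: absolute;
    clip: rect(1px, 1px, 1px, 1px);
    clip-path: inset(0 0 99.9% 99.9%);
    overflow: hidden;
    height: 1px;
    width: 1px;
    padding: 0;
    border: 0;
  }
</style>
<h1 class="visually-hidden">[Brand] Homepage</h1>
```

Note that the element stays in the DOM and the accessibility tree, unlike display:none, which removes it from both.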

Is hiding the H1 Tag considered cloaking?

My first thought was to consider whether this hidden-H1 technique was in some way considered cloaking. The definition of cloaking is pretty straightforward: cloaking is a black-hat practice in which the SEO serves the Search Engine different content than the content actually shown to users.

It's a serious red flag, which Search Engines will read as a signal of:

  • a potentially dangerous, compromised or hacked website,
  • hosting hazardous or prohibited content,
  • a violation of Google's rules contained in the spam updates.

Therefore cloaking is an absolutely discouraged technique, like all black-hat SEO techniques (does "black hat" even exist as a term anymore?). And yet, how come brands such as Apple use this technique on their homepage?

Is it permitted because of the exceptional Domain Rating, history, and reliability of the brand? Or is Google intelligent enough to understand that if you're doing it only on the homepage, you know exactly what you're doing, and therefore you're not using SEO practices against Google Search Central's terms and conditions?

Is hiding other pages content considered cloaking?

Hiding heading tags to deliberately show users different content than the content we are actively suggesting to the Search Engine is definitely considered cloaking and will negatively affect your rankings.

Here is Google's "Cloak of Visibility" research, conducted with North Carolina State University and published in 2016 by Luca Invernizzi, Kurt Thomas, Alexandros Kapravelos, Oxana Comanescu, Jean-Michel Picod, and Elie Bursztein.

With all these technologies in place, you are guaranteed to receive a penalty as soon as Google realizes what you're trying to achieve.

Results of the SEO Experiment

The main objective of this experiment was to avoid having drops in rankings, CTR and impressions, while implementing the correct heading tag for the homepage.

Not only did the website not receive any sort of penalty, but while the average number of impressions for the site's homepage was stable, CTR and traffic volume scaled significantly and have stayed stable ever since (with a positive delta due to further SEO work in the months after September 2022).

Google Search Console homepage and brand name queries results after hiding h1 for SEO homepage
Google Search Console CTR and Clicks ramped after the deployment of hidden h1 on homepage, for branded searches on the homepage.
Google Search Console homepage and non branded queries results after hiding h1 for SEO homepage
The same phenomenon is observable in the non-branded searches that led traffic to the homepage; CTR and click volume grew exponentially.

So, this means that:

  • Hiding the H1 on your homepage, where the H1 is the sitename, doesn’t harm your SEO.
  • Providing a correct H1 tag for your homepage is capable of improving your organic traffic also for non-branded terms.

Final thoughts:

Despite what Google likes to tell the public, Heading tags and traditional technical SEO are still fundamental for strategic ranking. When John Mueller, as a spokesman for Google, tells you that you don't need heading tags (or the H1, for that matter) and that they are simply useful, what he is really saying is:

“Here at Google we are investing immense amounts of money in AI, and we don't want SEOs trying to manipulate rankings by sending technical signals to our machines; we want our machines to freely index and rank things on their own, so just don't worry about it.”

While on one hand I absolutely support the need to let Machine Learning and AI do their work, because that is what the future of digital and search is going to be all about, that doesn't mean you don't need Tech SEO. Actually, you need it even more.

Providing the correct information and context to the machines that concur to determine ranking signals, and ultimately SEO performance, is even more critical now than it was before, because we have less freedom in affecting how AI interprets our content and its value.

Specifically, H1 tags are very important in determining how the crawler understands the topic of a page, and consequently your Heading Tag strategy and the relevance of the links and content within the paragraphs under your heading tags, especially on your homepage.
