SEO & anti-footprint — what every bundle ships with
How Site Generator builds per-tenant uniqueness — text, images, infra — and what SEO plumbing every static export ships with by default.
Affiliate sites get penalised by two things: duplicate content (the same boilerplate paragraph spread across a hundred sites) and shared infrastructure (the same IP, same theme, same image URLs, same OG card). Site Generator is designed around defeating both. Uniqueness isn't a flag you flip — it's how the pipeline works by default. This guide is the full inventory: what we ship in every bundle, how we make it non-fungible, and what's coming next.
The duplicate-content trap
A typical affiliate-site generator renders the same template, the same JSON-fed text, the same stock images, and hosts the output under a shared subdomain or CDN. Search engines collapse those sites into one canonical and rank exactly zero of them. Site Generator's whole pipeline — content, images, infra — is built to make sure no two tenants look the same to a crawler, even when they share a brand.
Text — facts → vars → variants → materialised prose
Every brand site has ~15 long-form text fields per casino × every enabled locale: introduction, banking section, sportsbook overview, FAQ pool, bonus copy, conclusion. None of them are stock paragraphs. The pipeline that produces them:
- Catalog facts: every casino in our catalog carries hundreds of atomic facts, such as licence jurisdiction, founding year, supported payment methods, crypto list, fiat list, languages, currencies, has-sportsbook flag, bonus structure, RTP, score. These are M:N joins, not free text.
- Synthetic variables: facts surface as %Casino*% variables, including %CasinoLicense%, %CasinoPaymentList%, %CasinoCryptoCount%, %HasSportsbook%, and dozens more. Per-locale presets add language-correct list separators and grammar.
- Conditional spintax: templates use {?HasSportsbook?…|…} gates so a casino without a sportsbook never advertises one. Whole sections (the Sportsbook tab, the crypto banking block, the live-dealer FAQ) only render when the underlying facts justify them.
- Plural agreement: the {plural N: forms} engine handles grammatical number across locales (Russian has 3 forms, English 2). Counts from the catalog ({plural %CasinoCryptoCount%: монета|монеты|монет}) land grammatically correct every time.
- Per-tenant deterministic seed: the spintax resolver seeds with siteId:casinoId:lang:field. Two sites on the same brand walk different paths through the probability tree. The same seed always yields the same output, so the text is reproducible for debugging, but changing any of the four inputs changes the seed and with it the path.
- Materialisation at bootstrap: the resolved paragraph lands in site_casino_content (scc) and the public render reads scc directly. Spintax never runs at request time. Crawlers see the same HTML on every visit; no two sites ship the same paragraph; operators can hand-edit any cell and the edit survives regenerate-all.
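As a rough illustration of the per-tenant seed, the sketch below hashes siteId:casinoId:lang:field into a 32-bit seed and uses it to pick one branch from each {a|b|c} group. The FNV-1a hash, the mulberry32 generator, and the names seedFor and resolveSpintax are assumptions chosen for the example; the real resolver may use different primitives and supports a richer grammar (conditional gates, plurals, nesting).

```typescript
// Hypothetical helper: hash the four seed inputs into a 32-bit integer (FNV-1a).
function seedFor(siteId: string, casinoId: string, lang: string, field: string): number {
  const key = `${siteId}:${casinoId}:${lang}:${field}`;
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Small deterministic PRNG (mulberry32): the same seed gives the same sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Resolve flat {a|b|c} groups (no nesting) by picking one option per group.
function resolveSpintax(template: string, seed: number): string {
  const rand = mulberry32(seed);
  return template.replace(/\{([^{}]*)\}/g, (_match: string, body: string) => {
    const options = body.split("|");
    return options[Math.floor(rand() * options.length)] ?? "";
  });
}

// Usage: the same four inputs always resolve to the same paragraph.
const introTemplate = "{Fast|Quick|Speedy} payouts at {our|this} casino";
const introSeed = seedFor("site-a", "casino-1", "en", "introduction");
console.log(resolveSpintax(introTemplate, introSeed));
```

Because the output depends only on the seed, the resolved text can be written to storage once and regenerated identically later. A different siteId produces a different seed and will usually, though not always on very short templates, select a different combination of branches.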
For per-locale prose, the same pipeline runs against per-language preset variables — Russian morphology, Ukrainian payment-method lists, EN crypto names. FAQs and review excerpts are also translated through DeepL where the source is locked English text, not spintax. See Editing content & the edited flag for the operator-side view.
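To make the plural-agreement step concrete, here is a minimal sketch of how a {plural N: forms} token could be expanded once N is a resolved integer. The function names, the two-language scope, and the choice to emit only the word form (not the number) are assumptions for illustration; the hand-written Russian rule follows the standard one/few/many pattern for integers.

```typescript
type Lang = "en" | "ru";

// Index of the plural form for an integer count.
// en forms: [singular, plural]; ru forms: [one, few, many].
function pluralIndex(lang: Lang, n: number): number {
  const abs = Math.abs(n);
  if (lang === "en") return abs === 1 ? 0 : 1;
  const mod10 = abs % 10;
  const mod100 = abs % 100;
  if (mod10 === 1 && mod100 !== 11) return 0;
  if (mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14)) return 1;
  return 2;
}

// Expand "{plural N: a|b|c}" tokens where N is already a resolved integer.
function resolvePlurals(text: string, lang: Lang): string {
  return text.replace(
    /\{plural\s+(-?\d+)\s*:\s*([^{}]*)\}/g,
    (_match: string, count: string, forms: string) => {
      const options = forms.split("|").map((f) => f.trim());
      // Clamp so a template with too few forms falls back to its last form.
      const idx = Math.min(pluralIndex(lang, Number(count)), options.length - 1);
      return options[idx] ?? "";
    },
  );
}

console.log(resolvePlurals("21 {plural 21: монета|монеты|монет}", "ru")); // 21 монета
console.log(resolvePlurals("3 {plural 3: coin|coins}", "en"));            // 3 coins
```

In a runtime with full locale data, the built-in Intl.PluralRules can supply the same categories for many more languages than this two-language sketch covers.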
Images — per-tenant sprite atlases
Images are the second axis of fingerprinting. Affiliate-site networks routinely use the same image URLs across hundreds of sites; that's what makes them a network in the first place. Site Generator generates a fresh sprite atlas per tenant, signed against the tenant's own R2 bucket, served under the tenant's own domain.
- Six sprite families: banners, providers, grid-banners, reviews-avatars, review-flags, language-flags. Each family bundles dozens of small assets into a single atlas served from /img/<family>.webp.
- SVG-first source: casino logos, payment method icons, language flags, crypto badges all come from the catalog as raw SVG. The sprite-runner rasterises them at the resolution your tenant needs and packs them into the atlas. Vectors mean we can re-render at any density without sourcing new files.
- Per-tenant layout shuffle: provider carousels, language pickers, and other lists run through seededShuffle() with a per-site seed. Two sites with the same provider list get different visual orderings, and the sprite atlas itself is laid out in that order, so the pixel coordinates inside the atlas differ between tenants.
- Per-site brand palette: extracted from the casino logo at bootstrap; sprite background fills, hover states, and CSS variables all derive from it. No two tenants share the colour pipeline.
- Cache-bust per atlas: every sprite URL includes ?v=<updated_at> baked from the site_sprites row's mtime. Regenerating a sprite atomically updates the URL, so clients fetch the new atlas immediately, with no stale-cache wedge.
- Containerised sprite-runner: sprite generation runs in a dedicated Cloudflare Workers container (standard-1, ½ vCPU + 4 GiB) with concurrency 5. Bootstrap-time sprite generation for a typical brand site runs ~7 seconds end-to-end.
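The layout shuffle and cache-bust ideas can be sketched as follows. This is an illustrative reconstruction, not the actual seededShuffle() source: the signature, the mulberry32 generator, the fixed 64 px grid in layoutAtlas, and the spriteUrl helper are all assumptions made for the example.

```typescript
// Small deterministic PRNG (mulberry32): the same seed gives the same sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded PRNG; returns a new array.
function seededShuffle<T>(items: readonly T[], seed: number): T[] {
  const out = [...items];
  const rand = mulberry32(seed);
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

interface SpriteCell {
  name: string;
  x: number;
  y: number;
}

// Hypothetical grid packer: cells are placed in shuffled order, so the same
// asset lands at different pixel coordinates for different seeds.
function layoutAtlas(names: readonly string[], seed: number, cell = 64, columns = 8): SpriteCell[] {
  return seededShuffle(names, seed).map((name, i) => ({
    name,
    x: (i % columns) * cell,
    y: Math.floor(i / columns) * cell,
  }));
}

// Hypothetical URL builder: the version changes whenever the atlas row does.
function spriteUrl(family: string, updatedAt: number): string {
  return `/img/${family}.webp?v=${updatedAt}`;
}
```

Shuffling before packing means one seed controls both the visible ordering and the atlas coordinates, which is the property the list above describes.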
Infra — customer-owned everything
Static export means the bundle lands on your hosting, not ours. That's the third leg of the anti-footprint stool, and the only one that doesn't require any platform-side work to deliver:
- Your Cloudflare account — every Pages project belongs to you. We hold a narrow API token you rotate; the runtime is fully on your side.
- Your domains — CNAMEd to your CF Pages project, served from your CF zone. The bundle bytes don't know about generator.ink at all.
- IP / ASN diversity by definition — your sites resolve to CF's anycast pool under your account, not under a shared platform IP. No network-layer fingerprint links your tenants to each other or to other Site Generator clients.
- Per-site OG image — generated per tenant during bootstrap. Social previews never share an image hash between sites.
SEO toolkit — what every bundle ships with
On top of uniqueness, every static export ships a standard SEO baseline. None of this requires configuration — it's emitted by default during bundle materialisation.
- Server-side rendered HTML: full content on first paint. No JS gating, no SPA hydration delay, no noscript fallback needed.
- Canonical URL per page (locale-correct on multi-host tenants).
- hreflang alternates for every active locale plus x-default, sourced from site_locales. Multi-host tenants resolve each locale to its production host; slug-prefix tenants resolve to /<lang>/<path>.
- <html lang> set per locale.
- /sitemap.xml per host: includes every page in the host's locale, with <xhtml:link rel="alternate" hreflang> cross-references for each multi-host alternate.
- /robots.txt per host: preview hosts get Disallow: / so the playground subdomains never compete with your production URLs; production hosts allow everything except /go/, /signin, /signup.
- JSON-LD structured data: every casino review page emits Organization, Review, and AggregateRating; rating templates add ItemList for the ranked-card grid; article pages emit Article. Roadmap: BreadcrumbList and FAQPage on casino pages are queued behind a small refactor of the breadcrumb + FAQ components.
- Open Graph + Twitter cards per page: title, description, OG image (per-tenant), og:locale, canonical og:url.
- Hostname-aware /go/ plumbing: CTAs render as relative paths (/go/<slug>?utm_source=…) and resolve at the edge via CF Zone Rules (multi-host TDS) or via _redirects (static-redirects). Affiliate links never bleed page-rank because they're never in the rendered HTML (see Traffic modes).
- Internal-link localization: localizeInternalHtmlLinks() rewrites every internal href="/path" in editorial prose to the correct per-locale URL. No broken cross-locale links, no stranded /en/ prefixes leaking into the RU bundle.
- Sprite atlases as a single HTTP request: six families × dozens of assets = one fetch per family per page, with long-TTL caching keyed by the cache-bust version. Helps Lighthouse Performance and crawl budget.
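Two of the items above, hreflang alternates and per-host robots.txt, reduce to small pure functions. The sketch below shows one plausible shape; the SiteLocale fields, the isDefault marker used to pick x-default, and both function names are assumptions, since the real site_locales schema is not shown here.

```typescript
// Hypothetical projection of a site_locales row.
interface SiteLocale {
  lang: string;        // e.g. "en", "ru"
  host?: string;       // production host; set only on multi-host tenants
  isDefault?: boolean; // assumed marker for the x-default locale
}

// Build <link rel="alternate"> tags for one page. `path` starts with "/".
function hreflangLinks(locales: readonly SiteLocale[], defaultHost: string, path: string): string[] {
  const urlFor = (l: SiteLocale): string =>
    l.host
      ? `https://${l.host}${path}`                 // multi-host: locale has its own host
      : `https://${defaultHost}/${l.lang}${path}`; // slug-prefix: /<lang>/<path>
  const links = locales.map(
    (l) => `<link rel="alternate" hreflang="${l.lang}" href="${urlFor(l)}">`,
  );
  const fallback = locales.find((l) => l.isDefault) ?? locales[0];
  if (fallback) {
    links.push(`<link rel="alternate" hreflang="x-default" href="${urlFor(fallback)}">`);
  }
  return links;
}

// Preview hosts block everything; production hosts block only the listed paths.
function robotsTxt(isPreviewHost: boolean): string {
  if (isPreviewHost) return "User-agent: *\nDisallow: /\n";
  return ["User-agent: *", "Disallow: /go/", "Disallow: /signin", "Disallow: /signup", ""].join("\n");
}

// Usage: a slug-prefix tenant with two locales.
console.log(hreflangLinks([{ lang: "en", isDefault: true }, { lang: "ru" }], "example.com", "/reviews/foo").join("\n"));
```

Keeping both as pure functions of the locale rows makes the output easy to snapshot-test per host before a bundle is uploaded.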
Pre-flight verify gate
Before any bundle uploads, verify-deploy-readiness.ts runs an audit and refuses to proceed if anything required is missing:
- Every required scc cell (text field) published per (site, casino, lang).
- Every article published per (site, slug, lang) — no half-translated pages.
- FAQ pool size (≥10 published variants per slot) when faqV2 is on.
- Sprite families present and not flagged dirty in site_sprite_dirty.
- Flag-gated fields obey the flag: sportsbook_review only required when has_sportsbook=1.
Exit code 1 blocks the deploy with a punch list of missing inputs. Exit code 2 warns on non-critical gaps (for example, a missing short_title on an optional locale) without blocking. Both serve one principle: never ship a partial bundle into production, where crawlers would index 404s as canonical pages.
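The gate's decision logic can be pictured with a small sketch. This is not the contents of verify-deploy-readiness.ts: the ReadinessInput shape, the audit and exitCodeFor names, and the use of exit code 0 for a clean run are assumptions, and only the scc-cell, sprite, and flag-gating checks are modelled.

```typescript
type Severity = "blocking" | "warning";

interface Finding {
  severity: Severity;
  message: string;
}

// Hypothetical input shape; a real script would load these from the database.
interface ReadinessInput {
  casinos: { id: string; hasSportsbook: boolean }[];
  langs: string[];
  requiredFields: string[];
  publishedCells: Set<string>;   // keys shaped "casinoId:lang:field"
  dirtySpriteFamilies: string[]; // families still flagged in site_sprite_dirty
  optionalGaps: string[];        // non-critical gaps, e.g. a missing short_title
}

function audit(input: ReadinessInput): Finding[] {
  const findings: Finding[] = [];
  for (const casino of input.casinos) {
    for (const lang of input.langs) {
      for (const field of input.requiredFields) {
        // Flag-gated field: only required when the casino has a sportsbook.
        if (field === "sportsbook_review" && !casino.hasSportsbook) continue;
        const key = `${casino.id}:${lang}:${field}`;
        if (!input.publishedCells.has(key)) {
          findings.push({ severity: "blocking", message: `missing scc cell ${key}` });
        }
      }
    }
  }
  for (const family of input.dirtySpriteFamilies) {
    findings.push({ severity: "blocking", message: `sprite family ${family} is dirty` });
  }
  for (const gap of input.optionalGaps) {
    findings.push({ severity: "warning", message: gap });
  }
  return findings;
}

// 1 blocks the deploy, 2 warns without blocking; 0 for a clean run is assumed.
function exitCodeFor(findings: readonly Finding[]): 0 | 1 | 2 {
  if (findings.some((f) => f.severity === "blocking")) return 1;
  return findings.length > 0 ? 2 : 0;
}
```

Collecting every finding before choosing the exit code is what allows a single run to print the full punch list rather than stopping at the first missing input.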
llms.txt — AI-search opt-in
For tenants who flip the llms.txt toggle in the cabinet's SEO panel, every attached host also gets a curated /llms.txt at the root: an emerging Markdown convention for guiding AI agents (ChatGPT, Claude, Perplexity) to the canonical entry points of the site. The file is generated server-side from your scc and article rows, once per host on multi-host tenants. There is no measurable downside, since traditional crawlers and AI agents that don't support the file simply ignore it, and it is a growing signal for the next generation of search. Off by default; opt in when you're ready.
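For orientation, a generator for such a file might look like the sketch below. It follows the general layout of the public llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of Markdown links); the LlmsEntry and LlmsSection types and the renderLlmsTxt name are assumptions, and how scc and article rows map onto sections is not specified here.

```typescript
interface LlmsEntry {
  title: string;
  url: string;
  note?: string; // optional one-line description of the page
}

interface LlmsSection {
  heading: string;
  entries: LlmsEntry[];
}

// Render a Markdown llms.txt body; empty sections are skipped.
function renderLlmsTxt(siteName: string, summary: string, sections: readonly LlmsSection[]): string {
  const lines: string[] = [`# ${siteName}`, "", `> ${summary}`, ""];
  for (const section of sections) {
    if (section.entries.length === 0) continue;
    lines.push(`## ${section.heading}`, "");
    for (const entry of section.entries) {
      const link = `- [${entry.title}](${entry.url})`;
      lines.push(entry.note ? `${link}: ${entry.note}` : link);
    }
    lines.push("");
  }
  return lines.join("\n");
}

// Usage with placeholder data.
console.log(
  renderLlmsTxt("Example Casino Guide", "Independent casino reviews.", [
    { heading: "Reviews", entries: [{ title: "Foo Casino", url: "https://example.com/reviews/foo", note: "full review" }] },
    { heading: "Articles", entries: [] },
  ]),
);
```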
What's not in the bundle (and why)
Honest list of the gaps, with the planned closure when there's a date:
- BreadcrumbList + FAQPage JSON-LD on casino pages — landing site has them; casino templates are queued behind a breadcrumb-component refactor.
- Dual-theme provider sprites — single-theme today; a fill-aware dark-mode atlas is in spec but not built.
- JPEG quality jitter on sprite atlases — only layout shuffle today; pixel-level variation is on the backlog.
- Per-page custom <head> injection: per-site overrides cover the OG image and a few SEO toggles; arbitrary tag injection isn't exposed in the cabinet yet. Ask support if you need it.