Percy vs Chromatic for Maps

Choosing between Percy and Chromatic for a map application is a workflow-alignment decision, not a feature checklist. Both are managed visual-regression platforms that capture rendered frames in a headless browser and diff them in the cloud, and both will flood you with false positives the moment a WebGL canvas, an animated flyTo, or a network-fetched tile slips into the snapshot before it has settled. The real question is which tool’s capture model and review surface fits how your map UI is built — isolated Storybook components versus full-page integration views — and how much determinism work you are willing to own in CI to make either one trustworthy. This page scopes that decision specifically to cartographic interfaces and the failure modes they introduce.

This page extends the concepts in Web Map Visual Testing Fundamentals & Toolchains, narrowing the foundational ideas of determinism, synchronization, and threshold tuning down to the concrete trade-offs between these two commercial vendors.

What Percy and Chromatic Actually Do With a Map Frame

Both tools diff rendered frames in their cloud, but neither simply photographs the page in your CI browser and uploads the pixels. They diverge in where and what they capture:

Percy runs the @percy/cli agent inside your CI environment while your Playwright, Cypress, or Puppeteer test (or another Percy SDK) drives the page. At each snapshot call the SDK serializes the DOM, converting any canvas to an image of its current contents, and Percy’s cloud re-renders that snapshot at each responsive width for diffing.
Chromatic is coupled to Storybook. It builds your Storybook, publishes it to Chromatic’s cloud, and captures one screenshot per story in its own browsers for diffing and design-system review.

For a map, that distinction decides what you are even testing. A Percy snapshot of an integration route captures the whole composed view — basemap, overlays, legend, attribution, and control chrome together. A Chromatic snapshot captures a single map widget mounted in isolation with mocked props. Neither model removes the hard part: the synchronization burden falls entirely on your capture setup. You must prove the map viewport has stabilized — tiles fetched, WebGL compositing flushed, label collision resolved — before either tool’s capture step fires. This is the same contract described under screenshot capture, sync and comparison logic; the Khronos WebGL specification explains why GPU execution is asynchronous relative to JavaScript, which is precisely why an explicit idle handshake is mandatory rather than optional.

Deterministic Capture: The Work Neither Vendor Does For You

Geospatial UIs are non-deterministic by default. Map libraries continuously fetch tiles, animate camera transitions, and restyle features as zoom and bounds change. Before Percy or Chromatic can produce a stable baseline, three controls have to be in place regardless of vendor:

Animation freezing. Disable easing, fly-to transitions, and continuous render loops. Force synchronous camera moves with map.jumpTo() (MapLibre/Mapbox) or map.setView(coords, zoom, { animate: false }) (Leaflet) instead of their animated equivalents, so the frame is photographed at rest rather than mid-ease.
Network mocking. Intercept tile requests at the fetch/XHR level and serve pre-baked tile fixtures from a local mock so CDN cache variance and network jitter cannot alter the render. This is the deterministic-capture discipline detailed in handling async tile loading.
Viewport and DPR locking. Standardize container dimensions, device pixel ratio, and the geographic bounding box. Map libraries scale tile density by DPR, so an unpinned DPR produces different raster output on every runner.

The most common mistake is treating these as Percy-specific or Chromatic-specific knobs. They are not — they belong to your test harness, and they must run before the snapshot call no matter which vendor consumes it. Teams frequently prototype this stability layer against an open pipeline first; the patterns in Open-Source Visual Testing Stacks pre-validate snapshot determinism before any per-snapshot cloud cost is incurred.

Step-by-Step: Wiring a Deterministic Snapshot Into Each Tool

The idle handshake is identical across vendors; only the final capture call differs. The procedure below produces a settled map frame and hands it to whichever platform you have chosen.

Lock the rendering environment. Pin viewport size and deviceScaleFactor at browser-context creation so neither layout reflow nor subpixel interpolation varies between runs.

const context = await browser.newContext({
  viewport: { width: 1280, height: 800 },
  deviceScaleFactor: 1,
  colorScheme: "light",
  timezoneId: "UTC",
});
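
Pinning these values only helps if the runner actually honours them. As a minimal sketch, the harness could compare what the page reports against what was pinned and refuse to capture on any mismatch. The helper name `findEnvDrift` and the shape of `PINNED_ENV` are illustrative assumptions, not part of Playwright or either vendor SDK.

```javascript
// Hypothetical helper: compares the environment the page reports with the
// values pinned at context creation and lists every mismatch.
function findEnvDrift(actual, expected) {
  const drift = [];
  for (const key of Object.keys(expected)) {
    if (actual[key] !== expected[key]) {
      drift.push(`${key}: expected ${expected[key]}, got ${actual[key]}`);
    }
  }
  return drift;
}

// Values mirrored from the newContext() call above.
const PINNED_ENV = { width: 1280, height: 800, dpr: 1 };

// Usage inside a Playwright test (assumes `page` comes from that context):
//   const actual = await page.evaluate(() => ({
//     width: window.innerWidth,
//     height: window.innerHeight,
//     dpr: window.devicePixelRatio,
//   }));
//   const drift = findEnvDrift(actual, PINNED_ENV);
//   if (drift.length > 0) throw new Error(`Render env drifted: ${drift.join("; ")}`);
```

Failing before the snapshot call keeps a misconfigured runner from producing a diff that looks like a cartographic regression.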

Freeze map animation and mock tiles. Disable easing and route tile traffic to fixtures before the map initializes.

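// Serve every tile from a local fixture. tileKey() is a project helper that
// maps a tile URL to a fixture file name.
await page.route("**/tiles/**", (route) =>
  route.fulfill({ path: `fixtures/${tileKey(route.request().url())}.pbf` })
);
// Assumes the app exposes its map instance as window.map.
await page.evaluate(() => {
  map.jumpTo({ center: [-0.1276, 51.5074], zoom: 12 });
});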
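
The routing snippet above relies on a `tileKey` helper that this page does not define. One possible shape, assuming XYZ-style tile URLs ending in `/{z}/{x}/{y}` plus an optional extension and fixtures named `z-x-y`, might be the sketch below. Both the URL layout and the naming scheme are assumptions to adapt to your tile server.

```javascript
// Hypothetical helper: turns a tile URL into a fixture file stem.
// Assumes an XYZ-style path ending in /{z}/{x}/{y} with an optional extension.
function tileKey(url) {
  const { pathname } = new URL(url);
  const match = pathname.match(/\/(\d+)\/(\d+)\/(\d+)(?:\.\w+)?$/);
  if (!match) {
    // Fail closed: an unrecognized tile URL should surface as a test error
    // rather than silently falling through to a missing fixture.
    throw new Error(`Unrecognized tile URL: ${url}`);
  }
  const [, z, x, y] = match;
  return `${z}-${x}-${y}`;
}
```

Query strings such as access tokens are ignored because only the URL path is inspected, so the same fixture serves every token.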

Await the idle-then-requestAnimationFrame handshake. Wait for the engine’s idle event and one paint frame so capture never races a half-loaded pyramid.
```
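await page.evaluate(() => new Promise((resolve) => {
  // Assumes the app exposes its map instance as window.map.
  const settle = () => requestAnimationFrame(resolve);
  // idle may already have fired after jumpTo(); do not wait for an event that never comes.
  if (map.loaded()) settle();
  else map.once("idle", settle);
}));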
```
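
The handshake above has no upper bound on how long it waits. A minimal sketch of a fail-closed variant, using the 10-second budget this page recommends, could look like the following. The name `waitForSettledMap` and its options object are illustrative; the only map methods it relies on are `once()` and, when present, `loaded()`.

```javascript
// Resolves after the map is idle and one further animation frame has run;
// rejects if that does not happen within timeoutMs (fail closed).
function waitForSettledMap(map, options = {}) {
  const timeoutMs = options.timeoutMs ?? 10_000;
  // raf is injectable so the helper can be exercised outside a browser.
  const raf = options.raf ?? ((cb) => requestAnimationFrame(cb));
  return new Promise((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Map not idle after ${timeoutMs} ms`)),
      timeoutMs
    );
    const settle = () =>
      raf(() => {
        clearTimeout(timer);
        resolve();
      });
    // If the map already reports itself loaded, skip waiting for the event.
    if (typeof map.loaded === "function" && map.loaded()) settle();
    else map.once("idle", settle);
  });
}
```

Because it runs in the browser, the function has to be imported by a story or injected into the page (for example with Playwright's `page.addInitScript`) before the capture step calls it.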

Capture for Percy (full-page integration). Call the SDK after the map is settled; Percy serializes the page at that instant, with the map canvas frozen as an image, and renders it at each requested width.

import percySnapshot from "@percy/playwright";
await percySnapshot(page, "Map — London z12", {
  widths: [1280],
  percyCSS: ".mapboxgl-ctrl-attrib, .leaflet-control-attribution { visibility: hidden; }",
});

Capture for Chromatic (isolated widget). Express the same settled state as a Storybook story with a play function, then let Chromatic snapshot it.

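export const LondonZ12 = {
  play: async ({ canvasElement }) => {
    // waitForMapIdle and freezeAnimations are project helpers, not Storybook APIs.
    const map = await waitForMapIdle(canvasElement);
    await freezeAnimations(map);
  },
  // 0.063 is Chromatic's default diffThreshold; delay is in milliseconds.
  parameters: { chromatic: { diffThreshold: 0.063, delay: 300 } },
};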
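
The story above calls a `freezeAnimations` helper that is left to the project. As one hedged possibility, it could stop any camera transition in flight and switch off CSS transitions on control chrome. The sketch assumes the map object exposes `stop()`, which MapLibre, Mapbox GL, and Leaflet do; the injected stylesheet is an illustrative choice, not something either vendor requires.

```javascript
// Hypothetical helper matching the freezeAnimations(map) call in the story.
// `doc` defaults to the page document and is injectable for testing.
function freezeAnimations(map, doc = globalThis.document) {
  // Halt any camera transition in flight.
  if (typeof map.stop === "function") map.stop();

  // Disable CSS transitions and keyframe animations on control chrome.
  const style = doc.createElement("style");
  style.textContent =
    "*, *::before, *::after { animation: none !important; transition: none !important; }";
  doc.head.appendChild(style);
  return style; // returned so a caller can remove it again
}
```

Stopping the camera after the fact is a safety net; using `jumpTo()` or `{ animate: false }` in the first place remains the more reliable control.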

Gate the pull request. Fail the job when a diff exceeds the configured threshold, with region-of-interest masks applied to dynamic chrome. Percy gates through its GitHub/GitLab status check; Chromatic uses chromatic --exit-zero-on-changes for non-blocking review or omits the flag to block on changes.

CI/CD Integration and Environment Parity

Integrating either tool demands strict environment parity, because headless browsers rasterize text and vector paths differently depending on installed fonts and subpixel configuration. Bake an identical font stack (for example fonts-noto with explicit fontconfig overrides) into the CI image and force software rasterization where it yields more predictable output than per-runner GPU drivers — the same containerization discipline that underpins reliable baseline management for tile servers.

Percy integrates via @percy/cli and parallelizes cleanly across matrix jobs; it expects a PERCY_TOKEN in the environment and finalizes a build once all shards report. Chromatic runs as npx chromatic --project-token=$CHROMATIC_PROJECT_TOKEN, builds Storybook, and publishes it so each story can be snapshotted. A representative GitHub Actions job for the Percy path:

jobs:
  visual:
    runs-on: ubuntu-22.04
    env:
      # Secret name assumed; use whatever your repository defines.
      PERCY_TOKEN: ${{ secrets.PERCY_TOKEN }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci
      - run: npx playwright install --with-deps chromium
      - run: npx percy exec -- npx playwright test visual/

The Chromatic path is structurally similar but swaps the final step:

      - run: npx chromatic --exit-zero-on-changes --only-changed

Whichever you adopt, stage visual snapshots after unit and component tests, block merges on any diff above the configured threshold, run the snapshots only when map-related files change to control cost, and use a pre-flight step to seed the mock tile servers and disable service workers. Playwright’s device emulation keeps DPR, viewport, and user-agent identical across distributed runners; mismatches in those three are the single biggest source of cross-runner baseline drift.
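
The "only when map-related files change" rule can be expressed declaratively (GitHub Actions has a `paths` filter on pull-request triggers) or as a small pre-flight check. A minimal sketch of the latter follows; the path prefixes are placeholders for wherever your map code, styles, and tile fixtures actually live.

```javascript
// Hypothetical path prefixes; replace with your repository's layout.
const MAP_PATH_PREFIXES = ["src/map/", "src/layers/", "styles/", "fixtures/tiles/"];

// Returns true when at least one changed file sits under a map-related prefix.
function touchesMapFiles(changedPaths, prefixes = MAP_PATH_PREFIXES) {
  return changedPaths.some((path) =>
    prefixes.some((prefix) => path.startsWith(prefix))
  );
}

// Usage in a pre-flight script, with the list taken from e.g.
//   git diff --name-only origin/main...HEAD
// Skip the visual job when touchesMapFiles(changed) is false.
```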

Cross-Browser and Cross-Environment Considerations

The two tools expose different cross-browser surfaces, and for maps that difference is consequential. Percy renders responsive widths from a single captured DOM and offers cross-browser rendering (Chrome, Firefox, Safari) on its cloud at higher tiers, but a WebGL canvas captured locally is uploaded as-is — so the browser that captured the frame is the browser whose GPU rasterization you are actually testing. Chromatic captures in its own Chromium build and can fan out to additional browsers per project; each adds a separate baseline that must be reviewed independently.

For map work the practical rules are the same as for any deterministic tile capture pipeline:

Maintain one baseline per engine, never a canonical cross-browser image. A Chromium-on-Ubuntu render will not match WebKit-on-macOS at the pixel level; forcing one baseline guarantees permanent false positives on every engine but one.
Force a software GL backend (--use-gl=swiftshader or ANGLE-on-SwiftShader) so rasterization is reproducible regardless of the runner’s physical GPU.
Pin system libraries (libgl1, mesa, fontconfig) in the container image; an unpinned package upgrade is the classic cause of a baseline that “passed last week.”
Set preserveDrawingBuffer: true on the WebGL context so the framebuffer survives long enough to be read back at capture time.
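
The first two rules can be wired into the harness with a pair of small helpers. This is a sketch under assumptions: the flag is the one named in the list above (its exact spelling can vary between Chromium versions), and the `[engine]` suffix is just one illustrative way to keep baselines separate.

```javascript
// Flag taken from the list above; only meaningful for Chromium-based engines.
const SOFTWARE_GL_ARGS = ["--use-gl=swiftshader"];

// Launch options per engine; Firefox and WebKit get no Chromium-only flags.
function launchOptionsFor(engine) {
  return engine === "chromium" ? { args: [...SOFTWARE_GL_ARGS] } : { args: [] };
}

// One baseline per engine: the engine name becomes part of the snapshot identity.
function snapshotName(base, engine) {
  return `${base} [${engine}]`;
}

// Usage with Playwright (assumes `const playwright = require("playwright")`):
//   for (const engine of ["chromium", "firefox", "webkit"]) {
//     const browser = await playwright[engine].launch(launchOptionsFor(engine));
//     // ...settle the map, then capture under snapshotName("London z12", engine)
//   }
```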

Engine-specific rendering trade-offs across map libraries are explored further in the child page on choosing tools per renderer, below.

Diff Tuning and Cartographic Baseline Curation

Standard UI diff algorithms weight every pixel equally, which is exactly wrong for cartography. Anti-aliasing variance across Chromium and WebKit, subpixel text rasterization, and raster-tile compression artifacts all trigger false positives. Both Percy and Chromatic expose a tolerance knob — Percy via per-snapshot percyCSS plus its sensitivity setting, Chromatic via a diffThreshold (a fraction of the colour-distance scale, default ≈ 0.063) — but the deeper algorithmic treatment, including SSIM and perceptual hashing, belongs to diff algorithm tuning for cartography.

The core tension is the same on either platform. A near-pixel-perfect tolerance is too strict for WebGL vector tiles, where a minor shader-compilation difference can shift label edges by one or two pixels. A loose tolerance masks real regressions — broken label-collision logic, wrong layer z-ordering, a dropped data overlay. The way out is structural rather than numeric: apply region-of-interest masking to dynamic controls (zoom buttons, attribution, scale bar) and ignore transient overlays (spinners, tooltips), exactly as covered under dynamic element masking and UI stability. Then tier your baselines so core map rendering is reviewed separately from UI chrome, legends, and data overlays, which stops a single benign tile-seam variance from blocking an unrelated UI change.

A useful intuition for tolerance: treat the per-snapshot diff budget as a fraction of total pixels, and size it to the noise floor you measured by diffing two captures of the same commit on the same runner:

$$\text{threshold} = \frac{\sum_{i=1}^{N} \left[\, |c_{i} - b_{i}| > \tau \,\right]}{N} > \text{noise floor}$$

where $c_{i}$ and $b_{i}$ are candidate and baseline pixel values, $τ$ is the per-channel colour-distance tolerance, $[\cdot]$ is the Iverson bracket, and $N$ is the masked pixel count. If your chosen threshold sits below the measured noise floor, the suite will be flaky no matter which vendor renders it.
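
To make the noise-floor measurement concrete, here is a minimal sketch that computes the differing-pixel fraction for two RGBA buffers. It assumes the captures have already been decoded to raw bytes (decoding is out of scope), and it reads $|c_{i} - b_{i}|$ as the largest per-channel difference, which is one reasonable interpretation rather than either vendor's algorithm.

```javascript
// Fraction of unmasked pixels whose largest per-channel difference exceeds tau.
// candidate and baseline are RGBA byte arrays of equal length; mask (optional)
// holds one entry per pixel, truthy meaning "excluded from the diff".
function diffFraction(candidate, baseline, tau, mask = null) {
  if (candidate.length !== baseline.length) {
    throw new Error("Captures must have identical dimensions");
  }
  let counted = 0;
  let differing = 0;
  for (let px = 0; px < candidate.length / 4; px++) {
    if (mask && mask[px]) continue;
    counted++;
    const o = px * 4;
    const delta = Math.max(
      Math.abs(candidate[o] - baseline[o]),
      Math.abs(candidate[o + 1] - baseline[o + 1]),
      Math.abs(candidate[o + 2] - baseline[o + 2])
    );
    if (delta > tau) differing++;
  }
  return counted === 0 ? 0 : differing / counted;
}
```

Running it on two captures of the same commit from the same runner yields the noise floor; any gating threshold should sit above that value.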

Threshold and Parameter Reference

The two tools name their knobs differently; this table maps the values that matter for map snapshots so you can port a configuration from one to the other.

| Parameter | Percy | Chromatic | Map-specific guidance |
| --- | --- | --- | --- |
| Capture surface | Full-page snapshot | Per-story component | Use Percy for composed routes, Chromatic for isolated widgets |
| Diff sensitivity | Snapshot sensitivity / `percyCSS` masks | `diffThreshold` (≈ `0.063` default) | Loosen to ~`0.1` for WebGL vector tiles; never below the runner noise floor |
| Anti-aliasing | Handled via masking | Built-in AA handling | Always ignore AA pixels on tile seams |
| Dynamic-element masking | `percyCSS` `visibility: hidden` | `.chromatic-ignore` class / `data-chromatic="ignore"` | Mask attribution, scale bar, cursors, tooltips |
| DPR / `deviceScaleFactor` | Pin in test context | Pin in test context | Fix at `1` (or one value per matrix) |
| Idle settle | Your harness handshake | `delay` + `play` handshake | `idle` + `requestAnimationFrame`, fail-closed at `10s` |
| Cross-browser | Cloud Chrome/Firefox/Safari (tier) | Chromium + opt-in browsers | One baseline per engine, never canonical |
| Cost driver | Snapshot volume × widths | Per-snapshot count | Snapshot only on map-file changes |
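
As a sketch of what porting a configuration can look like, a vendor-neutral snapshot description can be translated into each tool's options. The field names on `snapshot` are illustrative assumptions; the emitted keys (`widths`, `percyCSS`, `diffThreshold`, `delay`) are the ones used elsewhere on this page.

```javascript
// Shared, vendor-neutral description of one map snapshot (hypothetical shape).
const snapshot = {
  name: "London z12",
  widths: [1280],
  hideSelectors: [".mapboxgl-ctrl-attrib", ".leaflet-control-attribution"],
  diffThreshold: 0.1,
  settleDelayMs: 300,
};

// Percy snapshot options: widths plus a percyCSS rule that hides dynamic chrome.
function toPercyOptions(cfg) {
  return {
    widths: cfg.widths,
    percyCSS: `${cfg.hideSelectors.join(", ")} { visibility: hidden; }`,
  };
}

// Chromatic story parameters: tolerance and settle delay.
function toChromaticParameters(cfg) {
  return { chromatic: { diffThreshold: cfg.diffThreshold, delay: cfg.settleDelayMs } };
}
```

Masking has no direct equivalent in the Chromatic parameters shown here; on that side the same elements would be marked with `data-chromatic="ignore"` in the markup.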

Strategic Selection Criteria

Choose Chromatic when your map UI is built as Storybook components — reusable widgets, legend panels, custom controls. Its per-story capture and design-system review UI align naturally with that workflow, and isolating a widget makes the idle handshake easier to express in a play function. Choose Percy when you are validating composed integration views: routing, dynamic layer toggles, and multi-source overlays that only exist once the full page is assembled. Its CLI-first design slots into custom Playwright/Cypress runners outside Storybook and parallelizes well across large monorepos.

Cost shapes the decision as much as architecture. Chromatic bills per snapshot, so deterministic capture and baseline hygiene are budget controls, not just quality controls — every flaky re-snapshot is billable. Percy scales with snapshot volume and concurrency across responsive widths. For heavy GIS workloads, deduplicate snapshots, run visual tests on pull requests only when map-related files change, and archive stale baselines on a schedule; the full breakdown lives in cost analysis of cloud visual testing for mapping apps. For renderer-specific guidance — Leaflet’s DOM-and-raster model versus Mapbox’s WebGL pipeline — work through how to choose visual regression tools for Leaflet vs Mapbox before committing snapshot strategy.

Common Pitfalls

Capturing before the map settles. Root cause: the snapshot call fired on page load rather than the engine idle event, so tiles or labels were still resolving. Fix: gate every capture on the idle + requestAnimationFrame handshake with a fail-closed timeout — this is vendor-independent.
Unmasked attribution and controls. Root cause: Mapbox/Leaflet attribution text, scale bars, and zoom buttons re-flow or animate between runs. Fix: mask them with percyCSS (Percy) or data-chromatic="ignore" (Chromatic) so chrome never enters the diff.
One baseline shared across browsers. Root cause: expecting a Chromium capture to match a WebKit or Firefox render pixel-for-pixel. Fix: maintain a separate baseline per engine and route diffs only within the same engine.
Tolerance set below the noise floor. Root cause: chasing zero-pixel equality on WebGL vector tiles where shader compilation shifts edges by 1–2px. Fix: measure the same-commit, same-runner noise floor first, then set the threshold above it.
Billing surprise from flaky re-snapshots. Root cause: non-deterministic captures forcing repeated runs on a per-snapshot plan (acute on Chromatic). Fix: stabilize capture first on an open pipeline, then run cloud snapshots only when map files change.

Frequently Asked Questions

Does Percy or Chromatic re-render my map in the cloud?

Not the map itself. Percy serializes the DOM in your test browser and re-renders it in its cloud, but a canvas is uploaded as an image of what the capturing browser rasterized, so on the Percy path your CI environment determines map render fidelity. Chromatic publishes your built Storybook and captures each story in its own cloud browsers, so there the determinism has to live inside the story itself.

Which tool is better for a map application?

It depends on how the UI is built, not on map features per se. Chromatic fits map widgets developed as isolated Storybook stories and teams that want a design-system review surface. Percy fits full-page integration views with multi-source overlays and custom (non-Storybook) test runners. Both require the same deterministic capture work before they are trustworthy.

How do I stop anti-aliasing on labels from causing false positives?

Force a software GL backend such as SwiftShader, bundle a fixed font stack in the container image, pin DPR, and configure the tool to ignore anti-aliased pixels. Then loosen the diff threshold for WebGL vector tiles to roughly a tenth of a percent above the measured same-runner noise floor rather than demanding zero-pixel equality.

How do I keep cloud snapshot costs under control for GIS workloads?

Run visual tests on pull requests only when map-related files change, deduplicate near-identical snapshots, restrict responsive widths to the ones you actually ship, and archive stale baselines on a schedule. On a per-snapshot plan like Chromatic’s, fixing capture flakiness is itself a cost control because every re-snapshot is billable.

Related

Web Map Visual Testing Fundamentals & Toolchains — the parent reference for determinism, baselines, and CI gating.
How to choose visual regression tools for Leaflet vs Mapbox — renderer-specific snapshot strategy under this comparison.
Open-Source Visual Testing Stacks — build and validate capture determinism before paying per snapshot.
Cost analysis of cloud visual testing for mapping apps — the full pricing and snapshot-volume breakdown.
Diff Algorithm Tuning for Cartography — SSIM, pHash, and masking beyond the vendor knobs.
Screenshot Capture, Sync & Comparison Logic — the idle synchronization contract upstream of any vendor.

