Passmark: AI-Driven Browser Regression Testing with Playwright

Passmark is an open-source Playwright library that uses AI to drive natural-language browser regression tests with intelligent caching, auto-healing, and multi-model assertions. It’s designed to make end-to-end browser testing more robust and lower maintenance by letting AI translate human-readable steps into Playwright actions, while caching successful actions and running consensus checks across models for assertions.

Why use Passmark

Write tests as natural-language steps (less brittle than hand-authored selectors).
Multi-model assertion engine (Claude + Gemini, with an arbiter) reduces false positives/negatives.
Redis-backed step caching makes repeat runs fast and cheap by reusing prior actions.
Auto-healing and retry logic mean fewer flaky tests that require manual fixes.
Optional video assertions for transient UI that screenshots miss.
Integrates with Playwright workflows and CI.

Quick start (summary)

Create a Playwright project (TypeScript):

npm init playwright@latest passmark-project
cd passmark-project
npm install passmark dotenv

Add provider keys in .env (you typically need Anthropic + Google for multi-model consensus):
```
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GENERATIVE_AI_API_KEY=AIza...
```
Optionally use an AI gateway (Vercel/OpenRouter/Cloudflare) via AIGATEWAY_API_KEY / OPENROUTER_API_KEY / CLOUDFLARE* vars.
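
A missing key tends to surface late, in the middle of a test run. As a minimal sketch, the helper below reports which of the two variable names from the .env example are unset. `missingProviderKeys` is a hypothetical helper written for this article, not part of Passmark; in a real project you would pass it `process.env`.

```typescript
// Hypothetical helper, not part of Passmark: reports which provider keys are unset.
type Env = Record<string, string | undefined>;

// Variable names taken from the .env example above.
const REQUIRED_KEYS = ["ANTHROPIC_API_KEY", "GOOGLE_GENERATIVE_AI_API_KEY"];

function missingProviderKeys(env: Env, required: string[] = REQUIRED_KEYS): string[] {
  return required.filter((key) => {
    const value = env[key];
    // Treat empty or whitespace-only values as missing.
    return value === undefined || value.trim() === "";
  });
}

// Example with an inline object; in a real project you would pass process.env.
const missing = missingProviderKeys({ ANTHROPIC_API_KEY: "sk-ant-example" });
console.log(missing); // ["GOOGLE_GENERATIVE_AI_API_KEY"]
```

If you route through a gateway instead, the required list would change to the gateway variable, which is why it is a parameter rather than hard-coded.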

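Make Playwright read .env (for example, at the top of playwright.config.ts):

import dotenv from "dotenv";
import path from "path";
// Load variables from the .env file that sits next to this config file.
dotenv.config({ path: path.resolve(__dirname, ".env") });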

Write a test using runSteps:

import { test, expect } from "@playwright/test";
import { runSteps } from "passmark";

test("Shopping cart", async ({ page }) => {
  test.setTimeout(60_000);
  await runSteps({
    page,
    userFlow: "Add product to cart",
    steps: [
      { description: "Navigate to https://demo.vercel.store" },
      { description: "Click Acme Circles T-Shirt" },
      { description: "Select color", data: { value: "White" } },
      { description: "Select size", data: { value: "S" } },
      { description: "Add to cart", waitUntil: "My Cart is visible" },
    ],
    assertions: [
      { assertion: "You can see My Cart with Acme Circles T-Shirt" },
    ],
    test,
    expect,
  });
});

Core concepts and features

runSteps / runUserFlow: Convert natural-language steps into Playwright actions. runSteps supports per-step caching and hybrid AI configuration.
Multi-model assertion engine: Runs assertions across two primary models and uses an arbiter if they disagree, improving reliability of pass/fail decisions.
Redis step caching: Successful single-step actions are cached (keyed by userFlow + step.description). Cached steps can run without AI calls; bypassable via bypassCache.
Auto-healing: If cached steps fail, Passmark falls back to AI execution and may update the cache if the run succeeds.
Video assertions: Record a step run and evaluate assertions against the full video (useful for ephemeral UI). Requires a Google API key for uploading to Gemini Files API.
CUA mode: An alternate visual/coordinate-driven mode that uses OpenAI’s computer-use agent (gpt-5.5 + Responses API computer tool). CUA requires direct OpenAI access (no gateway).
AI gateway support: Optional routing through Vercel/OpenRouter/Cloudflare to centralize keys, caching, and observability.
Placeholders: Dynamic value substitution with {{run.*}}, {{global.*}}, {{data.*}}, and {{email.*}} to support data-driven tests and email extraction.
Secure script runner: AST-validated execution and a whitelisted API surface for generated Playwright scripts.
Telemetry & logging: Optional Axiom/OpenTelemetry tracing and Pino-based logs.
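
To make the multi-model assertion idea concrete, here is a minimal sketch of two judges plus an arbiter, following the description in the list above. The `Verdict` and `Judge` types and the `evaluateWithConsensus` function are hypothetical names for illustration; Passmark's internal implementation may be structured differently.

```typescript
// Illustrative sketch of two-model consensus with an arbiter.
// The types and function names are hypothetical, not Passmark's API.
type Verdict = { passed: boolean; reason: string };
type Judge = (assertion: string) => Promise<Verdict>;

async function evaluateWithConsensus(
  assertion: string,
  primary: Judge,
  secondary: Judge,
  arbiter: Judge,
): Promise<Verdict & { decidedBy: "consensus" | "arbiter" }> {
  // Ask both primary models in parallel.
  const [a, b] = await Promise.all([primary(assertion), secondary(assertion)]);

  // Agreement: accept the shared verdict without consulting the arbiter.
  if (a.passed === b.passed) {
    return { ...a, decidedBy: "consensus" };
  }

  // Disagreement: the arbiter's verdict decides the outcome.
  const ruling = await arbiter(assertion);
  return { ...ruling, decidedBy: "arbiter" };
}

// Stub judges stand in for real model calls.
const alwaysPass: Judge = async () => ({ passed: true, reason: "cart is visible" });
const alwaysFail: Judge = async () => ({ passed: false, reason: "cart not found" });

evaluateWithConsensus("You can see My Cart", alwaysPass, alwaysFail, alwaysPass)
  .then((v) => console.log(v.decidedBy, v.passed)); // logs: arbiter true
```

The arbiter is consulted only on disagreement, so the common case costs two model calls rather than three.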

Practical workflows and best practices

Environment and reproducibility

Run tests with elevated privileges if required by your environment (some operations need admin/root).
Pin and document OS, browser, driver, and major dependency versions. Capture the Playwright project and Passmark version.
For repeatable performance or timing-sensitive tests, document or lock CPU/GPU boost behaviors, viewport size, and test machine state.
Use .env and project configuration to keep secrets and API keys out of code. Prefer gateway routing in shared CI environments if you don’t want to manage multiple provider keys.

Test design

Write small, focused flows for caching effectiveness — single-step actions cache best.
Use explicit assertions that map to visible, verifiable outcomes; opt into video assertions for transient UI like toasts.
Use placeholders for test data (randomized emails, unique IDs) to avoid cross-run conflicts.
Use bypassCache: true for steps you want to re-evaluate constantly (e.g., when UI has changed deliberately).
Add retries for flaky network-dependent steps, and prefer AI-evaluated wait conditions (smart backoff).
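
The placeholder syntax mentioned above can be pictured as simple template substitution. The sketch below is a hypothetical resolver for `{{scope.key}}` tokens, written only to illustrate the idea. The key names after the dot (`email`, `region`) are made up, and Passmark's own resolution rules may differ.

```typescript
// Hypothetical resolver for {{scope.key}} placeholders; Passmark's own
// substitution logic may differ.
type Scopes = Record<string, Record<string, string>>;

function resolvePlaceholders(template: string, scopes: Scopes): string {
  // Match {{scope.key}}, allowing optional spaces inside the braces.
  return template.replace(/\{\{\s*(\w+)\.(\w+)\s*\}\}/g, (match, scope: string, key: string) => {
    const value = scopes[scope]?.[key];
    // Leave unknown placeholders untouched so they are easy to spot.
    return value ?? match;
  });
}

const resolved = resolvePlaceholders("Sign up as {{run.email}} in {{global.region}}", {
  run: { email: "user-1234@example.com" },
  global: { region: "EU" },
});
console.log(resolved); // "Sign up as user-1234@example.com in EU"
```

Generating a unique value per run for something like the email address is what avoids collisions between parallel or repeated runs.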

CI integration

Install and run Playwright + Passmark as part of CI pipelines. Increase test timeouts since AI-driven execution can be slower.
Configure Redis (REDIS_URL) in CI to enable caching (improves speed and reduces AI call cost).
Use environment-specific gateway keys (Vercel/OpenRouter/Cloudflare) or inject provider keys via CI secrets.
Save the Playwright HTML report (written to playwright-report/ by default and viewable with npx playwright show-report) and Passmark logs as artifacts. If you opt into video assertions, capture recordings for debugging.
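
As a rough starting point for CI, a Playwright config along these lines raises the timeout and writes an HTML report without opening it. The values are illustrative, not recommendations from the Passmark project. The `trace` and `video` options are Playwright's own capture features and are assumed here to be independent of Passmark's video assertions.

```typescript
// playwright.config.ts: a sketch of CI-friendly settings; the values are illustrative.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // AI-driven steps can be slow on a cold cache, so allow more time per test.
  timeout: 120_000,
  // Retry only in CI to absorb transient network or provider errors.
  retries: process.env.CI ? 2 : 0,
  reporter: [
    ["list"],
    // Write the HTML report to playwright-report/ without opening a browser.
    ["html", { open: "never" }],
  ],
  use: {
    // Playwright's own trace and video capture, kept only when useful for debugging.
    trace: "on-first-retry",
    video: "retain-on-failure",
  },
});
```

Upload the playwright-report/ and test-results/ directories as CI artifacts so traces and videos survive the job.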

Model and gateway configuration

Configure once via configure({ ai: {...}, uploadBasePath, telemetry, ... }) at process startup.
Choose gateway vs direct provider access based on security, cost, and observability needs. Cloudflare operates as a proxy (you still need your provider keys), while Vercel/OpenRouter act as resellers/gateways.
For CUA mode (visual automation), set ai.mode = "cua" and provide OPENAI_API_KEY; note CUA is incompatible with gateway routing.
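
Because CUA cannot go through a gateway, a hybrid run is easy to misconfigure. The sketch below is a hypothetical pre-flight check that lists steps requesting CUA mode while a gateway would still apply. The step shape mirrors the examples in this article, and the assumption that a step without its own `gateway` inherits the global one is mine, not documented behavior.

```typescript
// Hypothetical pre-flight check; the step shape is assumed from this article's examples.
type AiOverride = { mode?: string; gateway?: string };
type StepWithAi = { description: string; ai?: AiOverride };

// Returns descriptions of CUA steps that would still be routed through a gateway.
// Assumption: a step without its own gateway setting inherits defaultGateway.
function findCuaGatewayConflicts(steps: StepWithAi[], defaultGateway: string): string[] {
  return steps
    .filter((step) => step.ai?.mode === "cua")
    .filter((step) => (step.ai?.gateway ?? defaultGateway) !== "none")
    .map((step) => step.description);
}

const conflicts = findCuaGatewayConflicts(
  [
    { description: "Navigate to /products" },
    { description: "Drag the price slider to $40", ai: { mode: "cua", gateway: "none" } },
    { description: "Rotate the 3D preview", ai: { mode: "cua" } },
  ],
  "vercel",
);
console.log(conflicts); // ["Rotate the 3D preview"]
```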

Caching and cost optimization

Use Redis to enable step caching. Cache hits bypass AI and reduce latency/cost.
Cache keys: userFlow + step.description. Design step descriptions intentionally to maximize cache reuse.
Bypass the cache for exploratory runs, newly introduced steps, or when you want to force a fresh AI execution rather than wait for auto-healing to trigger.
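
The cache-first flow with AI fallback can be sketched as follows. An in-memory Map stands in for Redis, and every name here (`executeStep`, `Executor`, the `::` key separator) is hypothetical. Whether Passmark writes to the cache on a bypassed run is also an assumption of this sketch.

```typescript
// Sketch of cache-first execution with AI fallback ("auto-healing").
// An in-memory Map stands in for Redis; all names here are hypothetical.
type CachedAction = { script: string };
type Executor = {
  runCached: (action: CachedAction) => Promise<boolean>; // true if the replay succeeded
  runWithAi: (description: string) => Promise<CachedAction>; // returns the action that worked
};

// Key built from userFlow + step description, as described above.
const cacheKey = (userFlow: string, description: string): string =>
  `${userFlow}::${description}`;

async function executeStep(
  cache: Map<string, CachedAction>,
  exec: Executor,
  userFlow: string,
  description: string,
  bypassCache = false,
): Promise<"cache" | "ai"> {
  const key = cacheKey(userFlow, description);
  const cached = bypassCache ? undefined : cache.get(key);

  // Cache hit: replay the stored action with no AI call.
  if (cached && (await exec.runCached(cached))) {
    return "cache";
  }

  // Miss, bypass, or failed replay: fall back to AI and refresh the cache.
  const fresh = await exec.runWithAi(description);
  cache.set(key, fresh);
  return "ai";
}

// Stub executor: cached replays always succeed, AI returns a placeholder action.
const demoExec: Executor = {
  runCached: async () => true,
  runWithAi: async (description) => ({ script: `// action for: ${description}` }),
};

async function demo(): Promise<string[]> {
  const cache = new Map<string, CachedAction>();
  const first = await executeStep(cache, demoExec, "Add product to cart", "Add to cart");
  const second = await executeStep(cache, demoExec, "Add product to cart", "Add to cart");
  return [first, second];
}
demo().then((sources) => console.log(sources)); // ["ai", "cache"]
```

Because the description is part of the key, rewording a step creates a new cache entry, so stable wording is what keeps the hit rate high.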

Debugging and observability

Enable PASSMARK_LOG_LEVEL=debug to surface AI inputs, decisions, and Playwright actions.
Use telemetry (Axiom/OpenTelemetry) for traces across AI calls and Playwright runs.
When an assertion fails, review multi-model outputs and arbiter logs. Video assertions provide richer evidence for transient UI.

Security, costs, and limitations

Passmark relies on commercial LLMs (Claude, Gemini, OpenAI). Budget for API usage, especially for initial AI-driven executions before caching warms up.
Video assertions and model choices may require specific provider keys (Google key for Gemini Files API).
Be mindful of sensitive data in page snapshots that are sent to AI providers; redact or avoid exposing secrets in tests.
The project is early and evolving — check the repo for updates, issues, and contributions.

Example test plan (new build smoke + regression)

Smoke run (first run against a new build): runSteps to validate key flows (login, core pages). Use bypassCache: true for first-run reliability.
Burn-in regression: enable Redis and allow caching; run daily to catch regressions; escalate on assertion disagreements.
Exploratory validations: runUserFlow with effort: "high" to let the AI explore flows and surface edge-case failures.
Post-failure workflow: collect Playwright report, Passmark logs, and video (if enabled); re-run specific failing steps with bypassCache to reproduce.
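
One way to switch between the smoke and reproduction modes in this plan is to derive the step list from a single source. The sketch below assumes `bypassCache` is a per-step flag, as the test-design advice suggests. The `Step` type is a local approximation of the step objects shown in this article, and `withBypassCache` is a hypothetical helper, not a Passmark export.

```typescript
// Step shape assumed from the examples in this article; not imported from Passmark.
type Step = { description: string; bypassCache?: boolean; [extra: string]: unknown };

// Returns copies of the steps with bypassCache forced on wherever shouldBypass matches.
function withBypassCache(
  steps: Step[],
  shouldBypass: (step: Step) => boolean = () => true,
): Step[] {
  return steps.map((step) => (shouldBypass(step) ? { ...step, bypassCache: true } : step));
}

const flowSteps: Step[] = [
  { description: "Navigate to https://demo.vercel.store" },
  { description: "Add to cart", waitUntil: "My Cart is visible" },
];

// Smoke run: every step is re-evaluated by the AI.
const smokeSteps = withBypassCache(flowSteps);

// Reproducing a failure: only the failing step skips the cache.
const reproSteps = withBypassCache(flowSteps, (step) => step.description === "Add to cart");

console.log(smokeSteps.length, reproSteps.length);
```

The resulting arrays would then be passed as `steps` to `runSteps`; the regression run uses `flowSteps` unchanged so the cache can do its work.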

Useful API snippets

Configure globally

import { configure } from "passmark";

configure({
  ai: {
    gateway: "vercel", // or 'openrouter', 'cloudflare', 'none'
    models: {
      stepExecution: "google/gemini-3-flash",
      assertionPrimary: "anthropic/claude-4.5-haiku",
      assertionSecondary: "google/gemini-3-flash",
      assertionArbiter: "google/gemini-3.1-pro-preview",
    },
  },
  uploadBasePath: "./uploads",
});

Per-step CUA override (hybrid runs)

await runSteps({
  page,
  test,
  expect,
  userFlow: "Buy product on sale",
  steps: [
    { description: "Navigate to /products" },
    {
      description: "Drag the price slider to $40",
      ai: { mode: "cua", gateway: "none" }, // visual/coordinate mode for this step
    },
    { description: "Click Add to cart" },
  ],
});

Where to look next

Repository: https://github.com/bug0inc/passmark — read README, CHANGELOG, and issues for latest features and breaking changes.
Docs at the project homepage (passmark.dev) — examples, configuration, and CI patterns.
Contribute: tests, edge-case coverage, telemetry integrations, and email providers are cited as good contribution areas.

License and community

The project uses the Functional Source License (FSL-1.1-Apache-2.0); check the repo LICENSE.md for details and contribution terms.
Active contributors and growing community — review open issues and PRs to understand known limitations and roadmap.

Conclusion

Passmark aims to reduce test maintenance and improve reliability by combining AI-driven step execution with caching and multi-model assertions. It’s a good fit when you want natural-language-driven tests, intelligent retries, and lower churn on locator-based failures, but budget and privacy considerations for AI calls must be part of your decision. Start with a small core of critical flows, enable Redis caching, and iterate configuration (models, gateway) as you learn the cost and stability profile in your environment.
