Visual Regression Testing with LLM-Powered Analysis

Traditional pixel-diff tools are great at telling you that something changed. But they're terrible at telling you what changed and whether it matters. A single-pixel shift from a font rendering update triggers the same alert as a completely broken layout.

We built visualqe to solve this. It combines traditional screenshot comparison with LLM-powered semantic analysis, so you get diffs that actually mean something.

The Problem with Pixel Diffs

If you've worked in QE, you've seen this pattern:

Set up visual regression tests with a pixel-diff tool
Get flooded with false positives from font rendering, anti-aliasing, or timing differences
Raise the threshold to reduce noise
Miss actual bugs because the threshold is too high
Abandon visual testing or spend hours manually reviewing diffs

The fundamental issue is that pixel comparison treats all changes equally. A 2% diff from sub-pixel font rendering and a 2% diff from a broken CSS rule look the same to the algorithm.
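To make this concrete, here is a minimal sketch in plain Python. It is not how visualqe or any particular pixel-diff tool is implemented; the `diff_ratio` helper and the tiny 10x10 grayscale "screenshots" are made up for illustration. Two very different changes produce exactly the same diff ratio:

```python
def diff_ratio(before, after, tolerance=0):
    """Fraction of pixels whose grayscale value differs by more than `tolerance`."""
    pairs = [(a, b) for row_a, row_b in zip(before, after)
             for a, b in zip(row_a, row_b)]
    changed = sum(1 for a, b in pairs if abs(a - b) > tolerance)
    return changed / len(pairs)

# Hypothetical 10x10 grayscale "screenshot": a plain white page (255).
baseline = [[255] * 10 for _ in range(10)]

# Change A: two scattered pixels are slightly darker, as anti-aliasing might cause.
font_shift = [row[:] for row in baseline]
font_shift[2][3] = 250
font_shift[7][8] = 250

# Change B: two adjacent pixels go black, standing in for a broken element.
broken_layout = [row[:] for row in baseline]
broken_layout[5][4] = 0
broken_layout[5][5] = 0

print(diff_ratio(baseline, font_shift))     # 0.02
print(diff_ratio(baseline, broken_layout))  # 0.02
```

Any ratio threshold that lets the first change pass also lets the second one through, which is the trade-off described above.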

How visualqe Works

visualqe takes a different approach. It still captures screenshots and computes pixel diffs, but then it sends the before/after images to a vision-language model (Google Gemini) that can actually interpret what changed.

Capture Screenshots

The Pixcap API captures pixel-perfect screenshots of your pages.

Compute Structural Diff

The SSIM (structural similarity index) algorithm identifies the regions that differ visually.

Semantic Analysis

The LLM examines the diff and describes what changed in plain English.

Severity Classification

Each change is classified as critical, warning, or informational based on its impact.

The result is a report that tells you "The primary CTA button changed from blue to green" instead of "47 pixels differ in region (234, 567)."
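For readers curious about the structural diff step, the sketch below shows one way SSIM can be used to flag changed regions. It is a simplification under stated assumptions, not visualqe's code: it computes a single SSIM value per fixed tile in pure Python, whereas production implementations (for example scikit-image's `structural_similarity`) use a sliding window. The names `ssim` and `changed_tiles`, the tile size, and the 0.9 threshold are all illustrative.

```python
C1 = (0.01 * 255) ** 2  # stabilising constants from the standard SSIM definition
C2 = (0.03 * 255) ** 2  # for 8-bit images (dynamic range 255)

def mean(values):
    return sum(values) / len(values)

def ssim(x, y):
    """SSIM of two equal-length lists of 0-255 grayscale values (one window)."""
    mx, my = mean(x), mean(y)
    var_x = mean([(a - mx) * (a - mx) for a in x])
    var_y = mean([(b - my) * (b - my) for b in y])
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + C1) * (2 * cov + C2)) / (
        (mx * mx + my * my + C1) * (var_x + var_y + C2))

def changed_tiles(before, after, tile=4, threshold=0.9):
    """Return (row, col) origins of tiles whose SSIM falls below `threshold`."""
    flagged = []
    for r in range(0, len(before), tile):
        for c in range(0, len(before[0]), tile):
            x = [v for row in before[r:r + tile] for v in row[c:c + tile]]
            y = [v for row in after[r:r + tile] for v in row[c:c + tile]]
            if ssim(x, y) < threshold:
                flagged.append((r, c))
    return flagged

# Hypothetical 8x8 gradient image; the bottom-right tile is blanked out afterwards.
before = [[(r * 8 + c) * 3 for c in range(8)] for r in range(8)]
after = [row[:] for row in before]
for r in range(4, 8):
    for c in range(4, 8):
        after[r][c] = 0

print(changed_tiles(before, after))  # [(4, 4)]
```

The flagged regions are what a vision-language model would then be asked to describe; that part depends on the model API and is not sketched here.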

Getting Started

Installation

pip install visualqe

You'll need two API keys:

PIXCAP_API_KEY - Get this from your Pixcap dashboard
GEMINI_API_KEY - Get this from Google AI Studio (free tier available)
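
If you keep these keys in environment variables rather than in source code, a small helper can collect them and fail early when one is missing. This is a sketch: `load_api_keys` is a hypothetical helper, and it assumes the `VisualQE` constructor accepts the `pixcap_api_key` and `gemini_api_key` keyword arguments used in this article's examples. The library may well read these variables on its own; the sketch does not rely on that.

```python
import os

def load_api_keys(environ=os.environ):
    """Collect the two keys from the environment, failing early if one is missing."""
    names = {"pixcap_api_key": "PIXCAP_API_KEY", "gemini_api_key": "GEMINI_API_KEY"}
    missing = [env for env in names.values() if not environ.get(env)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {arg: environ[env] for arg, env in names.items()}

# Usage (assumption: the constructor takes these two keyword arguments):
# from visualqe import VisualQE
# vqe = VisualQE(**load_api_keys())
```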

Basic Usage

from visualqe import VisualQE

# Initialize with your API keys
vqe = VisualQE(
    pixcap_api_key="pix_your_key",
    gemini_api_key="your_gemini_key"
)

# Capture a baseline screenshot
screenshot = vqe.capture("https://your-app.com/dashboard")
vqe.save_baseline("dashboard", screenshot)

# Later, compare against the baseline
new_screenshot = vqe.capture("https://your-app.com/dashboard")
result = vqe.compare("dashboard", new_screenshot)

print(result.summary)
# "The navigation bar now includes a 'Settings' link.
#  The user avatar has moved from the left to the right side.
#  No functional elements appear broken."

Intent Validation

One of the most powerful features is intent validation. When you're making intentional changes, you can tell visualqe what to expect:

result = vqe.compare(
    "checkout-page",
    new_screenshot,
    intent="Added a 'Save for later' button below the cart items"
)

if result.intent_validated:
    print("Change implemented correctly")
else:
    print(f"Issue: {result.intent_feedback}")
    # "The 'Save for later' button is present but appears
    #  to be disabled/grayed out, which may not match the intent."
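
In a test suite you would probably want a failed intent check to fail the test rather than just print a message. A minimal sketch, assuming only the `intent_validated` and `intent_feedback` attributes shown above; `assert_intent` is a hypothetical helper, and the `SimpleNamespace` object stands in for a real comparison result:

```python
from types import SimpleNamespace

def assert_intent(result):
    """Raise AssertionError carrying the model's feedback when the intent was not met."""
    if not result.intent_validated:
        raise AssertionError(f"Intent not validated: {result.intent_feedback}")

# Stand-in for a real comparison result, for illustration only.
fake_result = SimpleNamespace(
    intent_validated=False,
    intent_feedback="The 'Save for later' button appears disabled.",
)

try:
    assert_intent(fake_result)
except AssertionError as exc:
    print(exc)  # Intent not validated: The 'Save for later' button appears disabled.
```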

pytest Integration

visualqe includes a pytest plugin that makes it easy to add visual tests to your existing test suite:

# tests/visual/test_pages.py

def test_homepage(visual_check):
    visual_check("homepage", "https://your-app.com/")

def test_login_page(visual_check):
    visual_check("login", "https://your-app.com/login")

def test_dashboard_redesign(visual_check):
    visual_check(
        "dashboard",
        "https://your-app.com/dashboard",
        intent="Header should show new company logo"
    )

Run with:

pytest tests/visual/ --visual-report=./reports/visual.html

This generates an HTML report with side-by-side comparisons and AI-generated summaries for each diff.

Configuration Options

Option	Description	Default
`--visual-baseline-dir`	Where to store baseline images	`./baselines`
`--visual-threshold`	Pixel diff sensitivity (0-1)	`0.01`
`--visual-update-baselines`	Update baselines instead of comparing	`false`
`--visual-skip-analysis`	Skip LLM analysis (faster, less context)	`false`
`--visual-branch`	Organize baselines by git branch	`main`

Testing Internal Apps with VPN Connector

Need to test staging environments or internal tools that aren't publicly accessible? visualqe works seamlessly with Pixcap's VPN Connector.

# Run the Pixcap connector in your network
docker run -d \
  -e PIXCAP_CONNECTOR_TOKEN=pxc_your_token \
  -e PIXCAP_CONNECTOR_NAME=staging \
  pixcap/connector

Then test internal URLs just like public ones:

# This works for internal URLs when connector is running
vqe.capture("http://staging.internal.company.com/admin")
vqe.capture("http://localhost:3000/dashboard")

The connector creates a secure tunnel from your network to Pixcap, so screenshots are captured from inside your firewall without exposing any ports.

GitHub Action for CI/CD

For automated visual testing in your CI/CD pipeline, we provide a GitHub Action that captures the routes you list, compares them against stored baselines, and reports the results:

# .github/workflows/visual-test.yml
name: Visual Regression Tests

on:
  pull_request:
    branches: [main]

jobs:
  visual-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Start app
        run: |
          npm install
          npm run build
          npm start &

      - name: Run visual tests
        uses: dan-shah/visual-regression-action@v1
        with:
          api_key: ${{ secrets.PIXCAP_API_KEY }}
          gemini_api_key: ${{ secrets.GEMINI_API_KEY }}
          base_url: http://localhost:3000
          routes: '["/", "/about", "/pricing", "/docs"]'
          diff_mode: semantic
          fail_on_diff: true

Action Features

Automatic baseline management - Baselines are stored per branch, so PRs compare against their target branch
Semantic or pixel mode - Choose AI-powered analysis or strict pixel comparison
Sitemap support - Test all pages by pointing to your sitemap.xml
Wait conditions - Polls a URL to ensure your app is ready before testing
PR comments - Automatically posts diff summaries to your pull request

Pro tip: Use diff_mode: semantic for feature branches where UI changes are expected, and diff_mode: pixel with a low threshold for hotfix branches where no visual changes should occur.
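
The branch rule in that tip can be written down as a tiny function. This is only a sketch: `choose_diff_mode` is hypothetical, and the `hotfix/` prefix is an assumed branch naming convention that you would adapt to your own repository. The two return values are the `diff_mode` settings mentioned above.

```python
def choose_diff_mode(branch, strict_prefixes=("hotfix/",)):
    """Pick strict pixel comparison for branches that should not change the UI."""
    if branch.startswith(tuple(strict_prefixes)):
        return "pixel"
    return "semantic"

print(choose_diff_mode("hotfix/fix-login-redirect"))  # pixel
print(choose_diff_mode("feature/new-dashboard"))      # semantic
```

A workflow step could compute this value and pass it to the action's `diff_mode` input.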

Testing Internal Apps in CI

The GitHub Action can also test apps that aren't publicly accessible by automatically setting up a connector tunnel:

- name: Run visual tests on staging
  uses: dan-shah/visual-regression-action@v1
  with:
    api_key: ${{ secrets.PIXCAP_API_KEY }}
    gemini_api_key: ${{ secrets.GEMINI_API_KEY }}
    base_url: http://localhost:3000
    connector_token: ${{ secrets.PIXCAP_CONNECTOR_TOKEN }}
    routes: '["/", "/admin", "/settings"]'

CLI for Local Development

visualqe also includes a CLI for quick checks during development:

# Capture a screenshot
visualqe capture http://localhost:3000 -o homepage.png

# Compare against a baseline
visualqe compare homepage http://localhost:3000 --report diff.html

# List all baselines
visualqe list --baseline-dir ./baselines

# Estimate costs for a test run
visualqe estimate 50
# Estimated cost for 50 comparisons: $0.60 - $1.25

# Check service health
visualqe health

Cost Considerations

visualqe uses two paid services:

Pixcap screenshots: $0.003-0.005 per capture depending on your plan
Gemini analysis: ~$0.002-0.005 per comparison (depends on image size)

For a typical test suite of 50 pages run on each PR, expect roughly $0.50-1.00 per run. The visualqe estimate command can give you a more precise projection.
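
As a back-of-the-envelope check, the per-unit prices listed above can simply be multiplied out. The sketch below assumes exactly one capture and at most one LLM analysis per page; `estimate_run_cost` is a hypothetical helper, not part of visualqe. Under that assumption the result comes out lower than the figures quoted in this article, so retries, extra captures, or larger images presumably add to the real total, and the output of `visualqe estimate` should be treated as the better number.

```python
def estimate_run_cost(pages, capture=(0.003, 0.005), analysis=(0.002, 0.005),
                      skip_analysis=False):
    """Return (low, high) USD cost for one run, assuming one capture and at most
    one LLM analysis per page."""
    if skip_analysis:
        analysis = (0.0, 0.0)
    low = pages * (capture[0] + analysis[0])
    high = pages * (capture[1] + analysis[1])
    return round(low, 2), round(high, 2)

print(estimate_run_cost(50))                      # (0.25, 0.5)
print(estimate_run_cost(50, skip_analysis=True))  # (0.15, 0.25)
```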

If you want to reduce costs, you can:

Use --visual-skip-analysis to disable LLM analysis for routine checks
Only run full visual tests on PRs that touch frontend code
Use pixel-only mode for branches where semantic analysis isn't needed

Real-World Example

Here's how a typical QE workflow might look:

Developer opens a PR with CSS changes to the checkout flow
GitHub Action runs visual tests against the PR branch
visualqe captures screenshots of all checkout pages
LLM analyzes the diffs and identifies: "The 'Complete Purchase' button has increased padding and the order summary card has a new drop shadow"
Action posts a comment on the PR with the summary and links to the full report
QE reviews the semantic summary instead of manually inspecting pixel diffs
PR merges with confidence that visual changes are intentional

Getting Started Today

Ready to add intelligent visual testing to your workflow?

Install the package: pip install visualqe
Get your Pixcap API key (100 free credits included)
Get a Gemini API key (free tier available)
Run your first comparison

Check out the full documentation on PyPI for more examples and advanced configuration options.
