January 23, 2025

Visual Regression Testing with LLM-Powered Analysis

How quality engineers can use the visualqe Python package to catch visual bugs before they reach production, with AI that actually understands what changed.

Traditional pixel-diff tools are great at telling you that something changed. But they're terrible at telling you what changed and whether it matters. A single-pixel shift from a font rendering update triggers the same alert as a completely broken layout.

We built visualqe to solve this. It combines traditional screenshot comparison with LLM-powered semantic analysis, so you get diffs that actually mean something.

The Problem with Pixel Diffs

If you've worked in QE, you've seen this pattern:

  1. Set up visual regression tests with a pixel-diff tool
  2. Get flooded with false positives from font rendering, anti-aliasing, or timing differences
  3. Raise the threshold to reduce noise
  4. Miss actual bugs because the threshold is too high
  5. Abandon visual testing or spend hours manually reviewing diffs

The fundamental issue is that pixel comparison treats all changes equally. A 2% diff from sub-pixel font rendering and a 2% diff from a broken CSS rule look the same to the algorithm.
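
To make that concrete, here is a minimal sketch in plain Python (not visualqe's actual diff code) that counts the fraction of changed pixels in a tiny 10x10 grayscale grid. The grid size and pixel values are made up for illustration. Two very different changes produce the same 2% score:

```python
def diff_ratio(before, after):
    """Fraction of pixels whose values differ between two equal-sized grids."""
    total = len(before) * len(before[0])
    changed = sum(
        1
        for row_b, row_a in zip(before, after)
        for px_b, px_a in zip(row_b, row_a)
        if px_b != px_a
    )
    return changed / total


SIZE = 10
baseline = [[255] * SIZE for _ in range(SIZE)]

# Case 1: rendering noise, two isolated pixels shifted by one gray level.
noisy = [row[:] for row in baseline]
noisy[1][1] = 254
noisy[8][6] = 254

# Case 2: a visibly broken element, two adjacent pixels turned black.
broken = [row[:] for row in baseline]
broken[4][4] = 0
broken[4][5] = 0

noise_ratio = diff_ratio(baseline, noisy)
broken_ratio = diff_ratio(baseline, broken)
print(noise_ratio, broken_ratio)
```

Any threshold that lets the first case through also lets the second one through, which is the trap described in steps 3 and 4 above.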

How visualqe Works

visualqe takes a different approach. It still captures screenshots and computes pixel diffs, but then it sends the before/after images to a vision-language model (Google Gemini) that can actually interpret what changed.

  1. Capture Screenshots: Uses the Pixcap API to capture pixel-perfect screenshots of your pages
  2. Compute Structural Diff: The SSIM algorithm identifies regions with visual differences
  3. Semantic Analysis: The LLM examines the diff and describes what actually changed in plain English
  4. Severity Classification: Changes are classified as critical, warning, or informational based on impact

The result is a report that tells you "The primary CTA button changed from blue to green" instead of "47 pixels differ in region (234, 567)."
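
For readers unfamiliar with SSIM, the sketch below computes the standard SSIM formula over a single window of grayscale values in plain Python. It is an illustration of the metric only, not visualqe's implementation: production libraries typically slide a small window across the image to build a per-region similarity map, and the function name and sample values here are assumptions.

```python
def ssim(x, y, dynamic_range=255):
    """Single-window SSIM between two equal-length lists of grayscale values."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    # Stabilizing constants with the commonly used K1=0.01 and K2=0.03.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


stripes = [10, 200, 10, 200]
inverted = [200, 10, 200, 10]

same_score = ssim(stripes, stripes)       # identical windows score 1 (up to rounding)
inverted_score = ssim(stripes, inverted)  # inverted structure scores far lower
print(same_score, inverted_score)
```

Unlike a raw pixel count, the score depends on local means, variances, and covariance, so it reacts to structural change rather than to how many pixels moved.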

Getting Started

Installation

pip install visualqe

You'll need two API keys: a Pixcap API key for capturing screenshots and a Google Gemini API key for the LLM analysis.

Basic Usage

from visualqe import VisualQE

# Initialize with your API keys
vqe = VisualQE(
    pixcap_api_key="pix_your_key",
    gemini_api_key="your_gemini_key"
)

# Capture a baseline screenshot
screenshot = vqe.capture("https://your-app.com/dashboard")
vqe.save_baseline("dashboard", screenshot)

# Later, compare against the baseline
new_screenshot = vqe.capture("https://your-app.com/dashboard")
result = vqe.compare("dashboard", new_screenshot)

print(result.summary)
# "The navigation bar now includes a 'Settings' link.
#  The user avatar has moved from the left to the right side.
#  No functional elements appear broken."

Intent Validation

One of the most powerful features is intent validation. When you're making intentional changes, you can tell visualqe what to expect:

result = vqe.compare(
    "checkout-page",
    new_screenshot,
    intent="Added a 'Save for later' button below the cart items"
)

if result.intent_validated:
    print("Change implemented correctly")
else:
    print(f"Issue: {result.intent_feedback}")
    # "The 'Save for later' button is present but appears
    #  to be disabled/grayed out, which may not match the intent."
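
If you want a failed intent check to fail a script or test, one option is a small guard like the sketch below. FakeResult is a hypothetical stand-in so the example runs without API keys. It mirrors only the three fields shown in this post (summary, intent_validated, intent_feedback), and require_intent is not part of visualqe.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Stand-in for a comparison result; mirrors only the fields shown above."""
    summary: str
    intent_validated: bool
    intent_feedback: str = ""


def require_intent(result):
    """Raise AssertionError with the model's feedback if the intent was not met."""
    if not result.intent_validated:
        raise AssertionError(f"Intent not validated: {result.intent_feedback}")
    return result.summary


ok = FakeResult(summary="Button added below cart items.", intent_validated=True)
bad = FakeResult(
    summary="Button present but disabled.",
    intent_validated=False,
    intent_feedback="The 'Save for later' button appears grayed out.",
)

print(require_intent(ok))
try:
    require_intent(bad)
except AssertionError as exc:
    print(exc)
```

In real use you would pass the object returned by vqe.compare(...) instead of FakeResult.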

pytest Integration

visualqe includes a pytest plugin that makes it easy to add visual tests to your existing test suite:

# tests/visual/test_pages.py

def test_homepage(visual_check):
    visual_check("homepage", "https://your-app.com/")

def test_login_page(visual_check):
    visual_check("login", "https://your-app.com/login")

def test_dashboard_redesign(visual_check):
    visual_check(
        "dashboard",
        "https://your-app.com/dashboard",
        intent="Header should show new company logo"
    )

Run with:

pytest tests/visual/ --visual-report=./reports/visual.html

This generates an HTML report with side-by-side comparisons and AI-generated summaries for each diff.

Configuration Options

Option                    | Description                              | Default
--visual-baseline-dir     | Where to store baseline images           | ./baselines
--visual-threshold        | Pixel diff sensitivity (0-1)             | 0.01
--visual-update-baselines | Update baselines instead of comparing    | false
--visual-skip-analysis    | Skip LLM analysis (faster, less context) | false
--visual-branch           | Organize baselines by git branch         | main
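
To picture what organizing baselines by branch could look like on disk, here is a sketch of one possible layout. The directory scheme (baseline dir, then branch, then name.png) and the baseline_path helper are assumptions for illustration. The post does not document how visualqe actually arranges files.

```python
from pathlib import PurePosixPath


def baseline_path(name, branch="main", baseline_dir="./baselines"):
    """Build a per-branch baseline path (hypothetical layout)."""
    # Branch names such as "feature/new-header" contain slashes, which would
    # create nested directories, so flatten them into a single path segment.
    safe_branch = branch.replace("/", "-")
    return PurePosixPath(baseline_dir) / safe_branch / f"{name}.png"


main_path = baseline_path("dashboard")
feature_path = baseline_path("dashboard", branch="feature/new-header")
print(main_path)
print(feature_path)
```

Keeping one directory per branch means a redesign in progress can update its own baselines without disturbing the ones main is compared against.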

Testing Internal Apps with VPN Connector

Need to test staging environments or internal tools that aren't publicly accessible? visualqe works seamlessly with Pixcap's VPN Connector.

# Run the Pixcap connector in your network
docker run -d \
  -e PIXCAP_CONNECTOR_TOKEN=pxc_your_token \
  -e PIXCAP_CONNECTOR_NAME=staging \
  pixcap/connector

Then test internal URLs just like public ones:

# This works for internal URLs when connector is running
vqe.capture("http://staging.internal.company.com/admin")
vqe.capture("http://localhost:3000/dashboard")

The connector creates a secure tunnel from your network to Pixcap, so screenshots are captured from inside your firewall without exposing any ports.

GitHub Action for CI/CD

For automated visual testing in your CI/CD pipeline, we provide a GitHub Action that handles everything:

# .github/workflows/visual-test.yml
name: Visual Regression Tests

on:
  pull_request:
    branches: [main]

jobs:
  visual-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Start app
        run: |
          npm install
          npm run build
          npm start &

      - name: Run visual tests
        uses: dan-shah/visual-regression-action@v1
        with:
          api_key: ${{ secrets.PIXCAP_API_KEY }}
          gemini_api_key: ${{ secrets.GEMINI_API_KEY }}
          base_url: http://localhost:3000
          routes: '["/", "/about", "/pricing", "/docs"]'
          diff_mode: semantic
          fail_on_diff: true
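
Note that routes is a JSON array wrapped in a YAML string, which is why the single quotes matter. The sketch below shows how such a value could be expanded into full URLs. It is an assumption about what an action like this does internally, and expand_routes is a hypothetical helper, not part of the action.

```python
import json
from urllib.parse import urljoin


def expand_routes(base_url, routes_json):
    """Parse a JSON array of route strings and join each with the base URL."""
    routes = json.loads(routes_json)
    if not isinstance(routes, list) or not all(isinstance(r, str) for r in routes):
        raise ValueError("routes must be a JSON array of strings")
    return [urljoin(base_url, route) for route in routes]


urls = expand_routes(
    "http://localhost:3000",
    '["/", "/about", "/pricing", "/docs"]',
)
for url in urls:
    print(url)
```

Running something like this locally is a quick way to confirm that the routes string parses before pushing a workflow change.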

Action Features

Pro tip: Use diff_mode: semantic for feature branches where UI changes are expected, and diff_mode: pixel with a low threshold for hotfix branches where no visual changes should occur.
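
If you generate workflow inputs from a script, that rule can be expressed in a few lines. This sketch assumes hotfix branches are named with a hotfix/ prefix, which is a convention of this example rather than anything the action requires. The two mode names are the ones used in the tip above.

```python
def pick_diff_mode(branch):
    """Return "pixel" for hotfix branches and "semantic" for everything else."""
    # Assumes hotfix branches follow a "hotfix/<name>" naming convention.
    return "pixel" if branch.startswith("hotfix/") else "semantic"


print(pick_diff_mode("hotfix/login-crash"))
print(pick_diff_mode("feature/new-header"))
```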

Testing Internal Apps in CI

The GitHub Action can also test apps that aren't publicly accessible by automatically setting up a connector tunnel:

- name: Run visual tests on staging
  uses: dan-shah/visual-regression-action@v1
  with:
    api_key: ${{ secrets.PIXCAP_API_KEY }}
    gemini_api_key: ${{ secrets.GEMINI_API_KEY }}
    base_url: http://localhost:3000
    connector_token: ${{ secrets.PIXCAP_CONNECTOR_TOKEN }}
    routes: '["/", "/admin", "/settings"]'

CLI for Local Development

visualqe also includes a CLI for quick checks during development:

# Capture a screenshot
visualqe capture http://localhost:3000 -o homepage.png

# Compare against a baseline
visualqe compare homepage http://localhost:3000 --report diff.html

# List all baselines
visualqe list --baseline-dir ./baselines

# Estimate costs for a test run
visualqe estimate 50
# Estimated cost for 50 comparisons: $0.60 - $1.25

# Check service health
visualqe health

Cost Considerations

visualqe uses two paid services: Pixcap for screenshot capture and Google Gemini for LLM analysis.

For a typical test suite of 50 pages run on each PR, expect roughly $0.50-1.00 per run. The visualqe estimate command can give you a more precise projection.
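
As a rough worked example, the per-run range above can be scaled to a monthly budget. The defaults below are the $0.50 and $1.00 figures quoted for a 50-page suite. The count of 200 runs per month is a made-up input, and projected_cost is a hypothetical helper rather than part of visualqe.

```python
def projected_cost(runs, low_per_run=0.50, high_per_run=1.00):
    """Return a (low, high) dollar range for a number of 50-page runs."""
    return (runs * low_per_run, runs * high_per_run)


low, high = projected_cost(200)
print(f"${low:.2f} - ${high:.2f} per month")
```

Treat the output as an order-of-magnitude guide and check it against visualqe estimate for your own suite.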

If you want to reduce costs, you can pass --visual-skip-analysis to skip the LLM step on runs that only need a pixel diff.

Real-World Example

Here's how a typical QE workflow might look:

  1. Developer opens a PR with CSS changes to the checkout flow
  2. GitHub Action runs visual tests against the PR branch
  3. visualqe captures screenshots of all checkout pages
  4. LLM analyzes the diffs and identifies: "The 'Complete Purchase' button has increased padding and the order summary card has a new drop shadow"
  5. Action posts a comment on the PR with the summary and links to the full report
  6. QE reviews the semantic summary instead of manually inspecting pixel diffs
  7. PR merges with confidence that visual changes are intentional

Getting Started Today

Ready to add intelligent visual testing to your workflow?

  1. Install the package: pip install visualqe
  2. Get your Pixcap API key (100 free credits included)
  3. Get a Gemini API key (free tier available)
  4. Run your first comparison

Check out the full documentation on PyPI for more examples and advanced configuration options.
