obfus.link
Regex from examples + cross-flavor translation across JS, Python, Go, Rust, PCRE

Verify a regex against test cases with anti-pattern detection, generate a regex from positive and negative examples, or translate between five regex dialects with per-feature warnings for incompatibilities.

Regex Verifier runs in three modes: verify (run test cases with ReDoS anti-pattern detection), generate (produce a regex from positive and negative examples without writing syntax), and translate (convert between JavaScript, Python, Go, Rust, and PCRE dialects with per-feature compatibility warnings). A separate explainMode option decomposes any valid regex into a plain-English description.

1. Insight

The problem this article addresses and why it matters.

Regex is the wrong syntax for the right idea

Almost every developer has the same relationship with regex: they know what they want to match, they don't remember the syntax to write it, they Google a snippet, paste it in, and run their tests. Sometimes the tests pass. Sometimes they pass and the regex catastrophically backtracks on a real input three weeks later. This class of bug is known as ReDoS; it is documented by OWASP, is behind several published CVEs, and caused the July 2019 Cloudflare outage, in which a single backtracking-prone regex in a firewall rule exhausted CPU across Cloudflare's network.

The other half of the problem is portability. A regex written for JavaScript may rely on variable-length lookbehind assertions that Python's re module rejects (it accepts only fixed-width lookbehind) and that Go's regexp package does not support at all, named capture groups written as (?<name>...) where Python's re requires (?P<name>...), or Unicode property escapes (\p{L}) that Python's built-in re module does not recognise. Cross-language teams maintain three copies of the same regex with subtle drift.

Why a verifier beats writing the regex first

The traditional workflow is regex-first: you write the pattern, then test it against examples. Inverted: write the examples first, ask a tool to produce the regex. The examples are the source of truth — you wanted "match e-mail addresses but not the test inputs that look like e-mail addresses and aren't" — and the regex is an implementation detail. The verifier supports both directions: validate a regex against test cases (the existing workflow, with anti-pattern detection bolted on), or generate the regex from examples (the inverted workflow).

The third mode handles the portability problem: paste a regex from one language and ask for the equivalent in another. The tool flags features that don't translate cleanly with per-feature warnings, so the cross-team divergence problem becomes a one-call fix.

What this article delivers

Three workflows walked end-to-end: verifying a pattern against test cases (with anti-pattern detection), generating a pattern from positive and negative examples, and translating a pattern across the five major regex dialects (JavaScript, Python, Go, Rust, PCRE). We cover the explain mode that decomposes any regex into plain English and the anti-pattern detector that flags ReDoS-prone constructs.

2. Intent

What you will be able to do after reading.

By the end of this article you will be able to:

  • Verify a regex against an array of test cases and read a per-case pass/fail report with extracted capture groups
  • Generate a regex from positive and negative examples using one of three strategies (precise, balanced, permissive)
  • Translate a regex between JavaScript, Python, Go, Rust, and PCRE dialects with per-feature warnings for incompatibilities
  • Read the Explain mode output that decomposes any regex into a plain-English description
  • Identify ReDoS-prone patterns (catastrophic backtracking, over-broad wildcards, missing anchors) with severity ratings and one-line fixes

The Examples section walks through each of the three modes with its own before/after scenario: matching internal API endpoints (generate), extracting a user ID across languages (translate), and catching a nested quantifier (verify).

3. Examples

Annotated code and worked scenarios.

Before / after: generate mode

You want a regex that matches your team's internal API endpoints. You have examples that should match and examples that shouldn't:

Before: open a regex tester, try ^/api/v\d+/.*, realise it matches /api/v1/, change .* to .+ to require a non-empty remainder, realise it still matches the deprecated /api/v0/... paths you wanted excluded, end up with ^/api/v[12]/.+ and pray you remember to add v3 next quarter.

After:

regexVerifier({
  mode: 'generate',
  generatorStrategy: 'balanced',
  testCases: [
    { input: '/api/v1/users',            shouldMatch: true  },
    { input: '/api/v2/orgs/123',         shouldMatch: true  },
    { input: '/api/v2/orgs/123/members', shouldMatch: true  },
    { input: '/api/v0/legacy',           shouldMatch: false },
    { input: '/internal/health',         shouldMatch: false },
    { input: '/api/v1',                  shouldMatch: false },
  ],
});

// generatedPattern: '^/api/v[1-9]\\d*/.+'
// generatedFlags:   ''
// valid: true
// results: [
//   { input: '/api/v1/users',            matched: true,  expected: true,  passed: true },
//   { input: '/api/v2/orgs/123',         matched: true,  expected: true,  passed: true },
//   ...
// ]

The [1-9]\d* after v is the generator earning its keep: it accepts any version number from v1 upward (v2, v10, v999) but rejects v0. With generatorStrategy: 'precise' you'd get ^/api/v[12]/.+ (matches only the version numbers seen in the examples). With 'permissive' you'd get ^/api/v\d+/.+ (matches v0 too, so it fails the explicit negative case). 'balanced' finds the middle.
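The difference between the three strategies can be checked locally with plain JavaScript RegExp objects, without calling the tool. The following is a minimal sketch using the three patterns quoted above and the six test cases from the example; the failures helper is a hypothetical name for illustration and is not part of the tool's API.

```javascript
// The six test cases from the generate-mode example.
const testCases = [
  { input: '/api/v1/users',            shouldMatch: true  },
  { input: '/api/v2/orgs/123',         shouldMatch: true  },
  { input: '/api/v2/orgs/123/members', shouldMatch: true  },
  { input: '/api/v0/legacy',           shouldMatch: false },
  { input: '/internal/health',         shouldMatch: false },
  { input: '/api/v1',                  shouldMatch: false },
];

// The pattern the text quotes for each strategy, as JavaScript literals.
const patterns = {
  precise:    /^\/api\/v[12]\/.+/,
  balanced:   /^\/api\/v[1-9]\d*\/.+/,
  permissive: /^\/api\/v\d+\/.+/,
};

// Hypothetical helper: lists the inputs whose actual match result
// differs from the expected shouldMatch value.
function failures(regex, cases) {
  return cases
    .filter(({ input, shouldMatch }) => regex.test(input) !== shouldMatch)
    .map(({ input }) => input);
}

for (const [name, regex] of Object.entries(patterns)) {
  console.log(name, failures(regex, testCases));
}
```

The precise and balanced patterns fail no cases, while the permissive pattern fails on /api/v0/legacy. Adding a future path such as /api/v3/teams to the list is a quick way to see where precise and balanced diverge.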

Before / after: cross-flavor translation

The same regex in three languages:

Before:

// JavaScript
const r = /(?<=user_)\d+/;
# Python
import re
r = re.compile(r"(?<=user_)\d+")
// Go
r := regexp.MustCompile(`(?<=user_)\d+`)
// → panic: regexp: Compile(`(?<=user_)\d+`): error parsing regexp: invalid or unsupported Perl syntax

Go's regexp (RE2) doesn't support lookbehind. You either rewrite to use a non-lookbehind alternative (capture group + extract index 1) or ship a different regex per language.

After:

regexVerifier({
  mode: 'translate',
  pattern: '(?<=user_)\\d+',
  flags: '',
  sourceDialect: 'javascript',
  targetDialect: 'go',
  testCases: [],
});

// translation: {
//   sourceDialect: 'javascript',
//   targetDialect: 'go',
//   translatedPattern: 'user_(\\d+)',
//   translatedFlags: '',
//   warnings: [
//     {
//       feature: 'lookbehind',
//       message: 'Go RE2 does not support lookbehind. Rewrote as capture group; extract via match[1] instead of match[0].',
//     },
//   ],
// }

The translator picks the closest semantic equivalent and surfaces the API contract change (extract match[1] instead of match[0]) as a warning the developer reads before pasting the new regex.
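The contract change is easy to see by running both patterns in JavaScript, where lookbehind is available. This is a small sketch using the source pattern and the translated pattern from the output above; the sample input string is made up for illustration.

```javascript
const input = 'owner=user_4821;'; // illustrative input

// Original pattern: the lookbehind keeps "user_" out of the match.
const withLookbehind = /(?<=user_)\d+/;
// Rewritten pattern: "user_" is consumed, the digits move to group 1.
const withCapture = /user_(\d+)/;

const a = input.match(withLookbehind);
const b = input.match(withCapture);

console.log(a[0]); // '4821'
console.log(b[0]); // 'user_4821'
console.log(b[1]); // '4821'
```

Code that reads match[0] after switching to the rewritten pattern would silently pick up the user_ prefix, which is why the warning matters. The match index also shifts, because the rewritten match starts at the prefix rather than at the first digit.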

Before / after: anti-pattern detection

regexVerifier({
  mode: 'verify',
  pattern: '^(a+)+$',
  flags: '',
  testCases: [{ input: 'aaaaaaaaaaaaaaaaaaaaab', shouldMatch: false }],
  detectAntiPattern: true,
});

// valid: true
// results: [{ input: 'aaaaaaaaaaaaaaaaaaaaab', matched: false, expected: false, passed: true }]
// antiPatterns: [
//   {
//     pattern: '(a+)+',
//     severity: 'critical',
//     description: 'Nested quantifier causes catastrophic backtracking on non-matching input',
//     fix: 'Rewrite without nested quantifier: ^a+$ matches the same set without backtracking',
//   },
// ]

The test passes (the regex correctly returns no match), but the input took milliseconds when it should take microseconds. The anti-pattern detector flags (a+)+ as a classic ReDoS construct — the kind of thing that brings down production when an attacker submits a 50-character string.
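The growth in running time can be observed directly. The sketch below assumes a backtracking regex engine (such as the default one in Node.js) and a runtime where performance.now() is available as a global; timeMs is a hypothetical helper, and the absolute numbers will vary by machine.

```javascript
const nested = /^(a+)+$/; // the flagged pattern
const flat = /^a+$/;      // the suggested fix

// Hypothetical helper: time one test() call in milliseconds.
function timeMs(regex, input) {
  const start = performance.now();
  regex.test(input);
  return performance.now() - start;
}

// Non-matching inputs: n copies of "a" followed by a "b".
for (const n of [14, 16, 18, 20, 22]) {
  const input = 'a'.repeat(n) + 'b';
  console.log(
    `n=${n}`,
    `nested=${timeMs(nested, input).toFixed(2)}ms`,
    `flat=${timeMs(flat, input).toFixed(3)}ms`,
  );
}
```

On a backtracking engine the nested column typically grows by a roughly constant factor for every two extra characters, while the flat column stays near zero. The input lengths are kept small on purpose; raising n much further can make the nested call run for seconds or longer.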

When humans use this

The most common workflow on the web UI is iterative: paste a regex, paste a few test inputs, click run, read the pass/fail and the anti-pattern warnings, refine. Generate mode is the second-most-common — particularly for developers who think in examples ("match these but not those") and don't want to translate their mental model into syntax. Explain mode powers the "what does this regex do?" question that comes up during code review when someone inherits a regex from a previous engineer.

When agents use this

Three patterns that dominate:

  • Code-generation agent producing validation logic. An agent asked to "validate that the input matches our internal email format" struggles when it has to write the regex itself. Generate mode inverts the problem: the agent describes the requirement as positive + negative examples, the tool produces the regex, the agent embeds it. Reliability goes up because the regex is verified against the examples before it ships.
  • Multi-language pipeline agent. An agent generating both backend (Go) and frontend (JavaScript) validation keeps a single canonical regex in JavaScript, calls the verifier in translate mode to produce the Go version for the API server, and surfaces any warnings as comments in the generated code.
  • Security-audit agent. An agent scanning a codebase for ReDoS-prone patterns calls verify mode with detectAntiPattern: true against every regex literal it finds. Critical findings open a security advisory PR; lower-severity findings open a tech-debt ticket.
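The severity-based routing in the security-audit pattern could look something like the following. This is only a sketch: the triage function, the shape of its findings argument, the file paths, and the action labels are all assumptions made for illustration. Only the antiPatterns fields (pattern, severity, description, fix) come from the tool's output.

```javascript
// Hypothetical triage step for a security-audit agent.
// `findings` pairs a regex literal's source location with the
// antiPatterns array that verify mode returned for it.
function triage(findings) {
  const actions = [];
  for (const { location, antiPatterns } of findings) {
    for (const ap of antiPatterns ?? []) {
      actions.push({
        location,
        construct: ap.pattern,
        // Action labels are placeholders for whatever the agent does next.
        action: ap.severity === 'critical' ? 'security-advisory-pr' : 'tech-debt-ticket',
        suggestedFix: ap.fix,
      });
    }
  }
  return actions;
}

const actions = triage([
  {
    location: 'src/validate.js:12', // illustrative path
    antiPatterns: [
      {
        pattern: '(a+)+',
        severity: 'critical',
        description: 'Nested quantifier causes catastrophic backtracking on non-matching input',
        fix: 'Rewrite without nested quantifier: ^a+$ matches the same set without backtracking',
      },
    ],
  },
  { location: 'src/routes.js:40', antiPatterns: [] }, // clean regex, no action
]);

console.log(actions);
```

Treating every non-critical severity the same way keeps the sketch independent of the exact set of severity levels the tool emits.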

Edge cases

Empty test-case arrays

Verify mode with testCases: [] returns valid: true (the regex compiles) but no pass/fail entries. Use this to syntax-check a regex without testing it. Generate mode requires at least one positive AND one negative example — empty arrays return INPUT_EMPTY.

Conflicting test cases

Verify mode tolerates contradictory expectations (one case says shouldMatch=true, another with the same input says false). Because the pattern either matches that input or it doesn't, exactly one of the two cases is reported as a failure. Generate mode rejects with INPUT_INVALID_TYPE because contradictory examples have no valid regex solution.
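The generate-mode rules from the two edge cases above can be checked on the client before making a call. The function below is a sketch under that assumption; precheckGenerateInput is a hypothetical name, and it only mirrors the documented conditions rather than reproducing the tool's own validator.

```javascript
// Client-side pre-check mirroring the documented generate-mode rules.
// Returns null when the cases look acceptable, otherwise the error code
// the text associates with that condition.
function precheckGenerateInput(testCases) {
  if (!Array.isArray(testCases) || testCases.length === 0) return 'INPUT_EMPTY';

  const expectations = new Map(); // input -> shouldMatch seen so far
  let positives = 0;
  let negatives = 0;

  for (const { input, shouldMatch } of testCases) {
    if (expectations.has(input) && expectations.get(input) !== shouldMatch) {
      return 'INPUT_INVALID_TYPE'; // same input expected to match and not match
    }
    expectations.set(input, shouldMatch);
    if (shouldMatch) positives++;
    else negatives++;
  }

  // Generate mode needs at least one positive and one negative example.
  if (positives === 0 || negatives === 0) return 'INPUT_INVALID_TYPE';
  return null;
}

console.log(precheckGenerateInput([]));
console.log(precheckGenerateInput([
  { input: '/api/v1/users', shouldMatch: true },
  { input: '/api/v1/users', shouldMatch: false },
]));
```

The first call returns INPUT_EMPTY and the second returns INPUT_INVALID_TYPE, matching the behaviour described for empty and contradictory inputs.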

Translation incompatibilities the tool can't resolve

Some constructs have no equivalent in the target dialect: variable-length lookbehinds (supported in JavaScript and .NET) translated to Python's re module (fixed-width lookbehind only) or Go RE2, recursive subroutines ((?R) in PCRE), conditional patterns. These return translatedPattern: null with a warning explaining the structural reason and pointing at the closest non-regex alternative (PEG parser, string-manipulation API).

Regex flags across dialects

The flags parameter is passed verbatim in verify mode. In translate mode, flags are mapped to the target dialect's syntax: JavaScript i becomes Python re.IGNORECASE or Go (?i), and JavaScript s (dotAll) becomes Python re.DOTALL or Go (?s). Flags that don't translate (JavaScript's g for global has no flag equivalent in Python or Go, where the choice of function such as re.findall or FindAllString decides it) are surfaced in warnings.
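A flag mapping of this kind can be sketched as a small lookup table. The code below is illustrative only: mapFlags, FLAG_MAP, the warning wording, and the format of the returned translatedFlags string are assumptions, not the tool's actual output. The Python constants and Go inline flags themselves are standard in those languages.

```javascript
// Illustrative mapping from JavaScript flags to Python and Go spellings.
// Only i, m, and s are mapped here; every other flag becomes a warning.
const FLAG_MAP = {
  i: { python: 're.IGNORECASE', go: 'i' },
  m: { python: 're.MULTILINE',  go: 'm' },
  s: { python: 're.DOTALL',     go: 's' },
};

function mapFlags(jsFlags, target) {
  const mapped = [];
  const warnings = [];
  for (const flag of jsFlags) {
    const entry = FLAG_MAP[flag];
    if (entry) {
      mapped.push(entry[target]);
    } else {
      warnings.push({
        feature: `flag:${flag}`,
        message: `JavaScript flag "${flag}" is not mapped to ${target} by this sketch`,
      });
    }
  }
  // Python flags are OR-ed constants; Go flags are an inline (?...) prefix.
  let translatedFlags = '';
  if (mapped.length > 0) {
    translatedFlags = target === 'python' ? mapped.join(' | ') : `(?${mapped.join('')})`;
  }
  return { translatedFlags, warnings };
}

console.log(mapFlags('gi', 'python'));
console.log(mapFlags('gi', 'go'));
```

For the input gi, the sketch yields re.IGNORECASE for Python and (?i) for Go, each with one warning for the g flag.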

4. Documentation

Reference signatures, edge cases, and lookup tables.

Input parameters

Field              Type                                              Required            Default     Description
mode               'verify' | 'generate' | 'translate'                                               Workflow selector
pattern            string                                            conditional                     Required for verify and translate modes
flags              string                                                                ''          Regex flags (g, i, m, s, u, y)
testCases          Array<{input, shouldMatch}>                                                       At least one positive + one negative required for generate mode
explainMode        boolean                                                               false       Decompose the regex into plain English
detectAntiPattern  boolean                                                               false       Flag ReDoS-prone constructs with severity ratings
generatorStrategy  'precise' | 'balanced' | 'permissive'             for generate mode   'balanced'  Controls regex generality vs example fit
sourceDialect      'javascript' | 'python' | 'go' | 'rust' | 'pcre'  for translate mode              Input dialect
targetDialect      same enum as sourceDialect                        for translate mode              Output dialect

Output shape

{
  valid:           boolean;
  results:         Array<{
    input:          string;
    matched:        boolean;
    expected:       boolean;
    passed:         boolean;
    captureGroups?: string[];
  }>;
  explanation?:      string;    // when explainMode: true
  antiPatterns?:     Array<{ pattern: string; severity: string; description: string; fix: string }>;  // when detectAntiPattern: true
  generatedPattern?: string;    // when mode: 'generate'
  generatedFlags?:   string;
  translation?: {                // when mode: 'translate'
    sourceDialect:     string;
    targetDialect:     string;
    translatedPattern: string | null;  // null when the target dialect has no equivalent
    translatedFlags:   string;
    warnings: Array<{ feature: string; message: string }>;
  };
}
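One way to consume this shape in a CI gate is sketched below. The gate function and its return format are hypothetical; the sketch also assumes that valid: false means the pattern failed to compile and that only critical anti-patterns should block a build.

```javascript
// Hypothetical CI gate over a verify-mode response with the shape above.
function gate(result) {
  const reasons = [];
  if (!result.valid) reasons.push('pattern did not compile');

  // Collect every test case whose outcome differs from the expectation.
  for (const r of result.results ?? []) {
    if (!r.passed) {
      const expected = r.expected ? 'match' : 'no match';
      const actual = r.matched ? 'match' : 'no match';
      reasons.push(`"${r.input}": expected ${expected}, got ${actual}`);
    }
  }

  // Block on critical anti-patterns even when all test cases pass.
  for (const ap of result.antiPatterns ?? []) {
    if (ap.severity === 'critical') {
      reasons.push(`critical anti-pattern ${ap.pattern}: ${ap.fix}`);
    }
  }
  return { ok: reasons.length === 0, reasons };
}

// Response copied from the anti-pattern example in this article.
const response = {
  valid: true,
  results: [{ input: 'aaaaaaaaaaaaaaaaaaaaab', matched: false, expected: false, passed: true }],
  antiPatterns: [
    {
      pattern: '(a+)+',
      severity: 'critical',
      description: 'Nested quantifier causes catastrophic backtracking on non-matching input',
      fix: 'Rewrite without nested quantifier: ^a+$ matches the same set without backtracking',
    },
  ],
};

console.log(gate(response));
```

For this response the gate reports ok: false with a single reason, even though the only test case passed, which is the situation the anti-pattern detector exists to catch.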

Generator strategies compared

Strategy    Output for [v1, v2] (positive) / [v0] (negative)  When to use
precise     ^/api/v[12]/.+                                    Exact-match validation where future inputs must conform
balanced    ^/api/v[1-9]\d*/.+                                Default; accepts the obvious extension of the example set
permissive  ^/api/v\d+/.+                                     Broadest match; can over-match negative examples (here it also accepts v0)

Error codes

Code                When it fires                                                             Recovery
INPUT_EMPTY         Generate mode with empty testCases, or verify mode with empty pattern     Provide the required input
INPUT_MALFORMED     pattern is not a valid regex in sourceDialect                             Verify the source dialect matches the pattern syntax
INPUT_INVALID_TYPE  Generate mode contradictory examples; missing positive or negative cases  Provide both shouldMatch:true and shouldMatch:false cases
UNSUPPORTED_FORMAT  Translation between unsupported dialect pair (rare)                       Translate via JavaScript as an intermediate dialect
TIMEOUT             Verify mode hit catastrophic backtracking on a test case (3s ceiling)     The regex is ReDoS-vulnerable; run with detectAntiPattern: true

When NOT to use this tool

Don't use the generator as a substitute for thinking about the problem space. The generator extrapolates from examples; if your examples don't cover the edge cases your production input set will include, the generated regex won't either. Always include adversarial negative examples (inputs that look like the positive cases but shouldn't match).

Don't use translate mode as a substitute for re-testing in the target language. The translation is semantically equivalent for the documented features; subtle behaviour differences (Unicode handling, locale sensitivity, anchor semantics in multiline mode) require validation in the target runtime.

For complex parsing (nested structures, recursive grammars), use a dedicated parser generator or parsing toolkit (peggy, tree-sitter, chevrotain). Regex is the wrong tool for any input where you'd want to describe the grammar with rules.

Performance notes

Verify mode execution is bounded by a 3-second hard timeout per test case to defend against ReDoS. Generate mode runs an iterative search bounded by 2 seconds; it returns the best result found within that budget. Translate mode is single-pass parse + emit, typically under 5ms. The tool is deterministic for verify and translate modes; generate mode is deterministic per (testCases, generatorStrategy) tuple. REST responses are Edge-Cache eligible for verify and translate modes.

FAQ

Frequently asked questions

What happens when the regex can't be translated to the target dialect?

The translator returns the closest semantic equivalent and lists every incompatibility as a warning. Constructs with no equivalent (PCRE recursive subroutines in Go RE2) return translatedPattern: null with an explanation. The warnings often point at the closest non-regex alternative (PEG parser, string-manipulation API).

Can the generator handle complex test cases?

Yes, but the more closely the positive and negative examples resemble each other, the longer the search runs. Generate mode has a 2-second budget; it returns the best result within that window. Adversarial negative examples (inputs that look positive but shouldn't match) sharpen the generated pattern.

Does explain mode work on any regex?

It works on syntactically valid regex in the supported dialects. The decomposition is best-effort for unusual constructs — explain mode is a teaching tool, not a formal verifier. Combine with verify mode to confirm the regex behaves as the explanation suggests.

What flags does the anti-pattern detector look for?

Catastrophic backtracking (nested quantifiers like (a+)+), over-broad wildcards (.* without anchors), redundant character class escapes, and missing anchors on patterns that should be line-bound. Severity ranges from critical (security implications) to info (style).

How does this compare to regex101.com?

regex101 is interactive and visual; this tool is API-first with deterministic output. Use regex101 for exploration, this tool for CI gates, agent pipelines, and cases where you need the structured output (anti-patterns array, generator output, translation warnings) rather than a UI rendering.