Regular Expressions: A Practical Guide for Developers

What Is a Regular Expression?

A regular expression (regex or regexp) is a sequence of characters that defines a pattern for matching, searching, and replacing text. The syntax looks cryptic at first, but once you learn the ~20 core concepts, you can parse and transform text with a precision that loops and conditionals cannot match.

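Regex is available in virtually every programming language (JavaScript, Python, Go, Rust, Java, PHP, Ruby) with mostly consistent syntax. The core patterns you learn today carry over across your stack, though some advanced features, such as lookarounds and backreferences, are missing from the default engines in Go and Rust.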

Basic Building Blocks

Literal characters: cat matches the exact string "cat" wherever it appears. Matching is case-sensitive by default.

The dot (.) matches any single character except a newline. c.t matches "cat", "cut", "c4t", but not "ct" or "coat".

Character classes ([...]) match any one of the characters inside the brackets. [aeiou] matches any vowel. [a-z] matches any lowercase letter. [0-9] matches any digit.

Negated classes ([^...]) match any character NOT in the brackets. [^0-9] matches any non-digit.

Anchors: ^ matches the start of the string; $ matches the end. ^\d{5}$ matches a string that is exactly five digits, nothing more.
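The building blocks above can be tried directly in JavaScript, the language used for the other examples in this guide. This is a minimal sketch; the sample strings and variable names are invented for illustration.

```javascript
// RegExp.prototype.test returns true when the pattern matches anywhere in the string.
const samples = ["cat", "cut", "c4t", "ct", "coat"];

// The dot stands for exactly one character, so "ct" and "coat" do not match.
const dotMatches = samples.filter((s) => /c.t/.test(s));
console.log(dotMatches); // [ 'cat', 'cut', 'c4t' ]

// Character class: find the first vowel in the string.
const firstVowel = "rhythm and blues".match(/[aeiou]/)[0];
console.log(firstVowel); // 'a'

// Negated class: remove everything that is not a digit.
const digitsOnly = "(555) 010-9999".replace(/[^0-9]/g, "");
console.log(digitsOnly); // '5550109999'

// Anchors: exactly five digits, with nothing before or after.
const isFiveDigits = (s) => /^\d{5}$/.test(s);
console.log(isFiveDigits("90210"), isFiveDigits("902101")); // true false
```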

Quantifiers

Quantifiers specify how many times the preceding element must match:

  • * - zero or more times
  • + - one or more times
  • ? - zero or one time (makes the element optional)
  • {n} - exactly n times
  • {n,} - at least n times
  • {n,m} - between n and m times (inclusive)

By default, quantifiers are greedy: they match as much as possible. Add ? after a quantifier to make it lazy (match as little as possible): .*? instead of .*. This matters when extracting delimited spans such as HTML tags or quoted strings, where a greedy match runs to the last delimiter instead of the next one.
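The difference between greedy and lazy matching is easiest to see side by side. The snippet below is an illustrative sketch; the sample HTML string is made up.

```javascript
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .* runs to the end of the string, then backtracks to the last ">".
const greedy = html.match(/<.*>/)[0];
console.log(greedy); // '<b>bold</b> and <i>italic</i>'

// Lazy: .*? stops at the first ">" that lets the overall match succeed.
const lazy = html.match(/<.*?>/)[0];
console.log(lazy); // '<b>'

// Bounded quantifier: {2,4} is greedy too, so it takes four digits when it can.
const bounded = "123456".match(/\d{2,4}/)[0];
console.log(bounded); // '1234'
```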

Common Shorthand Classes

These are the most useful shortcuts:

  • \d - any digit, equivalent to [0-9]
  • \D - any non-digit, equivalent to [^0-9]
  • \w - any word character: [a-zA-Z0-9_]
  • \W - any non-word character
  • \s - any whitespace character (space, tab, newline)
  • \S - any non-whitespace character
  • \b - word boundary (zero-width assertion between \w and \W)
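A short sketch of the shorthand classes in use follows. The input strings are invented for the example.

```javascript
const text = "Order 66 shipped to cat-lovers; the category is 'cats'.";

// \d+ finds runs of digits.
const numbers = text.match(/\d+/g);
console.log(numbers); // [ '66' ]

// \bcat\b matches "cat" only as a whole word. A hyphen is a non-word
// character, so "cat" in "cat-lovers" counts; "category" and "cats" do not.
const wholeWord = text.match(/\bcat\b/g);
console.log(wholeWord.length); // 1

// \s+ collapses any run of spaces, tabs, and newlines into one space.
const collapsed = "a \t\n b".replace(/\s+/g, " ");
console.log(collapsed); // 'a b'

// \W+ splits on anything that is not a letter, digit, or underscore.
const words = "snake_case, kebab-case".split(/\W+/);
console.log(words); // [ 'snake_case', 'kebab', 'case' ]
```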

Groups and Alternation

Groups (...) group part of a pattern and capture the match. (foo|bar) matches "foo" or "bar". The captured group can be referenced in replacements as $1, $2, etc.

Non-capturing groups (?:...) group without capturing. Use them when you need grouping for quantifiers but do not need the match value.

Named groups (?<name>...) capture with a name for cleaner replacement code:

const match = '2026-02-19'.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(match.groups.year);  // '2026'
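The three kinds of group can also be used in replacements. The sketch below uses String.prototype.replace with the $1, $2, and $<name> placeholders; the sample names and dates are illustrative.

```javascript
// Numbered groups: turn "last, first" into "first last" with $1 and $2.
const swapped = "Lovelace, Ada".replace(/(\w+), (\w+)/, "$2 $1");
console.log(swapped); // 'Ada Lovelace'

// Non-capturing group: (?:ha)+ repeats "ha" without creating a capture,
// so the only captured value is the trailing punctuation mark.
const laugh = "hahaha!".match(/^(?:ha)+([!?])$/);
console.log(laugh[1]); // '!'

// Named groups are referenced in a replacement string as $<name>.
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const european = "2026-02-19".replace(isoDate, "$<day>/$<month>/$<year>");
console.log(european); // '19/02/2026'
```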

Lookaheads and Lookbehinds

Zero-width assertions match positions without consuming characters:

  • (?=...) - positive lookahead: "followed by"
  • (?!...) - negative lookahead: "not followed by"
  • (?<=...) - positive lookbehind: "preceded by"
  • (?<!...) - negative lookbehind: "not preceded by"

// Match numbers followed by 'px' but don't include 'px' in the match
/\d+(?=px)/g

// Match prices not preceded by a currency symbol. The \d in the lookbehind
// stops the match from starting partway through a number such as $19.99.
/(?<![$\d])\d+\.\d{2}/g
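To see what the assertions actually return, the sketch below runs each kind against sample input. The strings are invented, and the negative lookbehind also excludes a preceding digit (an addition made here) so that it cannot match the tail of a priced number.

```javascript
const css = "margin: 12px; line-height: 1.5; width: 300px";

// Positive lookahead: digits directly followed by "px"; "px" is not consumed.
const pixelValues = css.match(/\d+(?=px)/g);
console.log(pixelValues); // [ '12', '300' ]

// Negative lookahead: "foo" that is not followed by "bar".
const fooIndex = "foobar foobaz".search(/foo(?!bar)/);
console.log(fooIndex); // 7

const prices = "Total: $19.99, weight 4.50 kg";

// Positive lookbehind: decimal numbers directly preceded by "$".
const dollars = prices.match(/(?<=\$)\d+\.\d{2}/g);
console.log(dollars); // [ '19.99' ]

// Negative lookbehind: decimal numbers not preceded by "$" or by a digit.
const unpriced = prices.match(/(?<![$\d])\d+\.\d{2}/g);
console.log(unpriced); // [ '4.50' ]
```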

Flags

Flags modify how the pattern is applied:

  • g - global: find all matches, not just the first
  • i - case-insensitive
  • m - multiline: ^ and $ match line starts and ends, not just string start and end
  • s - dotAll: . matches newlines too
  • u - Unicode mode: matches by code point rather than UTF-16 code unit, and enables \u{...} and \p{...} escapes
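Each flag changes the result of an otherwise identical pattern. The following sketch shows the flags listed above on a made-up log string.

```javascript
const log = "Error: disk full\nwarning: low memory\nERROR: timeout";

// g + i: every occurrence of "error", regardless of case.
const errors = log.match(/error/gi);
console.log(errors); // [ 'Error', 'ERROR' ]

// m: ^ matches at the start of every line, so each line's label is found.
const labels = log.match(/^\w+(?=:)/gm);
console.log(labels); // [ 'Error', 'warning', 'ERROR' ]

// s: without it the dot stops at a newline, so the first test fails.
const withoutDotAll = /disk.*timeout/.test(log);
const withDotAll = /disk.*timeout/s.test(log);
console.log(withoutDotAll, withDotAll); // false true

// u: an emoji is two UTF-16 code units but a single code point.
const emoji = "\u{1F600}";
const withoutUnicode = /^.$/.test(emoji);
const withUnicode = /^.$/u.test(emoji);
console.log(withoutUnicode, withUnicode); // false true
```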

Practical Examples

// Email validation (simplified; true email validation needs the full RFC)
/^[^\s@]+@[^\s@]+\.[^\s@]+$/

// International phone number (E.164 format)
/^\+[1-9]\d{6,14}$/

// Strip HTML tags
/<[^>]*>/g

// Find URLs
/https?:\/\/[^\s"'<>]+/g

// Extract hashtags (word characters plus the Hebrew block, U+0590 to U+05FF)
/#[\w\u0590-\u05FF]+/g

// Validate UUID v4
/^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i

// Match repeated words (the the)
/\b(\w+)\s+\1\b/gi

// Validate semantic version
/^\d+\.\d+\.\d+(?:-[\w.]+)?(?:\+[\w.]+)?$/
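A few of these patterns are wrapped in small helpers below to show how they might be used. The helper names and sample inputs are hypothetical, and the repeated-word pattern includes \b word boundaries so that it compares whole words.

```javascript
const isUuidV4 = (s) =>
  /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i.test(s);

const isSemver = (s) => /^\d+\.\d+\.\d+(?:-[\w.]+)?(?:\+[\w.]+)?$/.test(s);

// Removes anything that looks like a tag; this is not a full HTML parser.
const stripTags = (html) => html.replace(/<[^>]*>/g, "");

// match() returns null when nothing is found, so fall back to an empty array.
const findUrls = (text) => text.match(/https?:\/\/[^\s"'<>]+/g) || [];

// matchAll yields one match object per hit; index 1 is the repeated word.
const findRepeats = (text) =>
  [...text.matchAll(/\b(\w+)\s+\1\b/gi)].map((m) => m[1]);

console.log(isUuidV4("3f2b8c1e-9d4a-4f6b-8a2c-1e5d7f9b0c3a")); // true
console.log(isSemver("1.4.0-beta.2+build.17"), isSemver("1.4")); // true false
console.log(stripTags("<p>Hello <b>world</b></p>")); // 'Hello world'
console.log(findUrls('See https://example.com/docs and <a href="http://example.org">this</a>.'));
// [ 'https://example.com/docs', 'http://example.org' ]
console.log(findRepeats("Paris in the the spring is is nice")); // [ 'the', 'is' ]
```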

Performance Pitfalls: Catastrophic Backtracking

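Some regex patterns can hang a process on certain inputs, a vulnerability known as ReDoS (Regular Expression Denial of Service). The classic example is ^(a+)+$ applied to the string "aaaaaaaab". Because the trailing "b" makes an overall match impossible, a backtracking engine tries every way of splitting the run of a's between the inner and outer +, and the number of combinations roughly doubles with each additional "a".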

Avoid nested quantifiers on overlapping patterns. Use possessive quantifiers (++) or atomic groups if your engine supports them; JavaScript's built-in RegExp does not, so there the usual fix is to rewrite the pattern. When accepting user-supplied regex patterns (never a good idea), use a library like safe-regex to validate the pattern first.
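The effect can be observed with a rough timing comparison. This sketch does not use possessive quantifiers or atomic groups; it swaps in a plain rewrite that removes the nested quantifier. The timeRejection helper is hypothetical, the input is kept short so the demo finishes quickly, and actual timings depend on the engine and machine.

```javascript
// Hypothetical helper: time how long a pattern takes to decide on an input.
function timeRejection(pattern, input) {
  const start = Date.now();
  const matched = pattern.test(input);
  return { matched, ms: Date.now() - start };
}

// Twenty a's followed by "b": close to a match, but guaranteed to fail.
const nearMiss = "a".repeat(20) + "b";

// Nested quantifier: the run of a's can be split between the inner and
// outer + in exponentially many ways, and all of them are tried.
const risky = timeRejection(/^(a+)+$/, nearMiss);

// Same language, no nesting: only one way to consume the a's, so it fails fast.
const safe = timeRejection(/^a+$/, nearMiss);

console.log("nested:", risky, "flat:", safe);
```

Adding a few more a's to the input makes the gap between the two patterns grow quickly, which is why the length is capped here.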

Testing and Iteration

Never write regex from memory for important patterns. Always test against a good set of inputs, including edge cases and adversarial inputs. Use PureFormatter's Find & Replace with regex mode enabled to test patterns interactively against real text before deploying them in code.
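Once a pattern behaves interactively, the same inputs can be kept as a small table-driven check in code. The checkPattern helper below is a hypothetical sketch, not part of any library, and it assumes the pattern has no g or y flag (those make test() stateful between calls).

```javascript
// Hypothetical helper: return the cases where the pattern disagrees with
// the expected result. An empty array means every case passed.
function checkPattern(pattern, cases) {
  return cases.filter(({ input, expected }) => pattern.test(input) !== expected);
}

const e164 = /^\+[1-9]\d{6,14}$/;

const failures = checkPattern(e164, [
  { input: "+14155550123", expected: true },
  { input: "+442071838750", expected: true },
  { input: "14155550123", expected: false },        // missing "+"
  { input: "+04155550123", expected: false },       // leading zero
  { input: "+123456", expected: false },            // too short
  { input: "+1234567890123456", expected: false },  // too long (16 digits)
  { input: "+1415555 0123", expected: false },      // embedded space
  { input: "+14155550123\n", expected: false },     // trailing newline
  { input: "", expected: false },                   // empty string
]);

console.log(failures.length === 0 ? "all cases pass" : failures);
```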

Written by Fredy, Senior Developer & Technical Writer

Fredy is a full-stack developer with 8+ years of experience building web applications. He writes about developer tools, best practices, and the craft of clean code.
