What Is a Regular Expression?
A regular expression (regex or regexp) is a sequence of characters that defines a pattern for matching, searching, and replacing text. The syntax looks cryptic at first, but once you learn the ~20 core concepts, you can parse and transform text with a precision that loops and conditionals cannot match.
Regex is available in virtually every programming language — JavaScript, Python, Go, Rust, Java, PHP, Ruby — with mostly consistent syntax. The patterns you learn today will work across your entire stack.
Basic Building Blocks
Literal characters — cat matches the exact string "cat" wherever it appears. Case-sensitive by default.
The dot . — matches any single character except a newline. c.t matches "cat", "cut", "c4t", but not "ct" or "coat".
Character classes [...] — match any one of the characters inside the brackets. [aeiou] matches any vowel. [a-z] matches any lowercase letter. [0-9] matches any digit.
Negated classes [^...] — match any character NOT in the brackets. [^0-9] matches any non-digit.
Anchors — ^ matches the start of the string; $ matches the end. ^d{5}$ matches a string that is exactly five digits, nothing more.
Quantifiers
Quantifiers specify how many times the preceding element must match:
*— zero or more times+— one or more times?— zero or one time (makes the element optional){n}— exactly n times{n,}— at least n times{n,m}— between n and m times (inclusive)
By default, quantifiers are greedy — they match as much as possible. Add ? after a quantifier to make it lazy (match as little as possible): .*? instead of .*. This matters when parsing HTML or nested structures.
Common Shorthand Classes
These are the most useful shortcuts:
d— any digit, equivalent to[0-9]D— any non-digit, equivalent to[^0-9]w— any word character:[a-zA-Z0-9_]W— any non-word characters— any whitespace character (space, tab, newline)S— any non-whitespace character— word boundary (zero-width assertion between w and W)
Groups and Alternation
Groups (...) — group part of a pattern and capture the match. (foo|bar) matches "foo" or "bar". The captured group can be referenced in replacements as $1, $2, etc.
Non-capturing groups (?:...) — group without capturing. Use when you need grouping for quantifiers but do not need the match value.
Named groups (?<name>...) — capture with a name for cleaner replacement code:
const match = '2026-02-19'.match(/(?<year>d{4})-(?<month>d{2})-(?<day>d{2})/);
console.log(match.groups.year); // '2026'
Lookaheads and Lookbehinds
Zero-width assertions match positions without consuming characters:
(?=...)— positive lookahead: "followed by"(?!...)— negative lookahead: "not followed by"(?<=...)— positive lookbehind: "preceded by"(?<!...)— negative lookbehind: "not preceded by"
// Match numbers followed by 'px' but don't include 'px' in the match
/d+(?=px)/g
// Match prices not preceded by a currency symbol
/(?<!$)d+.d{2}/g
Flags
Flags modify how the pattern is applied:
g— global: find all matches, not just the firsti— case-insensitivem— multiline:^and$match line starts and ends, not just string start and ends— dotAll:.matches newlines toou— Unicode mode: enables full Unicode matching
Practical Examples
// Email validation (simplified — true email validation needs the full RFC)
/^[^s@]+@[^s@]+.[^s@]+$/
// Indonesian phone number (+62 format)
/^(+62|0)[0-9]{9,12}$/
// Strip HTML tags
/<[^>]*>/g
// Find URLs
/https?://[^s"'<>]+/g
// Extract hashtags
/#[w-]+/g
// Validate UUID v4
/^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i
// Match repeated words (the the)
/(w+)s+\1/gi
// Validate semantic version
/^d+.d+.d+(?:-[w.]+)?(?:+[w.]+)?$/
Performance Pitfalls: Catastrophic Backtracking
Some regex patterns can hang or crash with certain inputs — a vulnerability known as ReDoS (Regular Expression Denial of Service). The classic example: (a+)+ applied to the string "aaaaaaaab". The engine tries exponentially many combinations before failing.
Avoid nested quantifiers on overlapping patterns. Use possessive quantifiers (++) or atomic groups if your engine supports them. When accepting user-supplied regex patterns (never a good idea), use a library like safe-regex to validate the pattern first.
Testing and Iteration
Never write regex from memory for important patterns. Always test against a good set of inputs — including edge cases and adversarial inputs. Use PureFormatter's Find & Replace with regex mode enabled to test patterns interactively against real text before deploying them in code.