Developer · 9 min read · May 2026

Regular Expressions for Beginners

Learn the fundamentals of regex patterns, from simple character matching to advanced lookaheads and capturing groups.

What Are Regular Expressions?

Regular expressions (often shortened to "regex" or "regexp") are sequences of characters that define search patterns. Think of them as a powerful find-and-replace tool that understands rules rather than just literal text. Instead of searching for the exact word "cat," you can search for "any three-letter word starting with c" — and that's just the beginning.

Regex is supported in most programming languages, many text editors, and command-line tools such as grep and sed, although the exact syntax varies between engines. Whether you're validating form inputs, parsing log files, or cleaning up data, regular expressions give you precise control over text manipulation.

Basic Syntax: Literals and Metacharacters

The simplest regex is just a literal string. The pattern hello matches the exact text "hello." But regex becomes powerful when you introduce metacharacters — characters with special meaning:

. — Matches any single character except a newline
^ — Matches the start of a string (or line with the m flag)
$ — Matches the end of a string (or line)
\ — Escapes a metacharacter so it's treated literally
| — Acts as an OR operator between alternatives

For example, c.t matches "cat," "cot," "cut," and even "c9t" because the dot accepts any character. If you literally want a period, escape it: c\.t only matches "c.t".
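The behavior of the dot, the escape, and the pipe can be checked directly in JavaScript. This is a minimal sketch; the variable names and sample strings are illustrative only.

```javascript
// The dot matches any single character except a newline.
const anyMiddle = /c.t/;
console.log(anyMiddle.test("cat")); // true
console.log(anyMiddle.test("c9t")); // true
console.log(anyMiddle.test("ct"));  // false: the dot needs one character

// Escaping the dot turns it into a literal period.
const literalDot = /c\.t/;
console.log(literalDot.test("c.t")); // true
console.log(literalDot.test("cat")); // false

// The pipe chooses between alternatives.
const pets = /cat|dog/;
console.log(pets.test("hotdog")); // true: "dog" is found inside the string
```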

Character Classes

Character classes let you define a set of characters to match at a single position. You write them inside square brackets:

[aeiou] — Matches any single vowel
[0-9] — Matches any digit (ranges work with letters too: [a-z])
[^0-9] — The caret inside brackets means "NOT" — matches anything except a digit

Shorthand character classes save keystrokes: \d matches a digit (equivalent to [0-9] in JavaScript; some other engines also accept non-ASCII Unicode digits), \w matches word characters (letters, digits, underscore), and \s matches whitespace. Their uppercase versions (\D, \W, \S) match the opposite.
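A short sketch of bracketed classes and the shorthand classes in JavaScript. The sample strings are made up for illustration.

```javascript
const vowel = /[aeiou]/;
const nonDigit = /[^0-9]/;

console.log(vowel.test("sky"));      // false: no a, e, i, o, or u
console.log(vowel.test("tree"));     // true
console.log(nonDigit.test("12345")); // false: every character is a digit
console.log(nonDigit.test("123a5")); // true: "a" is not a digit

// Shorthand classes and a negated shorthand, using the g flag to collect all matches.
const sample = "a1_b2 c3";
console.log(sample.match(/\d/g));         // ["1", "2", "3"]
console.log(sample.match(/\W/g));         // [" "]: the space is the only non-word character
console.log(sample.replace(/\s/g, ""));   // "a1_b2c3"
```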

Quantifiers: How Many?

Quantifiers specify how many times a character or group should repeat:

* — Zero or more times
+ — One or more times
? — Zero or one time (makes something optional)
{3} — Exactly 3 times
{2,5} — Between 2 and 5 times
{3,} — 3 or more times

So \d{3}-\d{4} matches patterns like "555-1234" — exactly three digits, a hyphen, then four digits. Quantifiers are greedy by default (they match as much as possible). Add ? after a quantifier to make it lazy: .+? matches as few characters as possible.
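The following sketch shows counted, optional, and greedy versus lazy quantifiers in JavaScript. The colou?r pattern and the quoted sample text are illustrative additions, not examples from the article.

```javascript
// Counted repetition: exactly three digits, a hyphen, four digits.
const phone = /\d{3}-\d{4}/;
console.log(phone.test("555-1234")); // true
console.log(phone.test("55-1234"));  // false: only two digits before the hyphen

// ? makes the preceding character optional.
const color = /colou?r/;
console.log(color.test("color"));  // true
console.log(color.test("colour")); // true

// Greedy vs. lazy: same text, different stopping point.
const quoted = 'say "one" and "two"';
console.log(quoted.match(/".+"/)[0]);  // "one" and "two"  (runs to the last quote)
console.log(quoted.match(/".+?"/)[0]); // "one"            (stops at the first closing quote)
```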

Anchors and Boundaries

Anchors don't match characters — they match positions. The ^ anchor asserts "start of string" and $ asserts "end of string." This is crucial for validation. The pattern \d+ matches digits anywhere inside text, but ^\d+$ ensures the entire string consists only of digits.

The word boundary \b matches the position between a word character and a non-word character, or between a word character and the start or end of the string. The pattern \bcat\b matches "cat" as a standalone word but not inside "category" or "concatenate."
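As a rough illustration in JavaScript, the difference between searching and validating, and the effect of \b, looks like this. The input strings are hypothetical.

```javascript
const hasDigits = /\d+/;    // digits anywhere in the text
const onlyDigits = /^\d+$/; // the whole string must be digits

console.log(hasDigits.test("abc123"));  // true
console.log(onlyDigits.test("abc123")); // false
console.log(onlyDigits.test("123"));    // true

const wholeWord = /\bcat\b/;
console.log(wholeWord.test("the cat sat")); // true
console.log(wholeWord.test("category"));    // false: "e" follows "cat"
console.log(wholeWord.test("concatenate")); // false: letters on both sides
console.log(wholeWord.test("cat."));        // true: start of string and "." are boundaries
```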

Common Patterns

Here are patterns you'll encounter frequently. Note that production-grade validation often needs more nuance, but these are solid starting points:

Email (simplified): [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
US Phone: \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
URL: https?://[^\s/$.?#].[^\s]*
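Hex color: #([0-9a-fA-F]{6}|[0-9a-fA-F]{3}) (six-digit alternative first, so an unanchored search does not stop after three digits of a six-digit color)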
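One way to try these patterns is as JavaScript validators. The sketch below makes a few assumptions: each pattern is wrapped in ^ and $ so that the whole input must match, the phone pattern is taken to allow optional parentheses around the area code (written \(? and \)?), and the forward slashes in the URL pattern are escaped because they would otherwise end the regex literal. The constant names and test inputs are hypothetical.

```javascript
const email = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
const usPhone = /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;
const url = /^https?:\/\/[^\s\/$.?#].[^\s]*$/;
const hexColor = /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/;

console.log(email.test("ada@example.com"));    // true
console.log(email.test("ada@example"));        // false: no dot and top-level domain
console.log(usPhone.test("(555) 123-4567"));   // true
console.log(usPhone.test("555.123.4567"));     // true
console.log(usPhone.test("555-12-4567"));      // false: middle group has two digits
console.log(url.test("https://example.com/path?q=1")); // true
console.log(url.test("ftp://example.com"));    // false: only http and https are accepted
console.log(hexColor.test("#ff8800"));         // true
console.log(hexColor.test("#ff88"));           // false: neither 3 nor 6 hex digits
```

Anchoring matters most for the hex color: without ^ and $, a search could accept the first three digits of a longer value.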

Flags

Flags modify how the regex engine processes your pattern. In JavaScript you add them after the closing slash: /pattern/flags.

g (global) — Find all matches, not just the first
i (case-insensitive) — Treat uppercase and lowercase as equivalent
m (multiline) — Makes ^ and $ match line starts/ends, not just string starts/ends
s (dotAll) — Makes the dot match newline characters too
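The effect of each flag can be seen by running slightly different patterns over the same text. This is a small sketch with a made-up three-line string.

```javascript
const lines = "Cat\ncat\nCAT";

// Without g, match() returns only the first match (plus index details).
console.log(lines.match(/cat/)[0]);      // "cat"

// g collects every match; i ignores case.
console.log(lines.match(/cat/g));        // ["cat"]
console.log(lines.match(/cat/gi));       // ["Cat", "cat", "CAT"]

// m lets ^ and $ match at each line, not just the whole string.
console.log(lines.match(/^cat$/gi));     // null: the string as a whole is not "cat"
console.log(lines.match(/^cat$/gim));    // ["Cat", "cat", "CAT"]

// s lets the dot cross a newline.
console.log(/Cat.cat/.test(lines));      // false
console.log(/Cat.cat/s.test(lines));     // true
```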

Lookaheads and Lookbehinds

Lookaheads let you assert what follows a position without including it in the match. A positive lookahead (?=...) asserts that what follows matches the pattern. A negative lookahead (?!...) asserts that what follows does not match.

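For example, \d+(?= dollars) matches "100" in "100 dollars" but not in "100 euros." Lookbehinds work similarly but check what precedes: (?<=\$)\d+ matches "50" in "$50" but not in "€50." These are called "zero-width assertions" because they don't consume characters. Engine support varies: JavaScript has supported lookbehind since ES2018, while some engines, such as Go's regexp package, support neither lookahead nor lookbehind.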
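Here is a sketch of both assertions in JavaScript, assuming a runtime with lookbehind support (ES2018 or later). The negative lookahead example and its \b anchors are an illustrative addition: without the word boundaries, the engine could backtrack and match "10" in "100 dollars."

```javascript
// Positive lookahead: digits only when " dollars" follows.
const dollarsAhead = /\d+(?= dollars)/;
console.log("100 dollars".match(dollarsAhead)[0]); // "100" (the word "dollars" is not part of the match)
console.log("100 euros".match(dollarsAhead));      // null

// Positive lookbehind: digits only when "$" precedes.
const afterDollarSign = /(?<=\$)\d+/;
console.log("$50".match(afterDollarSign)[0]); // "50"
console.log("€50".match(afterDollarSign));    // null

// Negative lookahead: a whole number that is NOT followed by " dollars".
const notDollars = /\b\d+\b(?! dollars)/;
console.log("100 euros".match(notDollars)[0]); // "100"
console.log("100 dollars".match(notDollars));  // null
```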

Common Mistakes

Forgetting to escape metacharacters: Searching for "file.txt" with file.txt also matches "filextxt" because dot is a metacharacter. Use file\.txt.
Greedy matching biting you: The pattern <.+> on <b>bold</b> matches the entire string, not just <b>. Use <.+?> or <[^>]+>.
Not anchoring validation patterns: Without ^ and $, your "email validation" regex will match valid-looking substrings inside invalid strings.
Overcomplicating patterns: Sometimes a simple string method (startsWith, includes) is clearer and faster than regex. Use regex when you need pattern matching, not simple string comparison.
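Each of these mistakes can be reproduced in a few lines of JavaScript. The sketch below reuses the simplified email pattern from earlier; the sample inputs are made up.

```javascript
// 1. Unescaped dot
console.log(/file.txt/.test("filextxt"));  // true: the dot matched "x"
console.log(/file\.txt/.test("filextxt")); // false

// 2. Greedy matching
const html = "<b>bold</b>";
console.log(html.match(/<.+>/)[0]);   // "<b>bold</b>"
console.log(html.match(/<.+?>/)[0]);  // "<b>"
console.log(html.match(/<[^>]+>/g));  // ["<b>", "</b>"]

// 3. Missing anchors in validation
const loose = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/;
const strict = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
const input = "not valid: a@b.co !!!";
console.log(loose.test(input));  // true: a valid-looking substring was found
console.log(strict.test(input)); // false

// 4. Plain string methods are often enough
console.log("report.pdf".endsWith(".pdf"));       // true
console.log("hello world".startsWith("hello"));   // true
console.log("hello world".includes("lo wo"));     // true
```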

Tips for Learning

Start small — match literal strings, then add one metacharacter at a time. Use a visual regex tester where you can see matches highlighted in real time. Read regex aloud: ^\d{3}-\d{4}$ becomes "start, three digits, hyphen, four digits, end." Build patterns incrementally, testing at each step. And remember: readability matters. A regex no one can maintain is worse than a slightly longer but understandable solution.
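Building incrementally might look like the following sketch, which grows the pattern one piece at a time and tests every stage against two hypothetical inputs, one that should pass and one that should fail.

```javascript
// Each stage adds one piece to the previous pattern.
const stages = [
  /\d{3}/,         // three digits
  /\d{3}-/,        // ...followed by a hyphen
  /\d{3}-\d{4}/,   // ...followed by four digits
  /^\d{3}-\d{4}$/, // ...and nothing else before or after
];

// Hypothetical sample inputs.
const samples = ["555-1234", "x555-12345"];

for (const stage of stages) {
  const results = samples.map((s) => `${s}: ${stage.test(s)}`);
  console.log(String(stage).padEnd(18), results.join(", "));
}
```

Only the final, anchored stage rejects "x555-12345"; every earlier stage still finds a matching substring inside it.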