O'Reilly Live Online Training

Learning Regular Expressions: Unlock the power of text processing using grep, JavaScript, and Python

James Lee

Regular expressions have been around for decades, and nearly every programming language supports them because they're so handy for searching and manipulating text. A regular expression is a pattern that describes the text you want to find, which can then be manipulated programmatically. For example, say you need to find and replace a specific text string across thousands of files, or search a single file for one specific string, but you can't remember whether it was written in uppercase or lowercase (or both!). “Regex” gives you the flexibility and power to do these kinds of complex, nuanced search-and-replace operations. But regex has a reputation for being hard to learn.
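
As a minimal illustration of the mixed-case problem described above, the following Python sketch replaces a string whatever its capitalization. The function name `replace_any_case` and the sample text are hypothetical. The calls (`re.compile`, `re.escape`, `re.IGNORECASE`, and the compiled pattern's `sub` method) come from Python's standard `re` module. Applying it across many files would simply mean looping over them, which is left out here.

```python
import re

def replace_any_case(text: str, target: str, replacement: str) -> str:
    """Replace every occurrence of `target`, whatever its capitalization."""
    # re.escape makes the target a literal string, so '.' or '*' are not special.
    pattern = re.compile(re.escape(target), re.IGNORECASE)
    return pattern.sub(replacement, text)

sample = "Error: disk full. ERROR: retrying. error cleared."
print(replace_any_case(sample, "error", "WARNING"))
# → WARNING: disk full. WARNING: retrying. WARNING cleared.
```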

This class walks students through the seemingly arcane syntax of regular expressions with short applied examples, so that they can save themselves time and effort. The course uses grep to discuss the usage and features of regular expressions, keeping things simple and hands-on, but it also demonstrates applied examples in common programming languages such as Python and JavaScript, so students can take their knowledge and apply it to their daily programming and operations tasks.

What you'll learn and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • What regular expressions are and why they are used
  • Regular expression syntax and rules
  • How to read and write regular expressions

And you’ll be able to:

  • Use Unix tools that support regular expressions (grep, egrep, sed, awk)
  • Read, and more importantly write, regular expressions
  • Test and validate regular expressions
  • Use regular expressions in programming languages (JavaScript, Python)

This training course is for you because...

  • You are a programmer, engineer, or other professional comfortable working in a command-line shell environment. (You do not need to be a programmer to take this course.)
  • You need tools to locate, parse, and replace text.
  • You want to be able to search and replace text to solve day-to-day problems, from simple to complex.

Prerequisites

  • Basic Linux command-line experience (cd, ls, cat)
  • Use of a text editor, either terminal-based or GUI-based (such as vi, Emacs, or nano)

Recommended preparation:

  • Install Python or Node.js (for JavaScript) if you want to experiment with them.

Recommended follow-up:

About your instructor

  • In the early 1990s, James Lee installed Red Hat on an unused piece of hardware he found in the closet and hasn't looked back since. James uses Linux both personally and professionally and is particularly happy that, over his career in technology, he's never had to use Windows. He has worked with many Linux distributions, including Red Hat, CentOS, Scientific Linux, Debian, and Ubuntu, and recently booted Raspbian on a Raspberry Pi 3. Nowadays, he does most of his development work on a MacBook Pro but spends more time in Darwin than in macOS, often with multiple active SSH sessions to various Linux servers. James is also an open source advocate and instructor; he has delivered countless training courses on open source products such as Linux, Perl, and Python. And after all these years, penguins are still one of his favorite animals.

Schedule

The time frames are only estimates and may vary according to how the class progresses.

Introduction to Regular Expressions and grep/egrep (20 minutes)

  • What are they?
  • Why are they important?
  • What will we use in class to demonstrate them?
    • grep – why grep?
    • egrep – difference between grep and egrep
    • sed / awk
  • Example using grep
    • Case-insensitive matching
    • Use a file of regular expressions
    • Display what does not match
  • Online tools
  • Q&A
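
The grep examples listed above (case-insensitive matching, reading patterns from a file, and displaying non-matching lines) correspond to grep's -i, -f, and -v options. As a rough illustration only, here is a Python sketch that mimics that behavior. The function name `grep` and its parameters are hypothetical, the `patterns` list stands in for the lines of a pattern file, and the patterns use Python's `re` syntax rather than grep's POSIX syntax.

```python
import re
from typing import Iterable, Iterator

def grep(lines: Iterable[str], patterns: list[str],
         ignore_case: bool = False, invert: bool = False) -> Iterator[str]:
    """Hypothetical stand-in for `grep [-i] [-v] -f PATTERNS`."""
    flags = re.IGNORECASE if ignore_case else 0      # like grep -i
    compiled = [re.compile(p, flags) for p in patterns]
    for line in lines:
        # A line matches if any pattern is found anywhere in it, as with grep.
        matched = any(rx.search(line) for rx in compiled)
        # invert=True mirrors grep -v: keep only lines that do NOT match.
        if matched != invert:
            yield line

log = ["INFO start", "warning: low disk", "ERROR crash", "info done"]
print(list(grep(log, ["warn", "error"], ignore_case=True)))
# → ['warning: low disk', 'ERROR crash']
print(list(grep(log, ["disk"], invert=True)))
# → ['INFO start', 'ERROR crash', 'info done']
```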

Basic Syntax (25 minutes)

  • Simple regular expressions – normal characters and .
  • Anchoring at the beginning and end of the line – ^ and $
  • Matching special characters with the backslash
  • Regular Expression Rule #1 – the match that begins earliest wins
  • Exercise: Basic Syntax
  • Exercise solution and Q&A
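
The basic syntax covered in this module can be sketched in Python's `re` module as follows. This is an illustration with invented sample strings, not course material. Note one assumption about context: grep tests each line separately, while Python's ^ and $ anchor to the start and end of the whole string unless the `re.MULTILINE` flag is set.

```python
import re

# '.' matches any single character (except a newline).
print(re.findall(r"c.t", "cat cot c.t cart"))   # → ['cat', 'cot', 'c.t']

# A backslash turns a special character into a literal one.
print(re.findall(r"c\.t", "cat cot c.t cart"))  # → ['c.t']

# ^ and $ anchor the match to the start and end of the string.
print(bool(re.search(r"^cat", "cat nap")))      # → True
print(bool(re.search(r"^cat", "a cat")))        # → False
print(bool(re.search(r"nap$", "cat nap")))      # → True

# Rule #1: the earliest match wins, even when a longer one appears later.
print(re.search(r"a+", "b aa c aaaaa").group())  # → aa
```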

Character Classes (25 minutes)

  • Character class syntax – matching a character in a class, [...]
  • Character class syntax – matching a character not in a class, [^...]
  • Predefined character classes – \d, \w, etc.
  • POSIX classes – [[:alpha:]], etc.
  • Exercise: Character classes
  • Exercise solution and Q&A
  • Break (10 minutes)
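
The character class syntax from this module looks like this in Python, as a short sketch with made-up inputs. One caveat worth flagging: Python's `re` module does not support POSIX classes such as [[:alpha:]] (grep does), so the last line uses [A-Za-z] as an ASCII-only stand-in. Also, in Python 3, \d and \w match non-ASCII digits and letters too unless `re.ASCII` is given.

```python
import re

# [...] matches any one of the listed characters.
print(re.findall(r"[aeiou]", "regex"))               # → ['e', 'e']
# [^...] matches any one character NOT in the class.
print(re.findall(r"[^0-9]", "a1b2"))                 # → ['a', 'b']
# \d is shorthand for a digit, \w for a "word" character (letter, digit, _).
print(re.findall(r"\d+", "Bay 7B on 2024-05-01"))    # → ['7', '2024', '05', '01']
print(re.findall(r"\w+", "it's a_b-c"))              # → ['it', 's', 'a_b', 'c']
# ASCII-only stand-in for the POSIX class [[:alpha:]].
print(re.findall(r"[A-Za-z]+", "x1 yz"))             # → ['x', 'yz']
```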

Quantifiers, Boundaries and OR-ing (30 minutes)

  • Quantifier syntax – specifying how many of something
  • Regular Expression Rule #2 – quantifiers are greedy
  • Boundary syntax – \b, etc.
  • OR syntax (alternation) – |
  • Exercise: Quantifiers, boundaries and OR-ing
  • Exercise solution and Q&A
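
A brief Python sketch of the quantifier, boundary, and alternation syntax above, using invented sample strings. The `<.*>` line shows Rule #2 in action: the greedy `.*` runs all the way to the last `>` it can find.

```python
import re

# ? means "zero or one"; {m,n} sets a range of repetitions.
print(re.findall(r"colou?r", "color colour colouur"))  # → ['color', 'colour']
print(re.findall(r"\d{2,3}", "7 42 1234"))             # → ['42', '123']

# Rule #2: quantifiers are greedy, so .* consumes as much as it can.
print(re.search(r"<.*>", "<b>bold</b>").group())       # → <b>bold</b>

# \b marks a word boundary, so "cat" inside "concat" or "cats" is skipped.
print(re.findall(r"\bcat\b", "cat concat cats cat."))  # → ['cat', 'cat']

# | (alternation) matches either the left or the right alternative.
print(re.findall(r"jpg|png|gif", "a.jpg b.png c.bmp")) # → ['jpg', 'png']
```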

Capturing and Replacing (25 minutes)

  • Capturing syntax (text extraction) – () and \1, etc.
  • Replacing text
  • Turning off capture – (?:...)
  • Exercise: Capturing and replacing
  • Exercise solution and Q&A
  • Break (10 minutes)
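
Capturing, backreferences, replacement, and non-capturing groups might look like this in Python. This is a sketch with illustrative strings. In Python, captured text is read with `group()`/`groups()`, and `re.sub` accepts \1, \2, ... in the replacement string.

```python
import re

# Parentheses capture; groups() returns everything that was captured.
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released 2024-05-01")
print(m.groups())  # → ('2024', '05', '01')

# \1 inside the pattern refers back to group 1: find doubled words.
print(re.findall(r"\b(\w+) \1\b", "this is is a test test"))  # → ['is', 'test']

# Captured groups can be reused in the replacement: reorder a date.
print(re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", "2024-05-01"))  # → 05/01/2024

# (?:...) groups without capturing, so (\w+) is still group 1.
m = re.search(r"(?:Mr|Ms)\. (\w+)", "Hello Ms. Smith")
print(m.group(1))  # → Smith
```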

Lazy Quantifiers (20 minutes)

  • Lazy quantifier syntax
  • Exercise: Lazy quantifiers
  • Exercise solution and Q&A
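
To contrast lazy quantifiers with the greedy behavior from Rule #2, here is a small Python sketch on an invented HTML-like string. Appending ? to a quantifier makes it match as little as possible.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* spans from the first '<' to the last '>'.
print(re.findall(r"<.*>", html))   # → ['<b>bold</b> and <i>italic</i>']

# Lazy: .*? stops at the first '>' it can, so each tag matches separately.
print(re.findall(r"<.*?>", html))  # → ['<b>', '</b>', '<i>', '</i>']

# +? works the same way: one digit is the least that satisfies \d+?.
print(re.search(r"\d+?", "12345").group())  # → 1
```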

Inline Modifiers, Lookarounds (25 minutes)

  • Inline modifier syntax – (?i), etc.
  • Lookarounds – (?=...), etc.
  • Exercise: Inline modifiers and lookarounds
  • Exercise solution and Q&A
  • Break (10 minutes)
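
Inline modifiers and lookarounds can be sketched in Python as shown below, with invented inputs. Some caveats to keep in mind: recent Python versions require a global inline flag such as (?i) to appear at the very start of the pattern, Python's lookbehinds must be fixed-width, and GNU grep offers lookarounds only through its -P (Perl-compatible) option.

```python
import re

# (?i) at the start of the pattern turns on case-insensitive matching.
print(re.findall(r"(?i)python", "Python PYTHON python"))  # → ['Python', 'PYTHON', 'python']

# Lookahead (?=...): a word followed by ':', without consuming the ':'.
print(re.findall(r"\w+(?=:)", "host: a, port: 80"))      # → ['host', 'port']

# Negative lookahead (?!...): "foo" not followed by "bar".
print(re.findall(r"foo(?!bar)", "foobar foobaz"))         # → ['foo']

# Lookbehind (?<=...): digits preceded by a dollar sign.
print(re.findall(r"(?<=\$)\d+", "cost $25, qty 3, tip $4"))  # → ['25', '4']
```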

Practical and Efficient Regular Expressions (30 minutes)

  • Practical regular expressions - writing regexes that solve real problems
  • Efficient regular expressions - writing regexes that use resources, such as memory, efficiently
  • Exercise: Practical and efficient regular expressions
  • Exercise solution and Q&A
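
As one example of a practical, reasonably efficient regex, the following sketch pulls key="value" pairs out of a config-style line. The line format and the `PAIR` name are hypothetical. The design choice it illustrates is a common one: compile the pattern once for reuse, and prefer a negated class like [^"]* to a bare .*, which both stops at the closing quote and generally leaves less work for backtracking.

```python
import re

# Compile once; the same pattern object can be reused across many lines.
PAIR = re.compile(r'(\w+)="([^"]*)"')

line = 'name="web01" role="db" note=""'
print(dict(PAIR.findall(line)))  # → {'name': 'web01', 'role': 'db', 'note': ''}

# Why [^"]* and not .*: the greedy dot runs past the closing quote.
print(re.findall(r'".*"', line))  # → ['"web01" role="db" note=""']
```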

Other Tools and Programming Languages (30 minutes)

  • sed and awk
  • JavaScript
  • Python
  • Exercise: Other tools and programming languages
  • Exercise solution and Q&A
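
For the Python part of this module, here is a brief tour of the main functions in the standard `re` module. It is a sketch with invented sample text, not the course's own exercise material.

```python
import re

text = "alpha=1, beta=22, gamma=333"

# search: first match anywhere; match: only at the start; fullmatch: whole string.
print(re.search(r"\d+", text).group())        # → 1
print(re.match(r"\d+", text))                 # → None
print(bool(re.fullmatch(r"[a-z]+", "beta")))  # → True

# finditer yields match objects, which carry positions as well as text.
for m in re.finditer(r"(\w+)=(\d+)", text):
    print(m.group(1), m.group(2), m.span())

# split on a pattern; sub with a function computes each replacement.
print(re.split(r",\s*", text))  # → ['alpha=1', 'beta=22', 'gamma=333']
print(re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text))
# → alpha=2, beta=44, gamma=666
```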