You should have been drilling your vocabulary from the last exercise night and day until you know it cold. I know you probably hate drilling and memorizing things, but it's an important skill to pick up if you want to accelerate your learning on almost any subject. Memorization isn't useful for learning how to creatively apply skills, but it is the fastest way to acquire the basics of a new skill.

Now that you've learned the core of regex solidly, I'll show you more advanced things that build on those blocks. First up is the concept of "alternating". Here's a corpus text we'll work with:

Numbers 123 Letters ABC

What we want to do is match different lines with one regex in 4 different ways:

^[0-9]+|[A-Z]+$
^[0-9]+|[a-z]+$
^[4-6]+|[A-Z]+$
^[^0-9]+$|^[^A-Z]+$

You see the '|' character between the [0-9]+ and [A-Z]+ expressions? The reason I wanted you to know the basic symbols of regex solidly is that you will encounter a regex like this and not know what the | does. However, you do know what all the other symbols do and can read them. That means you can break it down and then go find out what the missing pieces mean. I'll do it for the first one:
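
One way to write out that kind of breakdown for the first regex is Python's verbose mode, which lets each symbol sit on its own line next to its English name. This is only a sketch: it assumes Python's re module rather than whatever tool you are running these in, and the name FIRST and the sample lines are made up for illustration.

```python
import re

# Verbose-mode breakdown of ^[0-9]+|[A-Z]+$
# (FIRST is a hypothetical name, not something from the exercise.)
FIRST = re.compile(r"""
    ^        # beginning of line
    [0-9]+   # one or more digits
    |        # the unknown symbol sitting between the two expressions
    [A-Z]+   # one or more uppercase letters
    $        # end of line
""", re.VERBOSE)

# Assumed sample lines, just to watch the regex work.
SAMPLE_LINES = ["123", "ABC", "123 Letters", "Numbers ABC", "letters"]

for line in SAMPLE_LINES:
    print(repr(line), FIRST.search(line) is not None)
```

Reading it top to bottom, everything above the | is one expression and everything below it is the other.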

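You can now make a guess at what the | does: probably some kind of "OR" as in a programming language. That's actually correct. It says "this OR that expression", and it's called "alternating" (most regex documentation calls it "alternation"). The reason you call this "alternating" is that it causes the regex engine to try the expressions on either side of the | in turn: if the first one fails it tries the other, and it continues until one of them matches or they both fail. Notice that the | splits the whole regex, so in the first one the ^ belongs only to the [0-9]+ side and the $ belongs only to the [A-Z]+ side.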
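
To see the "this OR that" idea in action, you can run each side of the first regex on its own and compare it with the full regex. This is a minimal sketch assuming Python's re module; the names LEFT, RIGHT, BOTH and sides are hypothetical, and the sample lines are made up.

```python
import re

# The two sides of the | as separate regexes, plus the full alternation.
LEFT = re.compile(r"^[0-9]+")
RIGHT = re.compile(r"[A-Z]+$")
BOTH = re.compile(r"^[0-9]+|[A-Z]+$")

def sides(line):
    """Return (left matched, right matched, alternation matched) for one line."""
    return (
        LEFT.search(line) is not None,
        RIGHT.search(line) is not None,
        BOTH.search(line) is not None,
    )

for line in ["123", "ABC", "abc"]:
    left, right, both = sides(line)
    # The alternation matches exactly when at least one side matches.
    assert both == (left or right)
    print(repr(line), left, right, both)
```

If neither side matches on its own, the alternation fails too, which is the "they both fail" case.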

What You Should See

When you run this you'll see that each regex matches a different line or set of lines:

That file doesn't exist.
> ^[0-9]+|[A-Z]+$
Input file is empty. Use !load to load something.
> ^[0-9]+|[a-z]+$
Input file is empty. Use !load to load something.
> ^[4-6]+|[A-Z]+$
Input file is empty. Use !load to load something.
> ^[^0-9]+$|^[^A-Z]+$
Input file is empty. Use !load to load something.

Using the idea of "alternating", try to explain how each line matches and why.
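
If you can't get the corpus loaded, a minimal sketch like this lets you check the four regexes in plain Python instead. Two loud assumptions: it uses Python's re module rather than the tool in the session above, and it guesses that the corpus has one word per line (Numbers, 123, Letters, ABC). The names CORPUS, REGEXES and matching_lines are made up for this sketch.

```python
import re

# ASSUMPTION: the corpus is four lines, one word per line.
CORPUS = ["Numbers", "123", "Letters", "ABC"]

# The four regexes from the session above.
REGEXES = [
    r"^[0-9]+|[A-Z]+$",
    r"^[0-9]+|[a-z]+$",
    r"^[4-6]+|[A-Z]+$",
    r"^[^0-9]+$|^[^A-Z]+$",
]

def matching_lines(regex, lines):
    """Return the lines where the regex matches anywhere in the line."""
    pattern = re.compile(regex)
    return [line for line in lines if pattern.search(line)]

for regex in REGEXES:
    print("> " + regex)
    for line in matching_lines(regex, CORPUS):
        print(line)
```

Change CORPUS to whatever your real file contains and compare what you get with your explanation.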

Extra Credit

  • Add the | "alternator" symbol to your deck of cards and start drilling it too. You can also just call it "OR".
  • Rewrite these regexes in verbose mode, similar to how I did the breakdown above. Just write the English name for each symbol.
  • KEEP DRILLING THOSE SYMBOLS. It's hard work but you gotta do it, and it pays off big time in making your brain strong.

Portability Notes

Some regex engines don't implement alternating efficiently and fall back on heavy backtracking, which can make a regex slow on some inputs. Be careful when using it, and read your engine's docs to see how it handles alternating.
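
One difference between engines that you can check for yourself is which side of the | wins when both could match. Python's re module tries the alternatives left to right and takes the first one that works, while POSIX-style engines are specified to prefer the longest match. A small sketch, assuming Python's re module; the variable names and the test string are made up.

```python
import re

# Both sides can match at the start of "ab".
# Python takes the first alternative that succeeds, so order matters.
first_wins = re.search(r"a|ab", "ab").group(0)
longer_first = re.search(r"ab|a", "ab").group(0)

print(first_wins)
print(longer_first)
```

In Python the first regex stops at "a" and the second gets "ab". An engine that prefers the longest match would give "ab" for both, so don't assume a regex with | behaves the same everywhere.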