You know most of the symbols and now you have a problem: How do you match the symbols rather than use them? For example, what if you wanted to match a regex with a regex? You'd need a way to "escape" the symbols that are in the regex you want to match, and you do that the same way you do in most programming languages with the \ (backslash) character. Let's give it a try, but I have to warn you this will probably warp your brain so pay attention:

.*[A-Z].*
^[0-9]+.*$
^**+*[A-Z[]]$

Pay attention: These lines are your corpus text, not your regex. Repeat after me: these are the lines of text you are looking for, not the regex. The next file is the regex.

I can already see the fear in your eyes, so I'm going to write the regex we'll use in verbose form so you can take each one slowly and see what I'm doing:

\.  # a dot
\*  # an asterisk
\[  # open bracket
.+  # one-or-more
\]  # close bracket
\.  # a dot
\*  # an asterisk


^   # from the start
\^  # a caret char
.+  # one-or-more chars
\.  # a dot char
\*  # an asterisk
\$  # a dollar sign
$   # to the end


\\  # a literal backslash
\+  # a plus
\*  # an asterisk
.*  # zero-or-more
\$  # a dollar sign
$   # at the end

Again, pay attention! That file is ex13.regex and it is the regex that does the matching. The other one above is ex13.txt and it's the corpus text. Get that straight in your mind before continuing.

Even in verbose form this is pretty heinous. In all honesty, if you're trying to do this you shouldn't use a regex but should use a real lexer. I'll be showing you how to write one of those but the exercise is good for being able to understand regex like this.

In the first regex I'm trying to match, "Any regex that starts with .*[, has something inside the character set, and then ends with ].*". To do this I have to escape each of the regex chars I want to match using the \ character.
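If you want to poke at that first regex outside of this exercise's tool, here is a minimal sketch in Python. It's an assumption on my part that you have Python handy; its re.VERBOSE flag ignores whitespace and # comments much like the verbose form above. The names FIRST, UNESCAPED, and corpus are made up for this illustration, and the corpus lines are copied as they appear in ex13.txt above.

```python
import re

# The first regex from ex13.regex, compiled in Python's verbose mode.
FIRST = re.compile(r"""
    \.  # a dot
    \*  # an asterisk
    \[  # open bracket
    .+  # one-or-more
    \]  # close bracket
    \.  # a dot
    \*  # an asterisk
""", re.VERBOSE)

# The same symbols with the backslashes removed. Now they are *used*
# as regex operators instead of being matched as literal characters.
UNESCAPED = re.compile(r".*[.+].*")

# The corpus lines as shown in ex13.txt above.
corpus = [
    ".*[A-Z].*",
    "^[0-9]+.*$",
    "^**+*[A-Z[]]$",
]

# Keep only the corpus lines the escaped regex finds a match in.
matches = [line for line in corpus if FIRST.search(line)]
print(matches)

# The unescaped version happily matches ordinary text like "1+1",
# while the escaped version only matches text that looks like a regex.
print(bool(UNESCAPED.search("1+1")))
print(bool(FIRST.search("1+1")))
```

Only the first corpus line comes back, because it's the only one with a literal .*[ followed later by a literal ].* in it.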

The other regex are going to be part of an extra credit so I'm not going to explain them.

What You Should See

When you run this you should see each regex match only one line of the corpus text file:

That file doesn't exist.
>
\.  # a dot
\*  # an asterisk
\[  # open bracket
.+  # one-or-more
\]  # close bracket
\.  # a dot
\*  # an asterisk

Input file is empty. Use !load to load something.
>
^   # from the start
\^  # a caret char
.+  # one-or-more chars
\.  # a dot char
\*  # an asterisk
\$  # a dollar sign
$   # to the end

Input file is empty. Use !load to load something.
>
\\  # a literal backslash
\+  # a plus
\*  # an asterisk
.*  # zero-or-more
\$  # a dollar sign
$   # at the end

Input file is empty. Use !load to load something.
>

If it doesn't work make sure you're putting the proper number of newlines between regex. Remember, an empty line starts verbose mode, and another ends it, so you need two between each one.

Extra Credit

  • Take the other two regex and write a similar English sentence describing what they're matching.
  • Convert the regex back to normal form from verbose.
  • Write lines of corpus text that match each of these regex in new ways.
  • Write lines of corpus text that do not match the regex, then modify the regex to make them match.
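To check your own verbose-to-normal conversions, a small helper can do the mechanical part. This is only a sketch under assumptions: compact is a hypothetical name I made up, and it assumes the regex has no escaped # character and no literal spaces in it, which is true of the three regex in this exercise. I only run it on the first regex here so the other two are still yours to do.

```python
import re


def compact(verbose: str) -> str:
    """Strip # comments and whitespace from a verbose regex.

    Assumption: the pattern contains no escaped '#' and no literal
    spaces, so everything after '#' on a line is a comment.
    """
    parts = []
    for line in verbose.splitlines():
        code = line.split("#", 1)[0]       # drop the comment
        parts.append("".join(code.split()))  # drop all whitespace
    return "".join(parts)


FIRST_VERBOSE = r"""
\.  # a dot
\*  # an asterisk
\[  # open bracket
.+  # one-or-more
\]  # close bracket
\.  # a dot
\*  # an asterisk
"""

FIRST_COMPACT = compact(FIRST_VERBOSE)
print(FIRST_COMPACT)

# Both forms should agree on the first corpus line.
line = ".*[A-Z].*"
print(bool(re.search(FIRST_VERBOSE, line, re.VERBOSE)))
print(bool(re.search(FIRST_COMPACT, line)))
```

Do the conversion by hand first, then use something like this to see whether you dropped a backslash along the way.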

Portability Notes

Many regex engines also use the \ (backslash) character to add extra features, so be careful when you use it on letters and numbers. You don't need to escape letters and numbers, and if you do, the escaped version may clash with some feature of the regex engine.
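Here's a short sketch of that clash using Python's re module as one example engine. Other engines behave differently, so treat this as an illustration and not a rule. The variable names are made up for the example.

```python
import re

# In Python, \d is a feature meaning "any digit", not an escaped letter d.
digit_matches_letter = re.search(r"\d", "d") is not None
digit_matches_seven = re.search(r"\d", "7") is not None

# A plain d needs no backslash to match the letter d.
plain_matches_letter = re.search(r"d", "d") is not None

# Python also rejects a backslash on a letter that has no special meaning.
try:
    re.compile(r"\q")
    unknown_escape_rejected = False
except re.error:
    unknown_escape_rejected = True

# When you really do want to match symbols literally, re.escape adds
# the backslashes for you.
text = ".*[A-Z].*"
escaped = re.escape(text)
literal_match = re.fullmatch(escaped, text) is not None

print(digit_matches_letter, digit_matches_seven, plain_matches_letter)
print(unknown_escape_rejected, literal_match)
```

The takeaway is the same as above: put backslashes on symbols you want to match literally, and leave letters and numbers alone unless you mean to use the engine's feature.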