Imagine you wanted to match only lines that contain vowels. To test this out we'll use a contrived conversation about Cthulhu:

Evil lord Cthulhu said
Xchjklz plktrdfg
and I agree with him

There are two lines with vowels, and then what Cthulhu says doesn't have any. We'd like to just ignore what he says, so let's make a regex script to do that:

[aeiouy]

You can either type that into ex4.regex and run it, or just run regetron ex4.txt and then enter that into the shell.

What that one line does is create a "set" of characters you want to match. A set matches exactly one character, so it's kind of like saying "I want to match 'a' or 'e' or 'i' or 'o' or 'u' or 'y'." Another way to say this is, "Match any line that contains at least one of these characters: aeiouy".

There are a few other things you can do with sets, but run this and see what you get before I continue.

What You Should See

That file doesn't exist.
> [aeiouy]
Input file is empty. Use !load to load something.
>

See how it removed the line in the middle that didn't have vowels (the line Cthulhu said)? It did this because, after scanning each character in that line, it didn't find a single one that matched the set you specified.
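If you want to see the same filtering outside of regetron, here is a minimal sketch in Python using the standard re module. The corpus string, the variable names, and the lines_matching helper are my own assumptions for illustration; they are not part of regetron.

```python
import re

# Assumed corpus: the three-line conversation from this exercise.
corpus = """Evil lord Cthulhu said
Xchjklz plktrdfg
and I agree with him"""

# A set: matches any single character that is one of a, e, i, o, u, y.
vowel_set = re.compile(r"[aeiouy]")


def lines_matching(pattern, text):
    """Keep only the lines where the pattern matches somewhere in the line."""
    return [line for line in text.splitlines() if pattern.search(line)]


matching = lines_matching(vowel_set, corpus)
for line in matching:
    print(line)
```

The key call is search, which scans the whole line and succeeds as soon as one character is in the set. The middle line has no character from the set, so it gets dropped.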

Ranges Of Characters

This is really handy, but it would be tedious if you had to type out an entire alphabet or all the digits every time you wanted to match them. For this common case you can use a range of characters by putting a '-' (dash) between the first and last character. For example, [a-zA-Z] will match any one character from "a through z" or "A through Z", which covers all the upper and lowercase English letters.
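Here is a small sketch of ranges in Python's re module, just to show that a range behaves like the spelled-out set. The sample string and variable names are made up for this example.

```python
import re

# A range is shorthand for listing every character between the two ends.
letters = re.compile(r"[a-zA-Z]")   # any one English letter, either case
digits = re.compile(r"[0-9]")       # any one digit

# Assumed sample text, not from the exercise corpus.
sample = "Room 42, Floor 7"

found_letters = "".join(letters.findall(sample))
found_digits = "".join(digits.findall(sample))

# [a-e] and [abcde] should pick out exactly the same characters.
same_as_spelled_out = (
    re.findall(r"[a-e]", "abcdefgh") == re.findall(r"[abcde]", "abcdefgh")
)

print(found_letters)
print(found_digits)
print(same_as_spelled_out)
```

Notice that the space and the comma are in neither range, so they never show up in the results.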

Extra Credit

  • Write another regex that matches only lowercase characters. Use a range for it.
  • Add some numbers to the corpus text, and then write a regex with the numbers in a range (like 0-9).
  • Run regetron and use !data to set your phone number. Now write a regex that matches your phone number using the range sets.

Portability Notes

Regex engines differ in how they treat other human languages and alphabets when doing a range, so a range like [a-z] may or may not include accented letters. Double check your engine's documentation to make sure the range you want is even possible.
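As one data point, here is how Python 3's re module handles it. This only shows Python's behavior, and other engines may differ. The word and the variable names are assumptions for the example.

```python
import re

# "caf\u00e9" is "cafe" with an accented final letter (U+00E9).
word = "caf\u00e9"

# In Python, [a-z] covers only the code points from 'a' to 'z',
# so the accented letter falls outside the range.
plain_range = re.findall(r"[a-z]", word)

# Adding the accented letter to the set explicitly makes it match.
with_accent = re.findall(r"[a-z\u00e9]", word)

print(plain_range)
print(with_accent)
```

If your text has letters outside plain English, test your range against real samples before you trust it.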