You will need the latest Regetron for this exercise to work. Make sure you have at least version 1.4. Run pip install --upgrade regetron to get the new one.
You can match lines, but what if you just want the parts of a line that match? In this exercise you'll learn how to create a regex that has a "grouping" in it, which tells the regex engine to return only that part, and not the whole line. You create a group by surrounding the part of the regex you want to extract with parentheses, the ( and ) characters.
Here's a corpus text with two lines I'll play with:
AA BB 10 CC 12 DD 30
My email is help@learncodethehardway.org buddy.
The first line is just some pairs of numbers and letters, and I want to get only the numbers. The second line has an email address in it, and I want to extract that address. To do that, here are four regexes:
[0-9]+
([0-9]+)
[a-z]+@[a-z.]+
([a-z]+@[a-z.]+)
Each pair of lines shows a regex first without parentheses, then with them. The version without will just print the line, as you have been seeing so far. The version with parentheses will print only what's been matched, as a list of items.
When you run this you'll first see the whole line get matched, and after that the groups matched by the grouped (parenthesized) version.
That file doesn't exist.
> [0-9]+
Input file is empty. Use !load to load something.
> ([0-9]+)
Input file is empty. Use !load to load something.
> [a-z]+@[a-z.]+
Input file is empty. Use !load to load something.
> ([a-z]+@[a-z.]+)
Input file is empty. Use !load to load something.
>
You should first notice that the regex ([0-9]+) returned a list of all the numbers it matched, which looks like ['10', '12', '30'] and is simply a Python-formatted list. Next you'll see that the email regex returns just ['help@learncodethehardway.org'], which is the email in a list by itself. If you put more email addresses on that line, it would return all of them.
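If you want to see the same results outside Regetron, here's a minimal sketch using Python's standard re module. I'm assuming re.findall is a fair stand-in for what Regetron prints; this exercise doesn't say how Regetron works internally, and the variable names are just made up for illustration.

```python
import re

# The two corpus lines from this exercise.
numbers_line = "AA BB 10 CC 12 DD 30"
email_line = "My email is help@learncodethehardway.org buddy."

# With a single group in the pattern, findall returns only the text
# captured by that group, one list item per match.
numbers = re.findall(r"([0-9]+)", numbers_line)
emails = re.findall(r"([a-z]+@[a-z.]+)", email_line)

print(numbers)  # ['10', '12', '30']
print(emails)   # ['help@learncodethehardway.org']
```

With only one group wrapped around the whole pattern, findall gives the same list with or without the parentheses. The group starts to matter once it covers just part of the regex.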
Some regex engines are really bad at gathering captures efficiently, or hand them back in weird shapes. Consult the documentation for your regex API to see what's possible.
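As one example of captures coming back in a shape you might not expect, here's a small sketch with Python's re module (again my choice of engine, not something Regetron is confirmed to use). It shows that adding a second group changes what findall returns, and that finditer gives you match objects instead of a list of strings.

```python
import re

line = "AA BB 10 CC 12 DD 30"
pattern = r"([A-Z]+) ([0-9]+)"

# With two groups, findall returns a list of tuples, one tuple per match,
# instead of a flat list of strings.
pairs = re.findall(pattern, line)
print(pairs)  # [('BB', '10'), ('CC', '12'), ('DD', '30')]

# finditer yields match objects: group(0) is the whole match,
# group(1) and group(2) are the individual captures.
details = [(m.group(0), m.group(2)) for m in re.finditer(pattern, line)]
print(details)  # [('BB 10', '10'), ('CC 12', '12'), ('DD 30', '30')]
```

Notice that AA never shows up, because it is followed by BB instead of a number, so the pattern can't match there.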