If you are a systems administrator, you know how important regular expressions are. They help with day-to-day tasks, such as running a simple command to search for a particular file or piece of text.

A regular expression, also referred to as a regex, is a way of matching strings of text. Support for it is built into many programming languages, such as Perl, Ruby, and Python. It is powerful, but because the notation is so compact, it can also be confusing.

Basic Concepts

There are three basic concepts in a regular expression that you should be familiar with. They provide the foundation for your pattern matching, so you will use them often.

Boolean “or”

If you want to search for any one of several alternative strings, separate the alternatives with the vertical bar (|). It is often used to match a word that has a variant spelling or a plural form.

For example, when you specify color|colour, it will find either color or the British spelling colour. When you specify fish|fishes, it will find either the singular word fish or the plural word fishes in the text you are searching.
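The two alternations above can be tried out with a minimal Python sketch. It assumes Python's standard re module; the sample sentences and variable names are made up for illustration.

```python
import re

# The vertical bar matches either the text on its left or on its right.
spelling = re.compile(r"color|colour")
plural = re.compile(r"fish|fishes")

# findall returns every non-overlapping match, scanning left to right.
found_spellings = spelling.findall("The color red and the colour blue")
print(found_spellings)

# Python tries alternatives from left to right and stops at the first
# one that succeeds, so inside "fishes" the shorter "fish" wins.
first_hit = plural.search("two fishes").group()
print(first_hit)

# fullmatch must consume the whole string, so here the engine falls
# back to the second alternative and accepts "fishes".
whole_word = re.fullmatch(r"fish|fishes", "fishes")
print(whole_word is not None)
```

Because the leftmost alternative wins in Python, writing fishes|fish (longer alternative first) is one way to capture the plural in full when searching inside longer text. Other regex engines may order alternatives differently.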


*      Match zero or more occurrences of the preceding character. Example: ab*c matches ac, abc, abbc, and so on.
+      Match one or more occurrences of the preceding character. Example: ab+c matches abc, abbc, abbbc, and so on.
?      Match either one or none of the preceding character. Example: colou?r matches color and colour.
^      Match the position at the start of the line. Example: ^Man matches only if the word Man appears at the beginning of the line.
$      Match the position at the end of the line. Example: cat$ matches only if the word cat appears at the end of the line.
{x}    Match x occurrences of the preceding character. Example: ab{2}c matches abbc.
{x,y}  Match at least x occurrences and at most y occurrences of the preceding character. Example: ab{1,3}c matches abc, abbc, and abbbc.
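The examples listed for each symbol can be checked with a short Python sketch. It assumes the standard re module; the strings that should not match and the helper name check are illustrative additions, not part of the original list.

```python
import re

# Each entry: (pattern, strings that should match, strings that should not).
cases = [
    (r"ab*c", ["ac", "abc", "abbc"], ["adc"]),
    (r"ab+c", ["abc", "abbc", "abbbc"], ["ac"]),
    (r"colou?r", ["color", "colour"], ["colouur"]),
    (r"ab{2}c", ["abbc"], ["abc", "abbbc"]),
    (r"ab{1,3}c", ["abc", "abbc", "abbbc"], ["ac", "abbbbc"]),
]

def check(pattern, should_match, should_not):
    """Return True if the pattern accepts and rejects the expected strings."""
    # fullmatch succeeds only when the whole string fits the pattern.
    accepted = all(re.fullmatch(pattern, s) for s in should_match)
    rejected = not any(re.fullmatch(pattern, s) for s in should_not)
    return accepted and rejected

results = {pattern: check(pattern, yes, no) for pattern, yes, no in cases}
for pattern, ok in results.items():
    print(pattern, "behaves as described" if ok else "differs")

# Anchors match positions, not characters. search scans the whole line,
# so the anchor is what pins the match to the start or the end.
starts_with_man = re.search(r"^Man", "Man overboard") is not None
man_not_at_start = re.search(r"^Man", "A Man") is not None
ends_with_cat = re.search(r"cat$", "the black cat") is not None
```

The sketch uses fullmatch for the quantifier cases so that extra characters count as a failure. With search, a pattern such as ab{2}c would still be found inside a longer string like xabbcx.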


Matching Time of Day

Matching the time of day seems easy, but it can be a bit tricky. Suppose you want to match 8:02 am. How would you do it?

You may use something like the following:

[0-9]?[0-9]:[0-9][0-9] (am|pm)

This expression does match 8:02 am, but it also matches 99:99 pm, which is not what we want. Use this expression instead:

[01]?[0-9]:[0-5][0-9] (am|pm)

This expression still matches 8:02 am, 10:17 am, and 12:59 pm, but it no longer matches 99:99 pm, which is what we want.
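To see the difference between the two expressions, here is a minimal Python sketch using the standard re module. The names loose and strict are labels chosen for this example only.

```python
import re

# First attempt: any one or two digits for the hour, any two for the minutes.
loose = re.compile(r"[0-9]?[0-9]:[0-9][0-9] (am|pm)")

# Tighter version: the optional first hour digit must be 0 or 1,
# and the first minute digit must be 0 through 5.
strict = re.compile(r"[01]?[0-9]:[0-5][0-9] (am|pm)")

samples = ["8:02 am", "10:17 am", "12:59 pm", "99:99 pm"]

# Map each sample to (accepted by loose, accepted by strict).
comparison = {
    s: (loose.fullmatch(s) is not None, strict.fullmatch(s) is not None)
    for s in samples
}

for s, (loose_ok, strict_ok) in comparison.items():
    print(f"{s!r}: loose={loose_ok} strict={strict_ok}")
```

Note that the tighter expression is still not a full validator for a 12-hour clock: it also accepts values such as 0:00 am and 19:59 pm, because the hour part allows anything from 0 through 19.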

What if we also want to handle a 24-hour clock? If we have 22:05 pm, the above expression will not work, because its hour part only allows 0 through 19. You will need to add an alternative for the hour.

([01]?[0-9]|2[0-3]):[0-5][0-9] (am|pm)

The additional 2[0-3] allows for the hours 20 through 23.
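The effect of the extra alternative can be checked with a short Python sketch, again assuming the standard re module. The names strict and extended are labels for this example only.

```python
import re

# Earlier expression: the hour part stops at 19.
strict = re.compile(r"[01]?[0-9]:[0-5][0-9] (am|pm)")

# Extended expression: the group adds 2[0-3] as a second hour alternative.
extended = re.compile(r"([01]?[0-9]|2[0-3]):[0-5][0-9] (am|pm)")

text = "22:05 pm"

# The earlier expression cannot match the whole string. When it scans
# the text, it only finds the tail that starts at the second digit.
strict_whole = strict.fullmatch(text)
strict_partial = strict.search(text).group()
print(strict_partial)

# The extended expression matches the whole string; group(1) is the hour.
extended_whole = extended.fullmatch(text)
print(extended_whole.group(1))

# Hours above 23 are still rejected.
too_late = extended.fullmatch("24:00 pm")
print(too_late)
```

The partial match shows why unanchored searches can be misleading: the earlier expression does not fail outright on 22:05 pm, it quietly reports 2:05 pm instead. Using fullmatch, or adding ^ and $ to the expression, avoids this.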
