Python Friday #211: First Steps With Regular Expressions

I waited a long time to explore regular expressions because they tend to get complicated in no time. But for some problems there is simply no way around them. Let us explore the basics and how we can use them in our Python code.

This post is part of my journey to learn Python. You can find the code for this post in my PythonFriday repository on GitHub.

No installation required

Regular expressions are part of the standard library of Python. Therefore, we do not need to install a specific package and can import the re module with this line:

import re

Pythex: A Python regular expression editor

While writing regular expressions (regex for short), we need fast feedback because it usually takes a few rounds until they work the way we want. Tools like Pythex, a web-based regular expression editor, tell us immediately whether our regex works. We can enter our sample text and then try our regular expressions until they match what we want to find.

Pythex lets you test your regular expressions in the web browser

While this is optional, I suggest you use a tool like Pythex to help you visualise the matches of your regular expressions. It makes working with them so much less cumbersome.

Find text parts (basic)

To figure out if our regex matches something anywhere in a text, we can use the re.search() function:

res = re.search("123","123abc123")

If the regex matches a part of the text, we get back a Match object for the first occurrence:

>>> res
<re.Match object; span=(0, 3), match='123'>
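
The Match object is more than a printed summary. As a minimal sketch (the variable names are my own and not part of the original example), we can read the matched text and its position, and we should check for None, which re.search() returns when nothing matches:

```python
import re

# A successful search returns a Match object that we can inspect.
res = re.search("123", "123abc123")
matched_text = res.group()  # the text that matched
position = res.span()       # (start, end) of the first occurrence

# Without a match, re.search() returns None, so we check before using it.
missing = re.search("xyz", "123abc123")
found = missing is not None

print(matched_text, position, found)
```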

To find all parts that match our regex, we can use the re.findall() function:

res = re.findall("123","123abc123")

This gives us a list of all parts that match our regular expression:

>>> res
['123', '123']
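
Two details of re.findall() are easy to miss: matches do not overlap, and a text without any match gives us an empty list instead of None. A small sketch with sample strings I made up for illustration:

```python
import re

# Matches are found from left to right and do not overlap,
# so "aaaa" contains two matches of "aa", not three.
pairs = re.findall("aa", "aaaa")

# No match gives an empty list, not None.
nothing = re.findall("xyz", "123abc123")

print(pairs, nothing)
```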

If we want to get an iterator for all matches, we can use the function re.finditer():

res = re.finditer("123","123abc123")

To see the different matches, we can loop through our result:

>>> for match in res:
...     print(match)
...
<re.Match object; span=(0, 3), match='123'>
<re.Match object; span=(6, 9), match='123'>
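
Since every item is a Match object, we can collect the positions along with the matched text. The following sketch (variable names are my own) also shows that the iterator is used up after one pass:

```python
import re

text = "123abc123"

# Collect start, end and matched text for every occurrence.
positions = [(m.start(), m.end(), m.group()) for m in re.finditer("123", text)]

# An iterator can only be consumed once; a second pass yields nothing.
it = re.finditer("123", text)
first_pass = len(list(it))
second_pass = len(list(it))

print(positions, first_pass, second_pass)
```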

If we only care about a match at the beginning of a text, we can use the re.match() function:

res = re.match("123", "123abc123")

If our regex should match the whole text, we can use the re.fullmatch() function:

res = re.fullmatch("abc", "abc")
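
To see how re.search(), re.match() and re.fullmatch() differ, it helps to run them against the same text. This is a small comparison sketch with a sample text of my own; each function returns None when its condition is not met:

```python
import re

text = "abc123"

# search: the regex may match anywhere in the text.
anywhere = re.search("123", text) is not None

# match: the regex must match at the beginning of the text.
at_start_digits = re.match("123", text) is not None
at_start_letters = re.match("abc", text) is not None

# fullmatch: the regex must cover the whole text.
whole_partial = re.fullmatch("abc", text) is not None
whole_complete = re.fullmatch("abc123", text) is not None

print(anywhere, at_start_digits, at_start_letters, whole_partial, whole_complete)
```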

While these examples show us how the main functions in the re module work, they only scratch the surface and are not of much use on their own. That will change when we combine those basic functions with metacharacters, which will be the topic of next week’s post.

Substitute text parts with other text parts

The re.sub() function searches for our regex, replaces all matches and returns the text with the replaced parts:

res = re.sub("123", "#", "123abc123")

>>> res
'#abc#'

If we want to know how many substitutions happened, we can use the re.subn() function instead:

res, no = re.subn("123", "#", "123abc123")

This gives us back a tuple with the result and the number of substitutions:

>>> res
'#abc#'
>>> no
2
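
Both functions accept an optional count argument to limit the number of replacements. The sketch below (variable names are my own) replaces only the first match and shows what re.subn() reports when nothing matches:

```python
import re

# Replace only the first occurrence by limiting the count.
first_only = re.sub("123", "#", "123abc123", count=1)

# Without a match the text comes back unchanged and the counter is 0.
unchanged, replacements = re.subn("xyz", "#", "123abc123")

print(first_only, unchanged, replacements)
```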

Compile regular expressions

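If we reuse the same regular expression multiple times, we should compile it once and then work with the pre-compiled pattern object. The re module already caches recently used patterns, so the gain is often small, but compiling once keeps the regex in a single place and can speed up our application: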

p = re.compile('123')

>>> p.search("abc123cde")
<re.Match object; span=(3, 6), match='123'>
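
The compiled pattern offers the same functions as the re module, only without the regex argument. A minimal sketch, assuming a few sample lines I made up, that reuses one pattern for counting and replacing:

```python
import re

# Compile once, then reuse the pattern object for every line.
p = re.compile("123")
lines = ["abc123cde", "no digits here", "123 and 123"]

# The pattern has its own findall() and sub() methods.
counts = [len(p.findall(line)) for line in lines]
cleaned = [p.sub("#", line) for line in lines]

print(counts, cleaned)
```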

With these basic parts of the re module we can already do a bit of useful work. However, the main benefit of regular expressions comes from metacharacters, which we will explore next week.
