Python Regular Expressions

A regular expression is a special sequence of characters that helps you match or find other strings, or sets of strings, using a specialized syntax held in a pattern. Regular expressions are widely used in the UNIX world.
The re module provides full support for Perl-like regular expressions in Python. It raises the exception re.error if an error occurs while compiling or using a regular expression.
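
The sketch below shows re.error in action. It compiles an invalid pattern and catches the exception. The unbalanced pattern "(Hi" is only an illustration. The exact error message can differ between Python versions, so it is not shown here.

```python
import re

# An unbalanced parenthesis makes the pattern invalid, so compiling it fails
try:
    re.compile(r"(Hi")
except re.error as exc:
    print("Invalid pattern:", exc)
```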


Let’s start with the match() function. The match() function attempts to match the RE pattern at the beginning of string, with optional flags.
Syntax for the match() function: re.match(pattern, string, flags=0)

  • pattern: the regular expression to be matched.
  • string: the string that is searched for the pattern, starting at its beginning.
  • flags: optional modifiers. You can combine several flags with the bitwise OR (|) operator.

On success, re.match() returns a match object. On failure, it returns None. Use the group(num) or groups() method of the match object to get the matched expression.

  • group(num=0): returns the entire match when num is 0 (the default). Otherwise it returns the text matched by the parenthesized subgroup num (i.e. num=1, 2, ..., n).
  • groups(): returns the text matched by all subgroups as a tuple. If the pattern has no subgroups, it returns an empty tuple.
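
Here is a small sketch that makes group() and groups() concrete. It assumes a pattern with two parenthesized subgroups, and the sample string is borrowed from the example below.

```python
import re

# Two parenthesized groups capture the first and second word
m = re.match(r"(\w+) (\w+)", "Hi How Are You")
if m:
    print(m.group())    # entire match, same as m.group(0): Hi How
    print(m.group(1))   # first group: Hi
    print(m.group(2))   # second group: How
    print(m.groups())   # all groups as a tuple: ('Hi', 'How')

# A pattern with no groups can still match, but groups() is then empty
m2 = re.match(r"Hi", "Hi How Are You")
print(m2.groups())      # ()

# With no match, re.match() returns None, so check it before calling group()
print(re.match(r"How", "Hi How Are You"))  # None
```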

Let’s look at the example given below

import re
line = "Hi How Are You Hi I Am Fine Hi Python Hi Java. Hi PHP "
# match() only looks for the pattern at the start of line
matchObj = re.match(r'Hi', line)
if matchObj:
    print("Match Found :", matchObj.group())
else:
    print("No match found")

Output

Match Found : Hi

The next function we can use to search for a pattern is search(). This function looks for the first occurrence of the RE pattern anywhere within string.
Syntax for the search() function: re.search(pattern, string, flags=0)
All the parameters are the same as described for the match() function.
Let’s look at the example

import re
line = "Hi How Are You Hi I Am Fine Hi Python Hi Java. Hi PHP "
# search() scans line and stops at the first occurrence of the pattern
matchObj = re.search(r'Hi', line)
if matchObj:
    print("Match Found :", matchObj.group())
else:
    print("No match found")

Output

Match Found : Hi

So now a question arises: both functions give the same output, with the same syntax and parameters, so which one should we use? Python offers two different primitive operations based on regular expressions. match() checks for a match only at the beginning of the string, while search() checks for a match anywhere in the string. In the examples above, "Hi" happens to be at the very beginning of line, so both functions find it.
Look at the example below

import re
line = "Hi How Are You Hi I Am Fine Hi Python Hi Java. Hi PHP "
# re.I makes matching case-insensitive; re.M enables multiline mode
matchObj = re.match(r'python', line, re.M | re.I)
if matchObj:
    print("Match Found using match():", matchObj.group())
else:
    print("No Match Found using match()")
matchObj = re.search(r'python', line, re.M | re.I)
if matchObj:
    print("Match Found using search():", matchObj.group())
else:
    print("No match Found using search()")

Output

No Match Found using match()
Match Found using search(): Python
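
One more difference is worth a sketch: re.M does not change where match() looks. The string below is a made-up two-line example. match() still only tries the very beginning. search() with ^ and re.M finds "Python" at the start of the second line.

```python
import re

text = "first line\nPython on the second line"

# match() only tries position 0, even when re.M is given
print(re.match(r"Python", text, re.M))      # None

# search() scans the whole string; with re.M, ^ also anchors after each newline
m = re.search(r"^Python", text, re.M)
print(m.group(), m.start())                 # Python 11
```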

List of Option Flags
Regular expressions accept optional modifiers that control various aspects of matching. These modifiers are represented as flags. You can combine multiple flags using the bitwise OR operator (|), as shown in the example above.

  • re.M: Makes $ match at the end of a line (not just the end of the string) and makes ^ match at the start of any line (not just the start of the string).
  • re.I: Uses case-insensitive matching.
  • re.L: Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as word boundary behavior (\b and \B). In Python 3.6 and later it can only be used with bytes patterns.
  • re.S: Makes a period (dot) match any character, including a newline.
  • re.U: Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. In Python 3 this is already the default for str patterns.
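
Here is a minimal sketch of re.I, re.M and re.S on a made-up three-line string. It shows how each flag changes what ^ and . match.

```python
import re

text = "Hi there\nhi again\nHI once more"

# No flags: case-sensitive, and ^ anchors only at the start of the string
print(re.findall(r"^hi", text))                    # []

# re.I ignores case; re.M lets ^ anchor at the start of every line
print(re.findall(r"^hi", text, re.I | re.M))       # ['Hi', 'hi', 'HI']

# By default "." does not match a newline; re.S lets it
print(re.search(r"there.hi", text))                # None
print(repr(re.search(r"there.hi", text, re.S).group()))  # 'there\nhi'
```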

Our example string contains multiple instances of "Hi", but the two functions above return only the first one. To find all matches, we use the findall() function.
The findall() function returns all non-overlapping matches of the pattern in the string, as a list.

import re
line = "Hi How Are You Hi I Am Fine Hi Python Hi Java. Hi PHP "
pattern = "Hi"
# findall() returns a list with every non-overlapping match
for match in re.findall(pattern, line, re.M | re.I):
    print('Found "%s"' % match)

Output

Found "Hi"
Found "Hi"
Found "Hi"
Found "Hi"
Found "Hi"
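
One detail of findall() often surprises people. When the pattern contains parenthesized groups, findall() returns the groups instead of the whole match. Here is a short sketch using the same line string:

```python
import re

line = "Hi How Are You Hi I Am Fine Hi Python Hi Java. Hi PHP "

# One group: findall() returns only the text captured by that group
print(re.findall(r"Hi (\w+)", line))
# ['How', 'I', 'Python', 'Java', 'PHP']

# Two groups: findall() returns a list of tuples, one tuple per match
print(re.findall(r"(Hi) (\w+)", line))
# [('Hi', 'How'), ('Hi', 'I'), ('Hi', 'Python'), ('Hi', 'Java'), ('Hi', 'PHP')]
```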

The finditer() function returns an iterator that yields match objects instead of the strings returned by findall().

import re
line = "Hi How Are You Hi I Am Fine Hi Python Hi Java. Hi PHP "
pattern = "Hi"
# finditer() yields match objects, which know where each match starts and ends
for match in re.finditer(pattern, line, re.M | re.I):
    s = match.start()
    e = match.end()
    print('Found "%s" at %d:%d' % (line[s:e], s, e))

Output

Found "Hi" at 0:2
Found "Hi" at 15:17
Found "Hi" at 28:30
Found "Hi" at 38:40
Found "Hi" at 47:49
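
Because finditer() yields match objects, you also get access to groups and positions. Here is a sketch that assumes we want the word following each "Hi":

```python
import re

line = "Hi How Are You Hi I Am Fine Hi Python Hi Java. Hi PHP "

# Each item is a match object, so both the group text and its span are available
for m in re.finditer(r"Hi (\w+)", line):
    print(m.span(), m.group(1))

# Prints:
# (0, 6) How
# (15, 19) I
# (28, 37) Python
# (38, 45) Java
# (47, 53) PHP
```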

Regular expressions support more powerful patterns than simple literal text strings. Patterns can repeat, can be anchored to different logical locations within the input, and can be expressed in compact forms that do not require every literal character to be present in the pattern. All of these features come from combining literal text with metacharacters, which are part of the regular expression pattern syntax implemented by re.
Regular expression patterns:
Except for the metacharacters + ? . * ^ $ ( ) [ ] { } | \, all characters match themselves. You can escape a metacharacter by preceding it with a backslash.
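
Here is a minimal sketch of escaping. The first pair of calls compares . as a metacharacter with \. as a literal dot. re.escape(), a standard re function, adds the backslashes for you when the literal text comes from elsewhere. The sample strings are made up for illustration.

```python
import re

# "." is a metacharacter: unescaped, it matches any single character
print(re.findall(r"Java.", "Java. JavaX"))    # ['Java.', 'JavaX']

# Escaped with a backslash, it only matches a literal dot
print(re.findall(r"Java\.", "Java. JavaX"))   # ['Java.']

# "+" is a metacharacter too: r"1+1" means "one or more 1s, then a 1"
print(re.findall(r"1+1", "1+1=2, 11=11"))     # ['11', '11']

# re.escape() backslash-escapes metacharacters so the text matches literally
print(re.findall(re.escape("1+1"), "1+1=2, 11=11"))  # ['1+1']
```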
The following table lists the regular expression syntax that is available in Python:

[Table: regular expression syntax available in Python]
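
Here is a quick sketch of a few commonly used syntax elements: \d, {n}, ^, |, \b and \w. It is not a reproduction of the full table. The elements are applied to a made-up sample string.

```python
import re

s = "Order 66 shipped on 2015-03-14 to Pythonlovers"

print(re.findall(r"\d+", s))                  # ['66', '2015', '03', '14']  \d digit, + one or more
print(re.findall(r"\d{4}-\d{2}-\d{2}", s))    # ['2015-03-14']  {n} exactly n times
print(re.search(r"^Order", s) is not None)    # True  ^ start of string
print(re.findall(r"shipped|delivered", s))    # ['shipped']  | alternation
print(re.findall(r"\bP\w*", s))               # ['Pythonlovers']  \b word boundary, \w word char
```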

Let’s look at an example that uses some of the patterns above.

import re
phone = "Hello Pythonlovers.This is my Phone Number 0123-456-789 call me "

# \D+ matches one or more non-digit characters; search() returns the first run
num = re.search(r'\D+', phone)
print("Text Found:", num.group())

# [\d-]+ matches one or more digits or hyphens
num = re.search(r'[\d-]+', phone)
print("Phone Num Found:", num.group())

Output

Text Found: Hello Pythonlovers.This is my Phone Number 
Phone Num Found: 0123-456-789

As you can see in the output, the first search() call returned only the text up to “Phone Number”. The non-digit pattern matches a run of non-digit characters, and search() stops at the first such run. The text after the number (“call me”) is never reached. So how can we extract that text as well? One simple way is the sub() function. It replaces occurrences of the RE pattern in string with repl and returns the modified string. It substitutes all occurrences unless a count is provided.
Syntax: re.sub(pattern, repl, string, count=0, flags=0)

import re
phone = "Hello Pythonlovers.This is my Phone Number 0123-456-789 call me"

# Delete every character that is neither a digit nor a hyphen
num = re.sub(r'[^\d-]', "", phone)
print("Number Found:", num)

# Delete every digit and hyphen, keeping the surrounding text
num = re.sub(r'[\d-]', "", phone)
print("Text Found:", num)

Output

Number Found: 0123-456-789
Text Found: Hello Pythonlovers.This is my Phone Number  call me
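
sub() can do more than delete characters. The minimal sketch below shows three things. The count parameter limits how many replacements are made. repl can be a function that receives each match object and returns the replacement text (double() is a hypothetical helper). re.subn() also reports how many substitutions were made.

```python
import re

line = "Hi How Are You Hi I Am Fine Hi Python Hi Java. Hi PHP "

# count limits the number of replacements (0, the default, replaces all)
print(re.sub(r"Hi", "Hello", line, count=2))
# Hello How Are You Hello I Am Fine Hi Python Hi Java. Hi PHP

# Hypothetical helper: repl function that doubles each matched digit
def double(m):
    return str(int(m.group()) * 2)

print(re.sub(r"\d", double, "1 2 3"))    # 2 4 6

# subn() returns a tuple: (new string, number of substitutions made)
new_line, n = re.subn(r"Hi", "Hello", line)
print(n)                                  # 5
```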

compile() function
The re module includes module-level functions for working with regular expressions as text strings, but it is more efficient to compile the expressions a program uses frequently. The compile() function converts an expression string into a compiled pattern object (a RegexObject in the Python 2 documentation).
Example:

import re
# Compile once, then reuse the pattern object for every candidate
# (re.UNICODE is already the default for str patterns in Python 3)
address = re.compile(r'[\w\d.+-]+@([\w\d.]+\.)+(com|org|edu)', re.UNICODE)
candidates = [
    'first.last@example.com',
    'first.last+category@gmail.com',
    'valid-address@mail.example.com',
    'not-valid@example.foo',
]
for candidate in candidates:
    match = address.search(candidate)
    print('%-30s %s' % (candidate, 'Matches' if match else 'No match'))

Output

first.last@example.com         Matches
first.last+category@gmail.com  Matches
valid-address@mail.example.com Matches
not-valid@example.foo          No match

The module-level functions maintain a cache of compiled expressions. However, the size of the cache is limited, and using compiled expressions directly avoids the cache lookup overhead. Another advantage of compiled expressions is that you can precompile all expressions when the module is loaded. This shifts the compilation work to application start time, instead of a point when the program may be responding to a user action.
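
Here is a minimal sketch of the compile-at-load-time approach. PHONE_RE and find_phone() are hypothetical names. The phone format \d{4}-\d{3}-\d{3} is an assumption based on the number used earlier. The sketch also shows that compiled pattern methods accept a starting position (pos), which the module-level functions do not.

```python
import re

# Hypothetical pattern, compiled once when the module is imported
PHONE_RE = re.compile(r"\d{4}-\d{3}-\d{3}")

def find_phone(text):
    """Hypothetical helper: return the first phone number in text, or None."""
    m = PHONE_RE.search(text)
    return m.group() if m else None

print(find_phone("This is my Phone Number 0123-456-789 call me"))  # 0123-456-789
print(find_phone("no number here"))                                 # None

# The pos argument starts the search at index 1, so "123-" is too short to match
print(PHONE_RE.search("0123-456-789", 1))                           # None
```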

I hope this helps you understand Python regular expressions clearly. Try them out with more patterns in your own code, and if you run into any difficulty, feel free to drop a comment on this post.
For more detail on Python regular expressions, please visit https://docs.python.org/2/library/re.html (Python 3: https://docs.python.org/3/library/re.html)