Theory of Computation: Unit II: Regular Expressions and Languages

Regular Expression and Regular Languages

Regular Expressions and Languages - Theory of Computation

A language is regular if it can be expressed in terms of regular expression.

UNIT II: Regular Expressions and Languages

Syllabus

Regular expression - Regular Languages- Equivalence of Finite Automata and regular expressions Proving languages to be not regular (Pumping Lemma) - Closure properties of regular languages.


Regular Expression and Regular Languages

AU May-13, 16, Marks 8

Let Σ be an alphabet which is used to denote the input set. The regular expression over Σ can be defined as follows.

1. ϕ is a regular expression which denotes the empty set.

2. ε is a regular expression and denotes the set {ε} and it is a null string.

3. For each 'a' in Σ 'a' is a regular expression and denotes the set {a}.

4. If r and s are regular expressions denoting the languages L1 and L2 respectively, then

r+s is equivalent to L1 U L2 i.e. union.

rs is equivalent to L1 L2 i.e. concatenation

r* is equivalent to L1* i.e. closure.

The r* is known as kleen closure or closure which indicates occurrence of r for ∞ number of times.

For example if Σ = {a} and we have regular expression R = a*, then R is a set denoted by R= {ε, a, aa, aaa, aaaa, ...}

That is R includes any number of a's as well as empty string which indicates zero number of a's appearing, denoted by & character.

Similarly there is a positive closure of L which can be shown as L+. The L+ denotes set of all the strings except the ε or null string. The null string can be denoted by ε or ^ If Σ = {a} and if we have regular expression R = a+ then R is a set denoted by

R = {a, aa, aaa, aaaa, ....)

We can construct L* as

L* = ε L+

Let us try to use regular expressions with the help of some examples.

Definition of Regular Language: A language is regular if it can be expressed in terms of regular expression.

Example 2.1.1 Design regular expression for the language containing all the strings containing any number of a's and b's.

Solution: The regular expression will be

r.e. = (a + b)*

This will give the set as L = {ε, a, aa, ab, b, ba, bab, abab, ..... any combination of a and b}.

The (a + b)* means any combination with a and b even a null string.

Example 2.1.2 Write r.e. to denote a language L which accepts all the strings which begin or end with either 00 or 11.

Solution: The r.e. can be categorized into two subparts.

R = L1 + L2

L1 = The strings which begin with 00 or 11.

L2 = The strings which end with 00 or 11.

Let us find out L1 and L2.

L1 = (0011) (any number of O's and 1's)

L1 = (0011) (0+1)*

Similarly,

L2 = (any number of 0's and 1's) (00 + 11)

= (0+1)* (00 + 11)

Hence

R = [(00+11) (0+1)*] + [(0+1)* (00+11)]

Example 2.1.3 Write the r.e. to denote the language L over Σ = {a, b} such that all the strings do not contain the substring "ab".

Solution: The L = {ε, a, b, bb, aa, ba, baa ...)

The   r.e. = (b* a*)

In this regular expression if we substitute a* = ε we get all combination of b's and similarly if we substitute b* = ε then we will get all combinations of a's. We have strictly maintained a condition for not to have ab as substring by giving the regular expression as b* a*.

Example 2.1.4 What is regular expression? Write a regular expression for set of strings that consists of alternating 0's and 1's. AU May-16, Marks 8

Solution: Regular expression - Refer section 2.1.

Regular expression

Let Σ be an alphabet which is used to denote the input set. The regular expression over Σ can be defined as follows.

1. ϕ is a regular expression which denotes the empty set.

2. ε is a regular expression and denotes the set {ε} and it is a null string.

3. For each 'a' in Σ 'a' is a regular expression and denotes the set {a}.

4. If r and s are regular expressions denoting the languages L1 and L2 respectively, then

r+s is equivalent to L1 U L2 i.e. union.

rs is equivalent to L1 L2 i.e. concatenation

r* is equivalent to L1* i.e. closure.

The r* is known as kleen closure or closure which indicates occurrence of r for ∞ number of times.

For example if Σ = {a} and we have regular expression R = a*, then R is a set denoted by R= {ε, a, aa, aaa, aaaa, ...}

r.e. = (0+ 1+ 0+ 1+)*

Example 2.1.5 Write regular expression to describe following languages:

i) {w = {0, 1}*}: w corresponds to binary encoding, without leading 0's, of natural numbers that are evenly divisible by 4}

ii) {w = {0, 1}}*: w corresponds to binary encoding, without leading 0's of natural numbers that are powers of 4}

iii) {w = {0-9}*: w corresponds to the decimal encoding, without leading 0's of an odd natural number}.

Solution:

i) The valid strings for given language are (100, 1000, 1100, 11100,...)

For ex: 100 = 4

Hence regular expression will be

r.e. = (1(0+1)* 00) + 0

ii) The vaild strings for given language are (100, 10000, 1000000,...}

Since (i) 100 in binary = 1×22 + 0×21 + 0×20 = 4 + 0 + 0 = 4 in decimal

(ii) 10000 in binary = 1×24 + 0×23 + 0×22 + 0×21 + 0×20 = 16 in decimal

Thus we want 4, 16, 64... in binary representation as valid string.

Hence the regular expression is

r.e. = 1(00)*

iii) The odd natural numbers are those numbers that end with either 1, 3, 5 or 9. Hence regular expression using the set (0-9)* will be as follows

r.e. = (ε +((1-9)(0-9)* ))(1+3+5+7+9)

Example 2.1.6 Write regular expressions for the following languages over the alphabet Σ = (a, b)

i) All strings that do not end with 'aa'.

ii) All strings that contain an even number of 'b's.

iii) All strings which do not contain the substring 'ba'.

Solution:

i) The language contains all the strings that do not end with aa means it may end with bb, ba or ab. Hence r. e. will be

(a + b) * ( (ab)* + (ba)* + b*)

ii) r. e. = (a* b a* b)*

iii) The L = {ε, a, b, bb, aa, ab, abb, ... }

r. e. = (a*b*)

Example 2.1.7 Give the Regular Expression for the following languages:

i) L = { x|x ϵ {a,b}* and x contains exactly two a's}

ii) L = (a, c, ab, cb, abb, cbb, abbb, cbbb, ...}

iii) L = { x|x ϵ {a, b}* and x is any string that begins in "abb" or a}

Solution:

i) b* ab* ab*

ii) The set contains all the string that either begin with a or c but always end with any number of b's.

r.e. = (a+c) b*

iii) r.e. = (abb + a) (a + b)*

Review Question

1. Discuss on regular expressions. AU: May-13, Marks 8

Theory of Computation: Unit II: Regular Expressions and Languages : Tag: : Regular Expressions and Languages - Theory of Computation - Regular Expression and Regular Languages