From Wikipedia, the pumping lemma for regular languages is the following:

Let $L$ be a regular language. Then there exists an integer $p\ge 1$ (depending only on $L$) such that every string $w$ in $L$ of length at least $p$ ($p$ is called the "pumping length") can be written as $w = xyz$ (i.e., $w$ can be divided into three substrings), satisfying the following conditions:

- $|y| \ge 1$
- $|xy| \le p$ and
- for all $i \ge 0$, $xy^iz \in L$.

$y$ is the substring that can be pumped (removed or repeated any number of times, and the resulting string is always in $L$).

The first condition means that the loop $y$ to be pumped must have length at least one; the second means that the loop must occur within the first $p$ characters. There is no further restriction on $x$ and $z$.

In simple words: for any regular language $L$, any sufficiently long word $w\in L$ can be split into three parts, i.e. $w = xyz$, such that all the strings $xy^kz$ for $k\ge 0$ are also in $L$.
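To make the three conditions concrete, here is a minimal Python sketch that enumerates every split $w = xyz$ allowed by the lemma for a given $w$ and $p$. The names `decompositions` and `pump` are made up for this illustration; they are not from Wikipedia or any library.

```python
from typing import Iterator, Tuple

def decompositions(w: str, p: int) -> Iterator[Tuple[str, str, str]]:
    """Yield every split w = x + y + z with |y| >= 1 and |xy| <= p."""
    for end in range(1, min(p, len(w)) + 1):   # end = |xy|
        for start in range(end):               # start = |x|, so |y| = end - start >= 1
            yield w[:start], w[start:end], w[end:]

def pump(x: str, y: str, z: str, i: int) -> str:
    """Return x y^i z (i = 0 removes y, i = 1 gives back the original word)."""
    return x + y * i + z

# Example: the splits of "abc" with pumping length 2
for x, y, z in decompositions("abc", 2):
    print(repr(x), repr(y), repr(z), "->", pump(x, y, z, 2))
```

The lemma only promises that *one* of these splits pumps correctly, so to refute regularity you have to show that *every* one of them fails for some $i$.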

Now let's consider an example. Let $L=\{(01)^n2^n\mid n\ge0\}$.

To show that this is not regular, you need to consider what all the decompositions $w=xyz$ look like, that is, what $x$, $y$ and $z$ can be given that $xyz=(01)^p2^p$ (we choose to look at this particular word, of length $3p$, where $p$ is the pumping length). We need to consider where the $y$ part of the string occurs. It could lie within the first part, and will thus equal either $(01)^{k+1}$, $(10)^{k+1}$, $1(01)^k$ or $0(10)^k$, for some $k\ge 0$ (don't forget that $|y|\ge 1$). It could lie within the second part, meaning that $y=2^k$, for some $k>0$. Or it could span the two parts of the word, and will have the form $(01)^{k+1} 2^l$ or $1(01)^k 2^l$, for $k\ge0$ and $l\ge1$ (the first part ends in a $1$, so the symbol just before the $2$s in $y$ must be a $1$).

Now pump each one to obtain a contradiction, which will be a word not in your language. For example, if we take $y=1(01)^k2^l$, the pumping lemma says, for instance, that $xy^2z=x1(01)^k2^l1(01)^k2^lz$ must be in the language, for an appropriate choice of $x$ and $z$. But this word cannot be in the language as a $2$ appears before a $1$.

Other cases will result in the number of $(01)$'s being more than the number of $2$'s or vice versa, or will result in words that won't have the structure $(01)^n2^n$ by, for example, having two $0$'s in a row.
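As a worked instance of the counting case (a sketch; the exponent $a$ is introduced here only for illustration): if $y=(01)^{k+1}$ lies inside the first part, then $x=(01)^a$ and $z=(01)^{p-a-k-1}2^p$ for some $a\ge 0$, and

$$xy^iz = (01)^{a}\,(01)^{i(k+1)}\,(01)^{p-a-k-1}\,2^p = (01)^{p+(i-1)(k+1)}\,2^p.$$

Since $k+1\ge 1$, the exponent $p+(i-1)(k+1)$ differs from $p$ for every $i\ne 1$, so for example $xy^0z$ has fewer $(01)$'s than $2$'s and is not in $L$.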

Don't forget that $|xy| \le p$. This is useful for shortening the proof: since the first $2p$ characters of $(01)^p2^p$ all belong to the $(01)^p$ part, $y$ must lie entirely within that part, so many of the decompositions above are impossible.

Each of the cases above needs to lead to such a contradiction, which would then be a contradiction of the pumping lemma. Voila! The language would not be regular.
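If you want to sanity-check the case analysis, the following Python sketch brute-forces it for a concrete value of $p$: it tries every split of $(01)^p2^p$ with $|y|\ge 1$ and $|xy|\le p$ and checks that some pumped word falls outside $L$. The helper names `in_L` and `every_split_fails` and the bound `max_i` are assumptions made for this illustration. Running it for a few values of $p$ is only evidence, not a proof; the proof is the argument above, which works for every $p$.

```python
import re

def in_L(s: str) -> bool:
    """Membership test for L = {(01)^n 2^n | n >= 0}."""
    m = re.fullmatch(r"((?:01)*)(2*)", s)
    # group(1) is the (01)-block, so it has twice as many characters as repetitions
    return m is not None and len(m.group(1)) // 2 == len(m.group(2))

def every_split_fails(p: int, max_i: int = 3) -> bool:
    """True if every allowed split of (01)^p 2^p leaves L for some i <= max_i."""
    w = "01" * p + "2" * p
    for end in range(1, p + 1):        # end = |xy| <= p
        for start in range(end):       # start = |x|, so |y| >= 1
            x, y, z = w[:start], w[start:end], w[end:]
            if all(in_L(x + y * i + z) for i in range(max_i + 1)):
                return False           # this split survives pumping up to max_i
    return True

for p in range(1, 8):
    print(p, every_split_fails(p))
```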

Didn't know about Myhill-Nerode theorem, cool! – Daniil – 2012-04-04T13:20:46.247

Wikipedia also has a section about the number of words in a regular language: if you can prove your language doesn't match the characterization, then your language is not regular: http://en.wikipedia.org/wiki/Regular_language#The_number_of_words_in_a_regular_language – Alex ten Brink – 2012-04-04T14:39:33.380

@Daniil, "regular expressions can't count" seems to me a popular informal formulation of Myhill-Nerode theorem. – AProgrammer – 2012-04-04T15:28:07.590

@AlextenBrink: That is neat. I guess the constants in the statement are the eigenvalues of the automaton's Laplacian? This would make a nice addition to the answers here. – Louis – 2012-04-04T20:27:18.130

@Louis: actually, we've found no reference for that theorem at all, so if you know more about it... Also see: http://cs.stackexchange.com/questions/1045/number-of-words-of-a-given-length-in-a-regular-language – Alex ten Brink – 2012-04-04T20:31:04.090

@AlextenBrink: I had never seen the statement before you pointed to it, but here is how I'd try to prove it. Suppose the start state is $1$ and $2,3,\ldots, k$ are accepting. Then the number of words of length $n$ is given by $\sum_{j=2}^k a_{1j}$ where $A^n = (a_{ij})$ and $A$ is the adjacency matrix of the automaton's graph. Maybe you can work out a recursion and solve it in that form. (So my other comment is likely not the right thing.) – Louis – 2012-04-04T21:17:56.363

See here for two examples of using closure properties. – Raphael – 2012-05-28T22:56:03.037

It's worth noting that Myhill-Nerode is a characterisation of REG, that is it always works (in principle). That's different from the Pumping lemma, which only yields a necessary criterion. – Raphael – 2015-10-13T10:13:06.267