Regex that only matches itself

308

117

There are some pretty cool challenges out there involving regex (Self-matching regex, Regex validating regex)

This may well be impossible, but is there a regex that will ONLY match itself?

NOTE, delimiters must be included:

For example, /thing/ must match /thing/ and not thing. The only possible match for your expression must be the expression itself. Many languages accept a plain string in place of a delimited regular expression. For instance, in Go:

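package main

import (
    "fmt"
    "regexp"
)

func main() {
    // The pattern is passed as an ordinary string, with no delimiters.
    foo := regexp.MustCompile("bar")
    // MatchString reports whether any substring matches, so this prints true.
    fmt.Println(foo.MatchString("foobar"))
}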

For the sake of the challenge, however, let the expression be delimited (starting symbol, expression, ending symbol, e.g. /fancypantpattern/ or @[^2048]@). If you want to argue that quotes are your delimiter, so be it. Given the apparent difficulty of this problem, I don't think it will make much of a difference.
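To make the delimiter requirement concrete, here is a minimal sketch in Python. It is an illustration only: the helper name matches_own_source is made up, Python's re module stands in for whatever flavor you actually use, and it assumes that "match" means a complete match of the whole string. It only checks the first half of the challenge (the regex matches its own source), not that nothing else matches.

```python
import re


def matches_own_source(body: str, delimiter: str = "/") -> bool:
    """Check whether a regex body matches its own delimited source text.

    Assumption: "match" means a complete match of the whole string,
    which is what re.fullmatch tests.
    """
    source = delimiter + body + delimiter
    return re.fullmatch(body, source) is not None


# Without delimiters, an unanchored search succeeds on any string containing "bar".
print(re.search("bar", "foobar") is not None)

# A literal body cannot account for the two delimiter characters.
print(matches_own_source("thing"))

# The same literal does match its undelimited text, which is why delimiters matter.
print(re.fullmatch("thing", "thing") is not None)
```

A pure literal always fails this check because the delimited source is two characters longer than anything the literal can match.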

To help you along:

Here is a quick hack I put together for rubular.com (a web page for editing Ruby regexes):

var test = document.getElementById("test"),
    regex = document.getElementById("regex"),
    options = document.getElementById("options"),
    delimiter = "/",
    // Copy the delimited regex plus its options into the test-string box.
    delay = function () {
        test.value = delimiter + regex.value + delimiter + options.value;
    },
    update = function (e) {
        // Defer the copy: on keydown the input's value is not updated yet.
        window.setTimeout(delay, 0);
    };
regex.onkeydown = update;
options.onkeydown = update;

Even though this is technically 'code golf', I will be very impressed if anyone can find an answer or prove it is impossible.

Link is now fixed. Sorry to all

Winning answer thus far: jimmy23013 with 40 characters

Dylan Madisetti

Posted 2014-05-30T16:20:31.400

Reputation: 1 851

Is it acceptable if the actual match doesn't cover the entire input? E.g. the regex (?=^abc$)a will match only the string abc, but the actual match will only be a. Would that be fine? – Martin Ender – 2016-01-12T13:24:51.840

@MartinBüttner it should be a complete match since an incomplete match leaves room for various matches. In your example ["(?=abc)a","(?=abc)ab"]. However, if there was no other possible string than the regex used, I would consider it valid. – Dylan Madisetti – 2016-01-12T13:43:57.760

I'm not sure I understand. The only possible string matched by the regex (?=^abc$)a is abc and only a single match will be generated. Basically, if you passed the regex to something like JavaScript's test function (which only returns a boolean whether it matched or not), you couldn't distinguish (?=^abc$)a and ^abc$. They both accept exactly one string. – Martin Ender – 2016-01-12T13:47:23.637

Obviously any regular expression that only includes literals will work: //, /a/, /xyz/, etc. It might be good to require that the regex has to include a non-literal operation. – breadbox – 2014-05-30T16:22:57.777

Literals won't work because you're required to match the slashes (the delimiters). For example, /aaa/ will match aaa but not /aaa/. – Dylan Madisetti – 2014-05-30T16:28:05.303

@DylanMadisetti Do we have to use // delimiters, or can we choose other delimiters? (PCRE supports pretty much any character, and in particular you can use matched parentheses/braces/brackets as delimiters.) – Martin Ender – 2014-05-30T19:18:13.487

@m.buettner I believe it's very hard (or impossible) even with arbitrary delimiters. – John Dvorak – 2014-05-30T19:28:15.233

@JanDvorak so do I, but it may open some possibilities. – Martin Ender – 2014-05-30T19:29:23.420

@m.buettner I'm open to said possibilities – John Dvorak – 2014-05-30T19:31:19.190

Use whatever delimiter you want as long as it is valid regex – Dylan Madisetti – 2014-05-30T19:31:41.453

I think that this is quite a nice mathematical/computational problem and the proof might not be easy... Many important theorems started as just a simple question to play with, so maybe in 5 years there will be a Wikipedia article "Madisetti problem" ;) – Paweł Tokarz – 2014-05-30T22:49:19.863

Hmm I'm afraid this might be trivial. There's nothing that says a language couldn't implement a re.match(pattern, str) which returns true iff pattern matches the entire string, with no leftovers. Then the regex "a" would match only the string "a". Could you be more specific? (I figured this out by going back to the basics, realizing a representation of a regex is different from the regular language it defines, then wondering how I'd go about representing a regular language (which is that which can be recognized by a DFA), then coming to this conclusion). – Claudiu – 2014-05-31T03:12:55.447

added some comments, I guess the problem does fall apart from a programmatic standpoint given that some reg helpers will accept strings – Dylan Madisetti – 2014-05-31T03:39:35.097

It's impossible. You have to write an expression that matches a string of more characters than it contains because of the delimiters. Therefore, you have to use wildcards, which will always match more than one string. So a regex matching itself is possible, but it will always match other character strings too, not ONLY itself. – rednaZ – 2014-05-30T22:04:43.337

Can I use Tcl? In tcl strings need not be quoted. Just like bash. So if I use tcl or grep I don't need to use any delimiters right? – slebetman – 2014-06-17T07:58:18.170

@slebetman but then literals will work. Part of the challenge is forcing the delimiters to match – Dylan Madisetti – 2014-06-17T16:05:14.733

Yes, exactly. In some languages (think grep in bash) the delimiter is essentially an empty string. So assuming that regexps require delimiters is already wrong in the first place. Indeed, since grep is one of the earliest implementations of regexps, the canonical definition of a regexp doesn't have delimiters. The wrongest manifestation of this assumption is PHP, which requires two delimiters: "/ and /" – slebetman – 2014-06-18T02:44:26.437

Answers

542

PCRE flavor, 261 289 210 184 127 109 71 53 51 44 40 bytes

Yes, it is possible!

<^<()(?R){2}>\z|\1\Q^<()(?R){2}>\z|\1\Q>

Try it here. (But / is shown to be the delimiter on Regex101.)

Please refrain from making unnecessary edits (updates) on the Regex101 page. If your edit doesn't actually involve improving, trying or testing this regex, you could fork it or create new ones from their homepage.

A version that works correctly on Regex101 (44 bytes):

/^\/()(?R){2}\/\z|\1\Q^\/()(?R){2}\/\z|\1\Q/

Try it here.

This is much simpler than the original version and works more like a traditional quine. It defines a string without using it, and then uses it in a different place. That lets the definition sit very close to one end of the regex, which reduces the number of characters that themselves need extra characters to describe their matching pattern and have to be repeated more times.

Explanations:

  • \Q^\/()(?R){2}\/\z|\1\Q matches the string ^\/()(?R){2}\/\z|\1\Q. This relies on a quirk: \Q...\E doesn't have to be closed, and unescaped delimiters work inside \Q. Because of this, some previous versions worked only on Regex101 and not locally. Fortunately the latest version works locally too, and I golfed off some more bytes using this.
  • \1 before the \Q matches captured group 1. Because group 1 doesn't exist in this alternative, it can only match in recursive calls, where it matches the empty string.
  • (?R){2} calls the whole regex recursively twice, matching ^\/()(?R){2}\/\z|\1\Q each time.
  • () does nothing except capture an empty string into group 1, which enables the other alternative in recursive calls.
  • ^\/()(?R){2}\/\z matches (?R){2} with delimiters added, from the beginning to the end. The \/ before the recursive calls also makes sure this alternative itself doesn't match in recursive calls, because it won't be at the beginning of the string.

51 bytes with closed \Q...\E:

/\QE\1|^\/(\\)Q(?R){2}z\/\E\1|^\/(\\)Q(?R){2}z\/\z/

Try it here.

Original version, 188 bytes

Thanks to Martin Büttner for golfing off about 100 bytes!

/^(?=.{173}\Q\2\)){2}.{11}$\E\/\z)((?=(.2.|))\2\/\2\^\2\(\2\?=\2\.\2\{173}\2\\Q\2\\2\2\\\2\)\2\)\2\{2}\2\.\2\{11}\2\$\2\\E\2\\\2\/\2\\z\2\)\2\(\2\(\2\?=\2\(\2\.2\2\.\2\|\2\)\2\)){2}.{11}$/

Try it here.

Or 210 bytes without \Q...\E:

/^(?=.{194}\\2\\.\)\{2}\.\{12}\$\/D$)((?=(.2.|))\2\/\2\^\2\(\2\?=\2\.\2\{194}\2\\\2\\2\2\\\2\\\2\.\2\\\2\)\2\\\2\{2}\2\\\2\.\2\\\2\{12}\2\\\2\$\2\\\2\/D\2\$\2\)\2\(\2\(\2\?=\2\(\2\.2\2\.\2\|\2\)\2\)){2}.{12}$/D

Try it here.

Expanded version:

/^(?=.{173}\Q\2\)){2}.{11}$\E\/\z)        # Match things near the end.
((?=(.2.|))                               # Capture an empty string or \2\ into group 2.
   \2\/\2\^\2\(\2\?=\2\.\2\{173}\2\\Q\2\\2\2\\\2\)\2\)\2\{2}\2\.
   \2\{11}\2\$\2\\E\2\\\2\/\2\\z\2\)      # 1st line escaped.
   \2\(\2\(\2\?=\2\(\2\.2\2\.\2\|\2\)\2\) # 2nd line escaped.
){2}
.{11}$/x

Extensions such as backreferences (\1) have made the so-called "regular" expressions no longer regular, which also makes quines possible. Note that backreferences are not regular, but lookahead (?=...) is.

Explanation:

  • I use \2\ in place of \ to escape special characters. If \2 matches the empty string, \2\x (where x is a special character) matches x itself. If \2 matches \2\, \2\x matches the escaped form. The value of \2 can differ between the two repetitions of group 1: the first time \2 should match the empty string, and the second time it should match \2\.
  • \Q\2\)){2}.{11}$\E\/\z (line 1) matches the last 15 characters of the string, and .{11}$ (line 7) matches the last 11 characters (or the 11 before a trailing newline). So the pattern just before the second one must match the first 4 or 3 characters of the first one, which means \2\.\2\|\2\)\2\) must match ...\2\) or ...\2\. There cannot be a trailing newline because the last character must be ). The matched text doesn't contain another ) before the rightmost one, so all the other characters must come from \2. Since \2 is defined as (.2.|), it can only be \2\.
  • The first line makes the whole expression match exactly 188 characters, since everything has a fixed length. The two repetitions of group 1 match 45*2 literal characters plus 29 occurrences of \2 each, and the part after group 1 matches 11 characters. That leaves 188 - 90 - 11 = 87 = 29*3 characters, so the two values of \2 must total exactly 3 characters. Since \2 is 3 characters long the second time, it must be empty the first time.
  • Everything in group 1 except the lookahead and \2 is a literal. With both values of \2 known, and the last few characters fixed by the first line, this regex matches exactly one string.
  • Martin Büttner came up with the idea of using a lookahead to capture group 2 and make it overlap with the quine part. This removed the characters between the two repetitions of group 1 that were not escaped in the normal way, which avoided having to match them as my original version did, and simplified the regex a lot.
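The length argument in the third bullet can be checked with a few lines of arithmetic. This is only a sketch of the counting step: the numbers (188, 45, 29, 11) are taken from the explanation above, the constant names are made up for illustration, and it says nothing about which repetition gets which value of \2 (that part comes from the end-of-string argument in the second bullet).

```python
TOTAL_LENGTH = 188        # length of the whole regex, fixed by the first lookahead
LITERALS_PER_PASS = 45    # literal characters matched by one pass of group 1
BACKREFS_PER_PASS = 29    # occurrences of \2 in one pass of group 1
TAIL_LENGTH = 11          # the trailing .{11}

# 2 * 45 + 29 * (a + b) + 11 == 188, where a and b are the lengths of \2
# in the first and second pass of group 1.
remaining = TOTAL_LENGTH - 2 * LITERALS_PER_PASS - TAIL_LENGTH
combined_length, leftover = divmod(remaining, BACKREFS_PER_PASS)

# Group 2 is (.2.|), so each pass captures either 3 characters or none.
ALLOWED_LENGTHS = (0, 3)
assignments = [
    (a, b)
    for a in ALLOWED_LENGTHS
    for b in ALLOWED_LENGTHS
    if a + b == combined_length
]

print(remaining, combined_length, leftover)
print(assignments)
```

The counting alone leaves two candidates, (0, 3) and (3, 0); knowing that the second value is the 3-character \2\ rules out the latter.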

Regex without recursions or backreferences, 85 bytes

Some may argue that expressions with recursions or backreferences are not real "regular" expressions. Expressions that use only lookahead, however, can still only match regular languages, although they may be much longer when written as traditional regular expressions.

/(?=.*(\QE\\){2}z\/\z)^\/\(\?\=\.\*\(\\Q.{76}\E\\){2}z\/\z)^\/\(\?\=\.\*\(\\Q.{76}\z/

Try it here.

610 bytes without \Q...\E (to be golfed):

/^(?=.{610}$)(?=.{71}(\(\.\{8\}\)\?\\.[^(]*){57}\)\{2\}\.\{12\}\$\/D$)((.{8})?\/(.{8})?\^(.{8})?\((.{8})?\?=(.{8})?\.(.{8})?\{610(.{8})?\}(.{8})?\$(.{8})?\)(.{8})?\((.{8})?\?=(.{8})?\.(.{8})?\{71(.{8})?\}(.{8})?\((.{8})?\\(.{8})?\((.{8})?\\(.{8})?\.(.{8})?\\(.{8})?\{8(.{8})?\\(.{8})?\}(.{8})?\\(.{8})?\)(.{8})?\\(.{8})?\?(.{8})?\\(.{8})?\\(.{8})?\.(.{8})?\[(.{8})?\^(.{8})?\((.{8})?\](.{8})?\*(.{8})?\)(.{8})?\{57(.{8})?\}(.{8})?\\(.{8})?\)(.{8})?\\(.{8})?\{2(.{8})?\\(.{8})?\}(.{8})?\\(.{8})?\.(.{8})?\\(.{8})?\{12(.{8})?\\(.{8})?\}(.{8})?\\(.{8})?\$(.{8})?\\(.{8})?\/D(.{8})?\$(.{8})?\)(.{8})?\(){2}.{12}$/D

Try it here.

The idea is similar.

/^(?=.{610}$)(?=.{71}(\(\.\{8\}\)\?\\.[^(]*){57}\)\{2\}\.\{12\}\$\/D$)
((.{8})?\/(.{8})?\^(.{8})?\((.{8})?\?=(.{8})?\.(.{8})?\{610(.{8})?\}(.{8})?\$(.{8})?\)
(.{8})?\((.{8})?\?=(.{8})?\.(.{8})?\{71(.{8})?\}
  (.{8})?\((.{8})?\\(.{8})?\((.{8})?\\(.{8})?\.(.{8})?\\(.{8})?\{8(.{8})?\\(.{8})?\}
    (.{8})?\\(.{8})?\)(.{8})?\\(.{8})?\?(.{8})?\\(.{8})?\\
    (.{8})?\.(.{8})?\[(.{8})?\^(.{8})?\((.{8})?\](.{8})?\*(.{8})?\)(.{8})?\{57(.{8})?\}
  (.{8})?\\(.{8})?\)(.{8})?\\(.{8})?\{2(.{8})?\\(.{8})?\}
  (.{8})?\\(.{8})?\.(.{8})?\\(.{8})?\{12(.{8})?\\(.{8})?\}
  (.{8})?\\(.{8})?\$(.{8})?\\(.{8})?\/D(.{8})?\$(.{8})?\)(.{8})?\(){2}.{12}$/D

The basic regular expression

If lookahead is not allowed, the best I can do now is:

/\\(\\\(\\\\){2}/

which matches

\\(\\\(\\

If the {m,n} quantifier is not allowed, it is impossible, because nothing that can match only one string can match a string longer than itself. Of course, one could still invent something like \q that only matches /\q/, and still call expressions using it regular. But apparently nothing like this is supported by major implementations.
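As a quick sanity check of the basic example above, the following sketch searches the delimited source of /\\(\\\(\\\\){2}/ with its own body. It uses Python's re module rather than PCRE (an assumption; the escapes, group, and {2} quantifier used here behave the same way in both), and the variable names are made up for illustration.

```python
import re

# The regex body, without delimiters, written as a raw string.
body = r"\\(\\\(\\\\){2}"

# The delimited source text of the regex, as it would be typed.
source = "/" + body + "/"

# Unanchored search, as in a typical /.../ match operator.
match = re.search(body, source)

print(match.group(0))
print(match.span())
```

The match is the 9-character substring \\(\\\(\\ starting right after the opening delimiter, so the expression matches a part of its own source rather than the whole delimited text.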

jimmy23013

Posted 2014-05-30T16:20:31.400

Reputation: 25 688

This is the most absurd, incredible thing I've ever seen. – Alex A. – 2016-01-12T19:09:55.730

Is there any reason for \z rather than just $? – primo – 2016-01-16T09:35:55.257

@primo Martin Büttner pointed out it would also match the regex with a trailing newline if $ is used. An alternative to \z is to use the D flag. – jimmy23013 – 2016-01-16T13:44:25.523

@AlexA. Too bad there isn't Hexagony Regex. – NoOneIsHere – 2016-05-28T14:53:04.087

This answer redeems code ⛳ with its insanity. Wow. Totally t-shirt worthy – HipsterZipster – 2016-11-06T06:36:12.797

Someone tweeted this post so I got 49 upvotes in a day... – jimmy23013 – 2016-11-06T08:51:24.367

@jimmy23013 that's why I am here too. – xenteros – 2016-11-07T12:21:11.643

Yay, I'm upvote number 200! – mbomb007 – 2016-11-07T19:59:58.333

@jimmy23013 I'm here from that tweet too, and I literally only joined this site to upvote this :p – Olle Kelderman – 2016-11-08T10:28:30.780

Funny thing is that this is the only answer to this question. – Alex M. – 2016-11-11T19:50:30.503

Hats off to you you crazy regex bastard – Kristopher Ives – 2016-11-11T23:59:15.740

It's on HN now as well! Bravo sir! – Paras Singh Laddi – 2016-11-12T01:02:24.540

I find that "golfing" the power and availability of the features used is a lot more impressive than golfing the byte count. The version that doesn't use anything more powerful or less widely available than lookaheads is a lot more impressive than the one that uses super-powerful features like (?R) and quirks of the \Q\E implementation, even if the other one is shorter. – user2357112 – 2017-06-13T00:16:25.230

Impressive. I spent a while trying to get it to match something else, with no success. – primo – 2014-06-16T15:18:30.977

I wasn't able to get a match either. As such I've marked this as the best answer. I will continue to follow the thread and update if a match is found or a shorter answer is discovered – Dylan Madisetti – 2014-06-16T16:28:41.620

@DylanMadisetti So nobody even wants a proof of why this is correct? – jimmy23013 – 2014-06-16T16:52:09.867

No you're right, I think a proof is in order. I removed the accepted answer for now. – Dylan Madisetti – 2014-06-16T17:22:53.447

@DylanMadisetti Well, I found a bug while writing something like a proof. It also matches this string: http://regex101.com/r/eC0qF2. I'll fix it soon. – jimmy23013 – 2014-06-16T18:00:02.510

How (the hell) could a human produce such a thing? – xem – 2014-06-16T18:33:31.273

@xem Find a way to get your editor to highlight \2\, and things can be easier. – jimmy23013 – 2014-06-16T18:43:47.790

@DylanMadisetti Updated and now it has 289 characters. Also added some hints. Not so formal but I think it should be sufficient as a proof. – jimmy23013 – 2014-06-16T20:34:33.933

This deserves to be the highest voted answer on this site. – Cruncher – 2014-06-16T20:39:20.437

I've deleted the community wiki answer (which was not an answer and did not seem easily fixable). It was about a much shorter regex found here: http://www.nntp.perl.org/group/perl.fwp/2005/03/msg3754.html , which doesn't match the delimiters. Anyone who fixes that should post a new answer. – jimmy23013 – 2014-11-03T15:13:06.280