How to generally match, unify and merge patterns?

This question was split from this one. While that question is now about how to match two particular patterns (mostly using Verbatim or HoldPattern), this question is about how to match any pattern with another one, in other words: "How can I test if a given pattern intersects with, or is a subset of another pattern?" (borrowed from Mr.Wizard). Consider the following examples where one can see that the left pattern should "match" the right one, but which all return False:

MatchQ[a|b, b|a]
MatchQ[{a..}, {a..}]
MatchQ[{a..}, {a...}]

Of course one trivial way to deal with Alternatives would be to simply Sort its arguments:

MatchQ[a | b | c, Verbatim[Sort[b | c | a]]]  (* -> True *)

but things get complicated if the pattern involves more than Alternatives.
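The sorting idea can be pushed a little further by canonicalizing every Alternatives inside a larger pattern before comparing. The following is only a minimal sketch: canonAlt is a hypothetical helper name, and it handles nothing but reordering, nesting and duplicates of Alternatives, so it says nothing about Repeated, Blank and the rest.

```mathematica
(* Hypothetical helper (not from the post): flatten, deduplicate and sort
   every Alternatives, working from the innermost subexpressions outwards *)
canonAlt[p_] := Replace[p,
   alt_Alternatives :>
     Sort[DeleteDuplicates[Flatten[alt, Infinity, Alternatives]]],
   {0, Infinity}]

(* Structural comparison after canonicalization *)
canonAlt[a | b | c] === canonAlt[c | (b | a)]    (* True *)
canonAlt[{a | b, x_}] === canonAlt[{b | a, x_}]  (* True *)
```

Two patterns whose canonical forms are identical are certainly equivalent; the converse does not hold, so a False here only means "not shown to be equal".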

Problem specification:


Therefore I am looking for a predicate function that compares any pattern with another pattern and decides whether the first one matches the second one. For this, one needs a general way of matching two expressions that may contain any of these operators: {|, .., ..., _, __, ___} and possibly anything else that is specific to patterns. The predicate should have the following behaviour:

PatternMatchQ::usage="PatternMatchQ[e1, e2] returns True
if the set of expressions matched by pattern e1 is the same
or is a subset of the expressions matched by pattern e2, False otherwise."

i.e. the first pattern should cover the same (or smaller) domain as the second does. Now of course if any of the arguments is a non-pattern, PatternMatchQ can fall back to MatchQ.

Examples:


  • a|b agrees with b|a
  • a|b agrees with c|b|a (as a|b covers a domain that is covered by a|b|c as well)
  • a|b|c does not agree with b|a (as a|b|c covers a domain that is larger than the domain covered by b|a unless c == a or c == b, in which case the pattern simplifies to a|b)
  • {a..} agrees with {a..}
  • {a..} agrees with {a...}
  • {a...} does not agree with {a..}
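
For the Alternatives examples alone, the subset relation can be sketched as a set comparison of the listed alternatives. This is an assumption-laden illustration, not a general solution: altList and altSubsetQ are hypothetical names, and each alternative is treated as a literal, so nested patterns and the Repeated examples are not covered.

```mathematica
(* Hypothetical helpers, not from the post: each alternative is treated as a literal *)
altList[p_Alternatives] := List @@ p
altList[p_] := {p}

(* True when every alternative of p also appears among those of q *)
altSubsetQ[p_, q_] := Complement[altList[p], altList[q]] === {}

altSubsetQ[a | b, b | a]        (* True *)
altSubsetQ[a | b, c | b | a]    (* True *)
altSubsetQ[a | b | c, b | a]    (* False *)
```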


Extension:


Now the concept of matching can be extended in the following way. So far we were only concerned with whether $domain(e_1) \subseteq domain(e_2)$ holds, returning False if it does not. But even when the test gives False, it is possible to define the smallest (simplest) domain that both patterns cover. Taken further, a full set algebra of patterns emerges. For example, PatternUnion could work on any number of arguments, and should return the most general unifier of all the argument patterns. The union of patterns would be equivalent to the merging of patterns.

Now, of course, "The thing that hath been, it is that which shall be; and that which is done is that which shall be done: and there is no new thing under the sun"... in certain fields (e.g. in logic or in linguistics and especially in construction grammar), this method is called unification:

Two expressions $e_1$ and $e_2$ are said to unify iff there exists a substitution $s$ such that $s(e_1) = s(e_2)$

where a substitution is a binding of the variables in patterns $e_1$ and $e_2$ (i.e. the implicit, hidden variables that Mathematica introduces to deal with e.g. _, __) to actual fitting values. Note that the substituted value of a pattern variable can be any atom, even a symbol; it does not matter. Thus what the code should look for is the equality of variables (i.e. implicit pattern variables). If such an $s$ exists, it is called a unifier. It is known that if two expressions $e_1$ and $e_2$ unify, then there is always exactly one most general unifier (up to a renaming of variables). This most general unifier of two pattern expressions can be called (or used to describe) the intersection of $e_1$ and $e_2$.
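
As a small illustration of the definition (an assumption for the sake of example, not part of the original question), plain symbols x and y can stand in for the implicit pattern variables, and a candidate substitution can be checked by applying it to both expressions:

```mathematica
(* Hypothetical illustration: the symbols x and y play the role of pattern variables *)
e1 = f[x, g[b]];
e2 = f[a, g[y]];

(* Candidate substitution s *)
s = {x -> a, y -> b};

(* s is a unifier if both expressions become the same after substitution *)
unifiesQ = (e1 /. s) === (e2 /. s)    (* True: both sides become f[a, g[b]] *)
```

This only verifies a given substitution; finding the most general unifier automatically is the harder part of the problem.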

István Zachar

Posted 2012-03-14T01:13:25.923

Reputation: 44 135

@MrWizard: Just for reference, here is a nice finding by @Oleksandr R. in the comments about using Internal`ComparePatterns. I almost forgot about it and had to scan through all my comments to find the source...

– István Zachar – 2012-07-03T15:01:58.097

@IstvánZachar You can make a nice answer with that one – Dr. belisarius – 2012-07-03T15:56:52.490

@belisarius: I am actually waiting for Oleksandr to do it, as it was his discovery. – István Zachar – 2012-07-03T16:11:18.390

@IstvánZachar You might want to remind him again... the last conversation on this was in April and he probably doesn't remember – rm -rf – 2012-07-03T17:42:51.677

@R.M.: I have pinged him in the comment above, wouldn't that be enough? Sadly we cannot send private messages in SO... – István Zachar – 2012-07-03T18:10:26.550

@IstvánZachar I'm actually surprised that you managed to get two @ pings in there... usually the system blocks you from pinging a second person. In any case, even if it worked, it wouldn't have pinged Oleksandr, because he has not been a party to this discussion. (you can't randomly ping users in comments — you can only ping those who have already commented or edited your post) – rm -rf – 2012-07-03T18:48:01.643

@R.M I knew about the one-ping-per-comment thing, but regularly I try it just to test the system, and it seemed like it went through. Though I did not know that only involved users can be pinged. This should be advertised more heavily (thanks for the info). Left a comment at the place referred above for Olex. – István Zachar – 2012-07-03T18:53:07.087

@IstvánZachar usernames are not required to be unique on SE sites. Should you be allowed to ping by username only, you may end up awakening a tide of @ Jim s – Dr. belisarius – 2012-07-04T02:51:18.077

Interesting question. Not sure whether I'd call the name PatternMatchQ a good match for a function that tests on one argument being the subset of another. Anyway, I wonder what uses you have in mind. One thing that I can come up with would be simplification of patterns. The function could be used to find a simpler equivalent of a complicated one. But that would require the function to return True only for exact matches not subsets. – Sjoerd C. de Vries – 2012-03-14T07:15:42.783

Very interesting. Some time ago I did some simple functions to simplify patterns involving Alternatives but not the Blank.. or Repeated families. I think a PatternUnion would be much simpler than a PatternIntersection (which could complement your PatternMatchQ) – FJRA – 2012-03-14T07:59:44.283

Given that Patterns can contain Condition and PatternTest, I'm pretty sure that a general solution would need solving the halting problem. Therefore I guess any solution would need to either exclude those patterns, or need to be allowed to fail. There might be cases not using those constructs which are undecidable as well. – celtschk – 2012-03-23T11:58:30.840

@FJRA: PatternUnion already exists, it is called Alternatives. What you are thinking of is probably something like PatternSimplify[pat1|pat2] (neither Simplify nor FullSimplify seem to simplify patterns). Indeed, a PatternSimplify function would probably need PatternMatchQ functionality (but failures to compute would just cause the original pattern to be retained). Note that also PatternIntersection (without simplification) can be implemented as _?(MatchQ[#,pat1]&&MatchQ[#,pat2]&). – celtschk – 2012-03-23T12:06:27.897

@celtschk Yes, you are right, what I meant was a PatternUnion that simplifies too, simple union and intersection might be done using || (or Alternatives) or && as you showed. But like Union (which merges the list and delete duplicates) that function should simplify too. Interesting problem. – FJRA – 2012-03-23T13:07:29.123

I just notice that PatternIntersection (without simplification) is even simpler implemented as pat1?(MatchQ[#,pat2]&). – celtschk – 2012-03-23T13:18:46.450
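
The unsimplified intersection suggested in the comments above can be written out as a short sketch. The name patternIntersection is hypothetical, and the construction is only meant for patterns that match a single expression, since PatternTest applies its test to each element of a sequence separately.

```mathematica
(* Hypothetical name; builds pat1?(MatchQ[#, pat2] &) as suggested in the comments *)
patternIntersection[p1_, p2_] := PatternTest[p1, MatchQ[#, p2] &]

both = patternIntersection[_Integer, _?Positive];

MatchQ[5, both]      (* True: an integer and positive *)
MatchQ[-5, both]     (* False: an integer but not positive *)
MatchQ[2.5, both]    (* False: positive but not an integer *)
```

The result is a valid pattern for the intersection, but it is opaque: nothing about it reveals whether the intersection is empty or whether it simplifies to something shorter.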

Answers

In my opinion, this is a very good and worthwhile question, but certainly not easy to answer. I don't have a full solution by any means, but as far as the comparison/matching part is concerned, the undocumented function Internal`ComparePatterns may be of substantial assistance. What follows is a short summary of what I know about this function, which exists in Mathematica 7 and 8, but not version 5.2. I would guess that it is new-in-6 and used in the implementation of OptionsPattern and related functions.

Internal`ComparePatterns[p, q] (where p and q are patterns) operates somewhat like MatchQ, except that rather than simply True or False to signify agreement or disagreement, multiple (namely, five) possibilities exist to describe the relationship p has to q:

  1. Identity

    Two patterns are considered identical if they match verbatim up to, but not including, naming. This relation should obviously be transitive and symmetric, and I haven't observed any counterexamples so far. An example could be:

    Internal`ComparePatterns[a_, b_]
    (* -> "Identical" *)
    

    It is also aware of attributes that affect pattern matching. Here we attempt to mask the Orderless attribute of Plus (and thus possibly confuse Internal`ComparePatterns) by wrapping it in HoldComplete:

    Internal`ComparePatterns[
     HoldComplete[x_Real + y_Integer],
     HoldComplete[y_Integer + x_Real]
    ]
    (* -> "Identical" *)
    

    Pattern names are not completely ignored, however, and seem to be taken into account where appropriate:

    Internal`ComparePatterns[x_Real + y_Integer, x_Integer + y_Real]
    (* -> "Incomparable" *)
    
  2. Equivalence

    If p has the same meaning as q but is not structurally identical, the patterns are considered equivalent:

    Internal`ComparePatterns[a | b, b | a] (* Alternatives is not Orderless *)
    (* -> "Equivalent" *)
    
    Internal`ComparePatterns[a : y_ + x_, b : (f : Plus)[x_, y_]]
    (* -> "Equivalent" *)
    

    However, determination of this relationship is not completely robust. Patterns that are sufficiently structurally different sometimes will not be considered equivalent even if they manifestly are:

    Internal`ComparePatterns[a : y_ + x_, b : (f : Plus | Plus)[x_, y_]]
    (* -> "Specific" *)
    

    Here are two more examples of patterns that are equivalent, but where the relationship is misstated. The second of these is particularly interesting:

    Internal`ComparePatterns[Repeated[_, Infinity], Repeated[_]]
    (* -> "Specific" *)
    
    Internal`ComparePatterns[Repeated[_, {1, Infinity}], Repeated[_, Infinity]]
    (* -> "Identical" *)
    
  3. Specificity

    In some circumstances, Internal`ComparePatterns is able to determine when one pattern is a special case of another:

    Internal`ComparePatterns[_h, _]
    (* -> "Specific" *)
    

    However, this situation is often misdiagnosed with equivalent patterns, which will instead be identified as special cases of each other:

    Internal`ComparePatterns[__, (_) ..]
    (* -> "Specific" *)
    
    Internal`ComparePatterns[(_) .., __]
    (* -> "Specific" *)
    
  4. Disjointness

    What is more reliably stated is when one pattern is exclusive of another, i.e. there are no expressions that could be matched by both:

    Internal`ComparePatterns[_a, _b]
    (* -> "Disjoint" *)
    
  5. Incomparability

    Finally, we have the situation whereby the patterns are either unrelated, or Internal`ComparePatterns simply does not know how to interpret their relationship:

    Internal`ComparePatterns[a | b, b | c]
    (* -> "Incomparable" *)
    

    Notably, it seems to be the case that Internal`ComparePatterns works entirely inside the pattern matcher, so that conditional patterns (which need to invoke the main evaluation loop), if not identical, are generally incomparable (by this mechanism):

    Internal`ComparePatterns[_ /; True, _ /; Sequence[True]]
    (* -> "Incomparable" *)
    
    Internal`ComparePatterns[_?(True &), _ /; True]
    (* -> "Incomparable" *)
    

Now let's try it on the examples:

Internal`ComparePatterns[a | b, b | a]           (* -> "Equivalent" -- correct *)
Internal`ComparePatterns[a | b, c | b | a]       (* -> "Specific" -- correct *)
Internal`ComparePatterns[a | b | c, b | a]       (* -> "Incomparable" -- correct *)
Internal`ComparePatterns[a | b | (c : b), b | a] (* -> "Incomparable" -- incorrect, but: *)
Internal`ComparePatterns[a | (c : b), b | a]     (* -> "Equivalent" -- correct *)
Internal`ComparePatterns[{a ..}, {a ..}]         (* -> "Identical" -- correct *)
Internal`ComparePatterns[{a ..}, {a ...}]        (* -> "Specific" -- correct *)
Internal`ComparePatterns[{a ...}, {a ..}]        (* -> "Incomparable" -- correct *)

So, Internal`ComparePatterns fails only in one case, and its answer is still technically correct as it is the result of the inability of the function to see the relationship between these patterns (Internal`ComparePatterns[a | b | b, b | a] gives "Specific" rather than "Equivalent") and not a statement about the expressions they will match.
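
Going by the results listed above, a subset predicate of the kind asked for in the question could be sketched as a thin wrapper. The name patternSubsetQ is hypothetical, and since Internal`ComparePatterns is undocumented, its behaviour may differ between versions; a False result here should be read as "not established" rather than "not a subset", because "Incomparable" also covers cases the function cannot analyze.

```mathematica
(* Hypothetical wrapper around the undocumented Internal`ComparePatterns.
   Treats "Identical", "Equivalent" and "Specific" as "p is covered by q". *)
patternSubsetQ[p_, q_] :=
  MemberQ[{"Identical", "Equivalent", "Specific"},
   Internal`ComparePatterns[p, q]]

(* Usage on the examples from the question *)
patternSubsetQ[a | b, c | b | a]
patternSubsetQ[{a ...}, {a ..}]
```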

I should finish by saying that I wasn't able to find any concrete examples of where Internal`ComparePatterns is actually used in Mathematica, which should give one pause considering its occasional mistakes. However, it may be that I didn't find it because I wasn't trying hard enough, rather than because it isn't used anywhere. Here is code for a hook that can be installed (using $Pre = withHookedComparePatterns) during normal usage. If you're lucky enough to stumble on a function that uses Internal`ComparePatterns, the call stack and the call itself will be printed out at that point, which will help to identify what its use case is, if any. Anyone finding any examples is welcome to edit this answer to include them below (marking as Community Wiki at the same time, if desired).

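ClearAll[withHookedComparePatterns];
SetAttributes[withHookedComparePatterns, HoldAll];
(* Create the helper symbols (expr, cp) in a private context *)
Begin["System`Private`"];
withHookedComparePatterns[expr_] :=
  (* Changes made to Internal`ComparePatterns are undone when the block exits *)
  Internal`InheritedBlock[{Internal`ComparePatterns},
   Unprotect[Internal`ComparePatterns];
   (* On every call, print the evaluation stack and the held call itself *)
   cp : Internal`ComparePatterns[___] /;
     StackInhibit[Print[{Stack[], HoldForm[cp]}]; True] := cp;
   Protect[Internal`ComparePatterns];
   StackBegin[expr]
  ];
End[];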
Oleksandr R.

Posted 2012-03-14T01:13:25.923

Reputation: 22 073

Oleksandr, @Mr.Wizard I've just discovered that the order in which arguments are supplied to ComparePatterns does matter. In some cases swapping the argument positions yields a sensible result where it was otherwise Incomparable. A new example: f[___] and f[] - try them with both orders. Accordingly, I think ComparePatterns expects arguments to be supplied in the order of specificity: more specific patterns should precede more general patterns. If one thinks of function definition (like f[___]:=f[]) then rhs should precede lhs when comparing them. – István Zachar – 2014-01-08T10:38:37.290

In V10, there is GeneralUtilities`PatternOrder, which is based on Internal`ComparePatterns. The definition is available (after loading "GeneralUtilities`"). It may be worth exploring. – Michael E2 – 2014-08-27T21:22:19.430

Beautiful and exhaustive answer. Thanks! – István Zachar – 2012-07-05T09:21:46.970

Well deserved bounty awarded. – Mr.Wizard – 2012-07-11T03:47:41.770

@Mr.Wizard thanks very much! Wasn't sure if you considered this bounty-worthy or not given that it's not a full answer and I'd already somewhat committed to posting this before. Out of interest, do you feel that the outcome of your featuring this question was favorable in terms of getting enough additional attention? It's a hard question for sure, so maybe no further answers are to be expected, but it didn't generate that many extra views either AFAICT. Do most people even look at the featured questions? – Oleksandr R. – 2012-07-11T13:25:14.547

This is surely worthy of the bounty. I may however offer a second bounty at a later time in search of a supplemental answer. I am pleased with the outcome of the bounty, and it did get attention as the question gained 12 votes during the week. – Mr.Wizard – 2012-07-11T13:41:54.183