PatMat: SPITBOL-like pattern construction and matching in C++

1 Pattern Matching Tutorial

A pattern matching operation (a call to one of the Match functions) takes a subject string and a pattern, and optionally a replacement string. The replacement string option is only allowed if the subject is a variable.

The pattern is matched against the subject string, and either the match fails, or it succeeds matching a contiguous sub-string. If a replacement string is specified, then the subject string is modified by replacing the matched sub-string with the given replacement.

1.1 Concatenation and Alternation

A pattern consists of a series of pattern elements. The pattern is built up using either the concatenation operator:

A & B

which means match A followed immediately by matching B, or the alternation operator:

A | B

which means first attempt to match A, and then if that does not succeed, match B.

Backtracking is complete: if a given pattern element fails to match, the matcher returns to the most recent earlier element that still has an untried alternative and tries that alternative. For example, if we have the pattern:

(A | B) & (C | D) & (E | F)

First we attempt to match A. If that succeeds, we go on to try to match C, and if that succeeds, we go on to try to match E. If E fails, then we try F. If F fails, then we go back and try matching D instead of C.

Let's make this explicit using a specific example, and introduce the simplest kind of pattern element, which is a literal string. This pattern element simply matches the characters of the string. Now let's rewrite the above pattern form with specific string literals as the pattern elements:

("ABC" | "AB") & ("DEF" | "CDE") & ("GH" | "IJ")

The following strings will be attempted in sequence:

ABC . DEF . GH
ABC . DEF . IJ
ABC . CDE . GH
ABC . CDE . IJ
AB . DEF . GH
AB . DEF . IJ
AB . CDE . GH
AB . CDE . IJ

Here we use the dot simply to separate the pieces of the string matched by the three separate elements.
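The attempt order listed above can be reproduced with a small stand-alone sketch. It does not use PatMat: Alternatives, enumerate and attemptOrder are hypothetical names that model each pattern element as a list of literal alternatives, and the code only enumerates the combinations in backtracking order without looking at any subject string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model, not part of PatMat: one pattern element is a list of
// literal alternatives in the order they are tried.
typedef std::vector<std::string> Alternatives;

// Appends every combination to 'out'. The rightmost element varies fastest,
// which is the order in which backtracking revisits the choices.
void enumerate(const std::vector<Alternatives>& elements,
               std::size_t index,
               const std::string& prefix,
               std::vector<std::string>& out)
{
    assert(index <= elements.size());
    if (index == elements.size())
    {
        out.push_back(prefix);
        return;
    }
    for (const std::string& alt : elements[index])
    {
        const std::string separator = (index == 0) ? "" : " . ";
        enumerate(elements, index + 1, prefix + separator + alt, out);
    }
}

std::vector<std::string> attemptOrder(const std::vector<Alternatives>& elements)
{
    std::vector<std::string> out;
    enumerate(elements, 0, "", out);
    return out;
}
```

Calling attemptOrder with the three elements {"ABC", "AB"}, {"DEF", "CDE"} and {"GH", "IJ"} yields the eight dotted strings in the same sequence as the list above.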

1.2 Moving the Start Point

A pattern is not required to match starting at the first character of the string, and is not required to match to the end of the string. The first attempt does indeed attempt to match starting at the first character of the string, trying all the possible alternatives. But if all alternatives fail, then the starting point of the match is moved one character, and all possible alternatives are attempted at the new anchor point.

The entire match fails only when every possible starting point has been attempted. As an example, suppose that we had the subject string

"ABABCDEIJKL"

A match using the pattern from the previous example:

("ABC" | "AB") & ("DEF" | "CDE") & ("GH" | "IJ")

would succeed after two anchor point moves:

"ABABCDEIJKL"
   ^^^^^^^
   matched
   section

This mode of pattern matching is called the unanchored mode. It is also possible to put the pattern matcher into anchored mode by providing the optional flag Pattern::anchor to the Match function. This will cause the match to be performed in anchored mode, where the match is required to start at the first character.
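To make the difference between the two modes concrete, here is a minimal stand-alone sketch of a backtracking matcher over literal alternatives. It is not PatMat code: matchAt, findMatch and MatchResult are hypothetical names, and the anchored flag merely stands in for the effect of the anchored mode described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> Alternatives;   // one element's literals

const std::size_t noMatch = std::string::npos;

// Tries to match elements[index..] at 'pos'. Returns the end position of the
// first combination that succeeds, or noMatch after all alternatives failed.
std::size_t matchAt(const std::string& subject,
                    std::size_t pos,
                    const std::vector<Alternatives>& elements,
                    std::size_t index = 0)
{
    assert(pos <= subject.size());
    if (index == elements.size()) return pos;
    for (const std::string& alt : elements[index])
    {
        if (subject.compare(pos, alt.size(), alt) == 0)
        {
            const std::size_t end =
                matchAt(subject, pos + alt.size(), elements, index + 1);
            if (end != noMatch) return end;   // otherwise try the next alternative
        }
    }
    return noMatch;
}

struct MatchResult
{
    bool matched;
    std::size_t start;
    std::size_t end;
};

// Unanchored mode moves the start point one character at a time;
// anchored mode tries position 0 only.
MatchResult findMatch(const std::string& subject,
                      const std::vector<Alternatives>& elements,
                      bool anchored = false)
{
    const std::size_t lastStart = anchored ? 0 : subject.size();
    for (std::size_t start = 0; start <= lastStart; ++start)
    {
        const std::size_t end = matchAt(subject, start, elements);
        if (end != noMatch) return MatchResult{true, start, end};
    }
    return MatchResult{false, 0, 0};
}
```

With the three-element pattern from this section and the subject "ABABCDEIJKL", the unanchored search reports a match from position 2 to position 9, while the anchored search fails.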

1.3 Other Pattern Elements

In addition to strings (or single characters), there are many special pattern elements that correspond to special predefined alternations:

1.3.1 Arb

Matches any string. First it matches the null string, and then on a subsequent failure, matches one character, and then two characters, and so on. It fails only when asked to extend after it has already matched the entire remaining string.

1.3.2 Abort

Immediately aborts the entire pattern match, signalling failure. This is a specialized pattern element, which is useful in conjunction with some of the special pattern elements that have side effects.

1.3.3 Fail

The null alternation. Matches no possible strings, so it always signals failure. This is a specialized pattern element, which is useful in conjunction with some of the special pattern elements that have side effects.

1.3.4 Fence

Matches the null string at first, and then if a failure causes alternatives to be sought, aborts the match (like Abort). Note that using Fence at the start of a pattern has the same effect as matching in anchored mode.

1.3.5 Rem

Matches from the current point to the last character in the string. This is a specialized pattern element, which is useful in conjunction with some of the special pattern elements that have side effects.

1.3.6 Succeed

Repeatedly matches the null string. It is equivalent to the alternation

("" | "" | "" ....).

This is a special pattern element, which is useful in conjunction with some of the special pattern elements that have side effects.

1.4 Pattern Construction Functions

The following functions construct additional pattern elements:

1.4.1 Any(S)

Where S is a string, matches a single character that is any one of the characters in S. Fails if the current character is not one of the given set of characters.

1.4.2 Arbno(P)

Where P is any pattern, matches any number of instances of the pattern, starting with zero occurrences. It is thus equivalent to

("" | (P & ("" | (P & ("" ....)))).

The pattern P may contain any number of pattern elements including the use of alternation and concatenation.

1.4.3 Bal(Open, Close)

Matches a non-empty string that is parentheses balanced with respect to characters Open and Close. Examples of balanced strings are "ABC", "A((B)C)", and "A(B)C(D)E". Bal(Open, Close) matches the shortest possible balanced string on the first attempt, and if there is a subsequent failure, attempts to extend the string.
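The "shortest first, then extend" behaviour can be sketched without PatMat. The hypothetical function balancedLengths below returns, in order, the lengths of the candidates that a Bal-like element would offer at a given position. It assumes that a candidate is extended by one plain character or one complete parenthesized group at a time, which is one reading of the description above rather than a statement about the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative only: lengths of the successive balanced candidates at 'pos',
// shortest first. An empty result means no balanced string starts here.
std::vector<std::size_t> balancedLengths(const std::string& subject,
                                         std::size_t pos,
                                         char open = '(',
                                         char close = ')')
{
    assert(pos <= subject.size());
    std::vector<std::size_t> lengths;
    std::size_t i = pos;
    while (i < subject.size())
    {
        if (subject[i] == close) break;          // unmatched closer: stop
        if (subject[i] == open)
        {
            // Step over the whole parenthesized group.
            std::size_t depth = 0;
            std::size_t j = i;
            do
            {
                if (j == subject.size()) return lengths;   // never closed
                if (subject[j] == open) ++depth;
                else if (subject[j] == close) --depth;
                ++j;
            } while (depth != 0);
            i = j;
        }
        else
        {
            ++i;                                 // a single plain character
        }
        lengths.push_back(i - pos);
    }
    return lengths;
}
```

For "A(B)C(D)E" starting at position 0 the sketch yields the lengths 1, 4, 5, 8 and 9, so "A" is offered first and the whole string last.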

1.4.4 Break(S)

Where S is a string, matches a string of zero or more characters up to but not including a break character that is one of the characters given in the string S. Can match the null string, but cannot match the last character in the string, since a break character is required to be present.

1.4.5 BreakX(S)

Where S is a string, behaves exactly like Break(S) when it first matches, but if a string is successfully matched, then a subsequent failure causes an attempt to extend the matched string.
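The difference between Break and BreakX is easiest to see as a list of cursor positions. The following stand-alone sketch does not use PatMat; breakEnd and breakXEnds are hypothetical helpers built on std::string::find_first_of, and the extension step (skip the break character, then scan to the next one) is an assumption based on the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const std::size_t noMatch = std::string::npos;

// Break-like scan: the cursor stops just before the first break character at
// or after 'pos'. Returns noMatch if no break character remains.
std::size_t breakEnd(const std::string& subject,
                     std::size_t pos,
                     const std::string& breakChars)
{
    return subject.find_first_of(breakChars, pos);
}

// BreakX-like scan: every stopping point in the order it would be offered,
// the first one being identical to breakEnd.
std::vector<std::size_t> breakXEnds(const std::string& subject,
                                    std::size_t pos,
                                    const std::string& breakChars)
{
    std::vector<std::size_t> ends;
    std::size_t end = breakEnd(subject, pos, breakChars);
    while (end != noMatch)
    {
        ends.push_back(end);
        // Extension: step over the break character and scan to the next one.
        end = breakEnd(subject, end + 1, breakChars);
    }
    return ends;
}
```

For the subject "ab123cd4" and the digits as break characters, breakEnd stops at position 2 only, whereas breakXEnds offers positions 2, 3, 4 and 7 in turn.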

1.4.6 Fence(P)

Where P is a pattern, attempts to match the pattern P including trying all possible alternatives of P. If none of these alternatives succeeds, then the Fence pattern fails. If one alternative succeeds, then the pattern match proceeds, but on a subsequent failure, no attempt is made to search for alternative matches of P. The pattern P may contain any number of pattern elements including the use of alternation and concatenation.

1.4.7 Len(N)

Where N is a natural number, matches the given number of characters. For example, Len(10) matches any string that is exactly 10 characters long.

1.4.8 NotAny(S)

Where S is a string, matches a single character that is not one of the characters of S. Fails if the current character is one of the given set of characters.

1.4.9 NSpan(S)

Where S is a string, matches a string of zero or more characters, each of which is one of the characters given in the string. Always matches the longest possible such string. Always succeeds, since it can match the null string.

1.4.10 Pos(N)

Where N is a natural number, matches the null string if exactly N characters have been matched so far, and otherwise fails.

1.4.11 Rpos(N)

Where N is a natural number, matches the null string if exactly N characters remain to be matched, and otherwise fails.

1.4.12 Rtab(N)

Where N is a natural number, matches characters from the current position until exactly N characters remain to be matched in the string. Fails if fewer than N unmatched characters remain in the string.

1.4.13 Tab(N)

Where N is a natural number, matches characters from the current position until exactly N characters have been matched in all. Fails if more than N characters have already been matched.
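The four cursor elements Pos, Rpos, Rtab and Tab can be summarised as small functions on the cursor. This is a stand-alone sketch with hypothetical names (matchPos, matchRpos, matchTab, matchRtab), not the library's code. Each returns the new cursor position, or noMatch on failure. That Tab also fails when N lies beyond the end of the subject is an assumption, since the text above does not say so.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const std::size_t noMatch = std::string::npos;

// Pos(N): null match if exactly n characters have been matched so far.
std::size_t matchPos(std::size_t cursor, std::size_t n)
{
    return cursor == n ? cursor : noMatch;
}

// Rpos(N): null match if exactly n characters remain.
std::size_t matchRpos(std::size_t length, std::size_t cursor, std::size_t n)
{
    assert(cursor <= length);
    return length - cursor == n ? cursor : noMatch;
}

// Tab(N): advance to position n; fails if the cursor is already past n.
// Assumption: also fails if n is beyond the end of the subject.
std::size_t matchTab(std::size_t length, std::size_t cursor, std::size_t n)
{
    return (cursor <= n && n <= length) ? n : noMatch;
}

// Rtab(N): advance until exactly n characters remain; fails if fewer remain.
std::size_t matchRtab(std::size_t length, std::size_t cursor, std::size_t n)
{
    assert(cursor <= length);
    return n <= length - cursor ? length - n : noMatch;
}
```

For a subject of length 10 with the cursor at 2, matchTab(10, 2, 5) moves the cursor to 5 and matchRtab(10, 2, 3) moves it to 7.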

1.4.14 Span(S)

Where S is a string, matches a string of one or more characters, each of which is one of the characters given in the string. Always matches the longest possible such string. Fails if the current character is not one of the given set of characters.
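Span and NSpan differ only in how they treat an empty run, which the following stand-alone sketch makes explicit. It does not call PatMat; Run, runLength, matchSpan and matchNSpan are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of the longest run of characters from 'set' starting at 'pos'.
std::size_t runLength(const std::string& subject,
                      std::size_t pos,
                      const std::string& set)
{
    assert(pos <= subject.size());
    const std::size_t end = subject.find_first_not_of(set, pos);
    return (end == std::string::npos ? subject.size() : end) - pos;
}

struct Run
{
    bool matched;
    std::size_t length;
};

// NSpan-like: always succeeds, possibly with a null match.
Run matchNSpan(const std::string& subject, std::size_t pos, const std::string& set)
{
    return Run{true, runLength(subject, pos, set)};
}

// Span-like: the same longest run, but an empty run is a failure.
Run matchSpan(const std::string& subject, std::size_t pos, const std::string& set)
{
    const std::size_t n = runLength(subject, pos, set);
    return Run{n > 0, n};
}
```

On the subject "abc" with the digits as the character set, matchNSpan succeeds with length 0 while matchSpan fails.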

1.5 Recursive Pattern Matching

The plus operator (+P) where P is a pattern variable, creates a recursive pattern that will, at pattern matching time, follow the pointer to obtain the referenced pattern, and then match this pattern. This may be used to construct recursive patterns. Consider for example:

P = ("A" | ("B" & (+P)))

On the first attempt, this pattern attempts to match the string "A". If this fails, then the alternative matches a "B", followed by an attempt to match P again. This second attempt first attempts to match "A", and so on. The result is a pattern that will match a string of B's followed by a single A.

This particular example could simply be written as (NSpan('B') & 'A'), but the use of recursive patterns in the general case can construct complex patterns which could not otherwise be built.
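The recursion in P can be mirrored by an ordinary recursive function. The sketch below is a hand-written equivalent of the example pattern, anchored at a given position. matchP is a hypothetical name and the code does not use PatMat.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const std::size_t noMatch = std::string::npos;

// Hand-written equivalent of P = ("A" | ("B" & (+P))), anchored at 'pos'.
// Returns the end position of the match, or noMatch.
std::size_t matchP(const std::string& subject, std::size_t pos)
{
    assert(pos <= subject.size());
    if (pos < subject.size() && subject[pos] == 'A')
        return pos + 1;                       // first alternative: "A"
    if (pos < subject.size() && subject[pos] == 'B')
        return matchP(subject, pos + 1);      // second: "B", then P again
    return noMatch;
}
```

matchP("BBBA", 0) consumes all four characters, while matchP("BBB", 0) fails because no "A" terminates the run of B's.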

1.6 Pattern Assignment Operations

In addition to the overall result of a pattern match, which indicates success or failure, it is often useful to be able to keep track of the pieces of the subject string that are matched by individual pattern elements, or subsections of the pattern.

The pattern assignment operators allow this capability. The first form is the immediate assignment:

P % S

Here P is an arbitrary pattern, and S is a variable of type string that will be set to the sub-string matched by P. This assignment happens during pattern matching, so if P matches more than once, then the assignment happens more than once.

The deferred assignment operation:

P * S

avoids these multiple assignments by deferring the assignment to the end of the match. If the entire match is successful, and if the pattern P was part of the successful match, then at the end of the matching operation the assignment to S of the string matching P is performed.
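The following stand-alone sketch contrasts the two assignment forms. It models a pattern (P & rest), anchored at position 0, in which P is a list of literal alternatives, and records what an immediate and a deferred assignment attached to P would each store. Trace and matchWithAssignments are hypothetical names, and nothing here is PatMat code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Trace
{
    std::vector<std::string> immediate;   // every value the immediate form assigns
    std::string deferred;                 // the single value the deferred form assigns
    bool matched;
};

// Matches (P & rest) anchored at position 0, where P tries the given literal
// alternatives in order.
Trace matchWithAssignments(const std::string& subject,
                           const std::vector<std::string>& alternativesOfP,
                           const std::string& rest)
{
    Trace trace;
    trace.matched = false;
    for (const std::string& alt : alternativesOfP)
    {
        if (subject.compare(0, alt.size(), alt) != 0) continue;   // P fails
        trace.immediate.push_back(alt);      // immediate: as soon as P matches
        if (subject.compare(alt.size(), rest.size(), rest) == 0)
        {
            trace.deferred = alt;            // deferred: once, after overall success
            trace.matched = true;
            break;
        }
    }
    return trace;
}
```

Matching the alternatives "A" and "AB" followed by "C" against "ABC" records two immediate assignments ("A", then "AB") but a single deferred assignment of "AB". Against "ABD" the immediate assignments still happen, while the deferred one does not.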

The cursor assignment operation:

Setcur(N)

assigns the current cursor position to the natural variable N. The cursor position is defined as the count of characters that have been matched so far (including any start point moves).

Finally the operations % and * may be used with values of type ostream. The effect is to do a << operation of the matched sub-string. These are particularly useful in debugging pattern matches.

1.7 Deferred Matching

The pattern construction functions (such as Len and Any) all permit the use of pointers to natural or string values, or functions that return natural or string values. These forms cause the actual value to be obtained at pattern matching time. This allows interesting possibilities for constructing dynamic patterns as illustrated in the examples section.

In addition the (+S) operator may be used where S is a pointer to string or function returning string, with a similar deferred effect.

A special use of deferred matching is the construction of predicate functions. The element (+P), where P is a BoolGetter object (a function object that returns a bool value), causes the function to be called at the time the element is matched. If the function returns true, then the null string is matched; if it returns false, then failure is signalled and previous alternatives are sought.
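The idea of reading a value at match time rather than at construction time can be shown in a few lines of plain C++. In this sketch DeferredLiteral and Predicate are hypothetical stand-ins, not PatMat types: the first keeps a pointer to a string and reads it when asked to match, and the second calls a function and matches the null string only if it returns true.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

const std::size_t noMatch = std::string::npos;

// Deferred literal: stores a pointer, so the text is read at match time.
struct DeferredLiteral
{
    const std::string* text;

    std::size_t match(const std::string& subject, std::size_t pos) const
    {
        assert(pos <= subject.size());
        return subject.compare(pos, text->size(), *text) == 0
             ? pos + text->size()
             : noMatch;
    }
};

// Predicate element: null match if the callback returns true, else failure.
struct Predicate
{
    std::function<bool()> test;

    std::size_t match(std::size_t pos) const
    {
        return test() ? pos : noMatch;
    }
};
```

If the string behind a DeferredLiteral changes between two matches, the second match uses the new value, which is the effect the deferred forms above provide.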

1.8 Deferred Replacement

Matching a non-const string returns a MutableMatchState object, and a subsequent assignment to this object performs the required replacement.

Using this approach, we can write:

string c;
string s;
('(' & Len(1) % c & ')')(s) = '[' + c + ']';

An assignment after a failed match has no effect. Note that the string s should not be modified between the match and the assignment, because the match state stores the start and end of the matched sub-string.
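The mechanism can be sketched with a small class that remembers the subject and the matched range and performs the replacement on assignment. ReplaceableMatch and matchBracketed are hypothetical names used for illustration. They are not the library's MutableMatchState, whose interface is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a mutable match state.
class ReplaceableMatch
{
    std::string* subject_;
    std::size_t start_;
    std::size_t end_;
    bool matched_;

public:

    ReplaceableMatch(std::string& subject, std::size_t start,
                     std::size_t end, bool matched)
    :
        subject_(&subject), start_(start), end_(end), matched_(matched)
    {
        assert(start <= end && end <= subject.size());
    }

    explicit operator bool() const { return matched_; }

    // Replaces the matched range; does nothing after a failed match.
    void operator=(const std::string& replacement)
    {
        if (matched_) subject_->replace(start_, end_ - start_, replacement);
    }
};

// Finds the first "(x)", where x is any single character, and captures x.
ReplaceableMatch matchBracketed(std::string& subject, std::string& captured)
{
    for (std::size_t i = 0; i + 2 < subject.size(); ++i)
    {
        if (subject[i] == '(' && subject[i + 2] == ')')
        {
            captured = subject.substr(i + 1, 1);
            return ReplaceableMatch(subject, i, i + 3, true);
        }
    }
    return ReplaceableMatch(subject, 0, 0, false);
}
```

The sketch is meant to be used in two statements, first "ReplaceableMatch m = matchBracketed(s, c);" and then "m = '[' + c + ']';". Since C++17 the right-hand operand of an assignment is evaluated before the left-hand one, so a captured value read on the right-hand side of a single combined statement may be read before the match has set it.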

1.9 Examples of Pattern Matching

First a simple example of the use of pattern replacement to remove a line number from the start of a string. We assume that the line number has the form of a string of decimal digits followed by a period, followed by one or more spaces.

const Pattern digs = Span("0123456789");
const Pattern lNum = Pos(0U) & digs & '.' & Span(' ');

Now to use this pattern we simply do a match with a replacement:

lNum(line) = "";

which replaces the line number by the null string. Note that it is also possible to use a CharacterSet value as an argument to Span and similar functions, and in particular all the useful constants in CharacterSet::CharacterSets are available. This means that we could define digs as:

const Pattern digs = Span(CharacterSets::digit);

The style we use here, of defining constant patterns and then using them, is typical. It is possible to build up patterns dynamically, but it is usually more efficient to build them in pieces in advance using constant declarations. Note in particular that although it is possible to construct a pattern directly as an argument for the Pattern(string) matching operator, it is much more efficient to preconstruct the pattern as we did in this example.

Now let's look at the use of pattern assignment to break a string into sections. Suppose that the input string has two Natural decimal integers, separated by spaces or a comma, with spaces allowed anywhere. Then we can isolate the two numbers with the following pattern:

string num1, num2;
const Pattern blank = NSpan(' ');
const Pattern num = Span("0123456789");
const Pattern nums = blank & num % num1 & Span(" ,") & num % num2;
nums(" 124, 257 ");

The match operation nums(" 124, 257 ") would assign the string 124 to num1 and the string 257 to num2.

Now let's see how more complex elements can be built from the set of primitive elements. The following pattern matches strings that have the syntax of Ada 95 based literals:

const Pattern digs  = Span(CharacterSets::digit);
const Pattern uDigs = digs & Arbno('_' & digs);

const Pattern eDig  = Span(CharacterSets::xdigit);
const Pattern ueDig = eDig & Arbno('_' & eDig);

const Pattern bNum  = uDigs & '#' & ueDig & '#';

A match against bNum will now match the desired strings, e.g. it will match 16#123_abc#, but not a#b#. However, this pattern is not quite complete, since it does not allow colons to replace the pound signs. The following is more complete:

const Pattern bChar = Any("#:");
const Pattern bNum  = uDigs & bChar & ueDig & bChar;

but that is still not quite right, since it allows # and : to be mixed, and they are supposed to be used consistently. We solve this by using a deferred match.

string temp;
const Pattern bNum = uDigs & bChar % temp & ueDig & (+temp);

Here the first instance of the base character is stored in temp, and then later in the pattern we rematch the value that was assigned.
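The effect of remembering the first delimiter and requiring the same one again can be written out by hand. The following stand-alone sketch does not use PatMat. isDigit, isHexDigit, underscoredRun and isBasedLiteral are hypothetical helpers, and unlike the unanchored pattern above the check covers the whole string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const std::size_t noMatch = std::string::npos;

bool isDigit(char c) { return c >= '0' && c <= '9'; }

bool isHexDigit(char c)
{
    return isDigit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Matches group {'_' group}, where a group is one or more member characters.
// Returns the end position, or noMatch if no group starts at 'pos'.
std::size_t underscoredRun(const std::string& s, std::size_t pos,
                           bool (*isMember)(char))
{
    std::size_t i = pos;
    if (i >= s.size() || !isMember(s[i])) return noMatch;
    while (i < s.size() && isMember(s[i])) ++i;
    // Take "_group" only when a member character follows the underscore.
    while (i + 1 < s.size() && s[i] == '_' && isMember(s[i + 1]))
    {
        ++i;
        while (i < s.size() && isMember(s[i])) ++i;
    }
    return i;
}

// True if the whole of 's' is digits, a delimiter ('#' or ':'), extended
// digits, and then the same delimiter again.
bool isBasedLiteral(const std::string& s)
{
    std::size_t i = underscoredRun(s, 0, isDigit);
    if (i == noMatch || i >= s.size()) return false;
    const char delimiter = s[i];             // plays the role of temp
    if (delimiter != '#' && delimiter != ':') return false;
    i = underscoredRun(s, i + 1, isHexDigit);
    if (i == noMatch || i >= s.size()) return false;
    return s[i] == delimiter && i + 1 == s.size();
}
```

isBasedLiteral accepts "16#123_abc#" and "16:1F:" but rejects "16#123_abc:", where the two delimiters differ.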

For an example of a recursive pattern, let's define a pattern that is like the built-in Bal(Open, Close), but the string matched is balanced with respect to square brackets OR curly brackets.

The language for such strings might be defined in extended BNF as

ELEMENT ::= <any character other than [] or {}>
           | '[' BALANCED_STRING ']'
           | '{' BALANCED_STRING '}'

BALANCED_STRING ::= ELEMENT {ELEMENT}

Here we use {} to indicate zero or more occurrences of a term, as is common practice in extended BNF. Now we can translate the above BNF into recursive patterns as follows:

Pattern balancedString;

Pattern element =
    NotAny("[]{}")
  | ('[' & (+balancedString) & ']')
  | ('{' & (+balancedString) & '}');

balancedString = element & Arbno(element);

Note the important use of + here to refer to a pattern not yet defined. Note also that we use assignments precisely because we cannot refer to as yet undeclared variables in initializations.

Now that this pattern is constructed, we can use it as though it were a new primitive pattern element, and for example, the match:

(balancedString % cout & Fail())("xy[ab{cd}]");

will generate the output:

x
xy
xy[ab{cd}]
y
y[ab{cd}]
[ab{cd}]
a
ab
ab{cd}
b
b{cd}
{cd}
c
cd
d

Note that the function of Fail here is simply to force the pattern balancedString to match all possible alternatives. Studying the operation of this pattern in detail is highly instructive.
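One way to study the operation in detail is to translate the BNF into plain recursive functions that return every possible end position in the order a backtracking matcher would reach it. The sketch below is such a translation. It does not use PatMat, and Ends, elementEnds, arbnoEnds, balancedEnds and allBalancedSubstrings are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::size_t> Ends;   // candidate end positions, in order

Ends balancedEnds(const std::string& s, std::size_t pos);

// ELEMENT: a plain character, or a bracketed BALANCED_STRING.
Ends elementEnds(const std::string& s, std::size_t pos)
{
    Ends ends;
    if (pos >= s.size()) return ends;
    const char c = s[pos];
    if (c == '[' || c == '{')
    {
        const char closer = (c == '[') ? ']' : '}';
        for (std::size_t inner : balancedEnds(s, pos + 1))
            if (inner < s.size() && s[inner] == closer)
                ends.push_back(inner + 1);
    }
    else if (c != ']' && c != '}')
    {
        ends.push_back(pos + 1);
    }
    return ends;
}

// Arbno(ELEMENT): the null match first, then one more ELEMENT at a time.
Ends arbnoEnds(const std::string& s, std::size_t pos)
{
    Ends ends(1, pos);
    for (std::size_t e : elementEnds(s, pos))
    {
        const Ends more = arbnoEnds(s, e);
        ends.insert(ends.end(), more.begin(), more.end());
    }
    return ends;
}

// BALANCED_STRING ::= ELEMENT {ELEMENT}
Ends balancedEnds(const std::string& s, std::size_t pos)
{
    Ends ends;
    for (std::size_t e : elementEnds(s, pos))
    {
        const Ends more = arbnoEnds(s, e);
        ends.insert(ends.end(), more.begin(), more.end());
    }
    return ends;
}

// Every match at every start point, as forced by the trailing Fail.
std::vector<std::string> allBalancedSubstrings(const std::string& s)
{
    std::vector<std::string> found;
    for (std::size_t start = 0; start < s.size(); ++start)
        for (std::size_t e : balancedEnds(s, start))
            found.push_back(s.substr(start, e - start));
    return found;
}
```

For the subject "xy[ab{cd}]", allBalancedSubstrings returns the same fifteen strings as the output listed above, in the same order.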

Finally we give a rather elaborate example of the use of deferred matching. The following declarations build up a pattern which will find the longest string of decimal digits in the subject string.

class MaxLen
:
    public BoolGetter
{
    const string& cur_;
    const string& max_;

    public:

        MaxLen(const string& cur, const string& max)
        :
            cur_(cur),
            max_(max)
        {}

        bool get() const
        {
            return cur_.size() > max_.size();
        }
};

.
.
.

string cur, max;
Natural loc;

MaxLen GtS(cur, max);

const CharacterSet& digit = CharacterSets::digit;
const Pattern digits = Span(digit);

const Pattern find =
    "" % max & Fence()         & // initialize max to null
    BreakX(digit)              & // scan looking for digits
    ((digits % cur             & // assign next string to cur
     (+GtS)                    & // check cur.size() > max.size()
     Setcur(loc))                // if so, save location
             % max)            & // and assign to max
    Fail();                      // seek all alternatives

As we see from the comments here, complex patterns like this take on aspects of sequential programs. In fact they are sequential programs with general backtracking.

In this pattern, we first use a pattern assignment that matches null and assigns it to max, so that it is initialized for the new match. Now BreakX scans to the next digit. Arb would do here, but BreakX will be more efficient. Once we have found a digit, we scan out the longest string of digits with Span, and assign it to cur. The deferred call to GtS.get() tests if the string we assigned to cur is the longest so far. If not, then failure is signalled, and we seek alternatives (this means that BreakX will extend and look for the next digit string). If the call to GtS.get() succeeds, then the matched string is assigned as the largest string so far into max and its location is saved in loc. Finally Fail forces the match to fail and seek alternatives, so that the entire string is searched.

If the pattern find is matched against a string, then at the end of the match the variable max will hold the longest string of digits, and loc will be the character location at which that string ends. For example, find("ab123cd4657ef23") will assign "4657" to max and 11 to loc (indicating that the string ends with the eleventh character of the subject).
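For comparison, the same result can be computed with an ordinary loop. This stand-alone sketch does not use PatMat. LongestRun and longestDigitRun are hypothetical names, and the loop only examines maximal digit runs, whereas the pattern also tests shorter tails of each run that can never win.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct LongestRun
{
    std::string text;    // longest run of digits found (empty if none)
    std::size_t end;     // cursor position just after that run
};

LongestRun longestDigitRun(const std::string& subject)
{
    LongestRun best;
    best.end = 0;
    std::size_t i = 0;
    while (i < subject.size())
    {
        if (subject[i] < '0' || subject[i] > '9')
        {
            ++i;                              // scan to the next digit
            continue;
        }
        std::size_t j = i;
        while (j < subject.size() && subject[j] >= '0' && subject[j] <= '9')
            ++j;                              // take the whole run of digits
        if (j - i > best.text.size())         // same strict test as GtS.get()
        {
            best.text = subject.substr(i, j - i);
            best.end = j;                     // the value Setcur would record
        }
        i = j;
    }
    return best;
}
```

For "ab123cd4657ef23" the function returns the text "4657" with end equal to 11, which agrees with the values of max and loc described above.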

1.10 Correspondence with Pattern Matching in SPITBOL

Generally the C++ syntax and names correspond closely to SPITBOL syntax for pattern matching construction.

The basic pattern construction operators are renamed as follows:

SPITBOL     C++
(space)      &
   |         | (Or for Character)
   $         %
   .         *
   *         +

The C++ operators were chosen so that the relative precedences of these operators correspond to those of the SPITBOL operators, but as always, the use of parentheses is advisable to clarify.

The pattern construction functions all have similar names.

The actual pattern matching syntax is modified in C++ as follows:

SPITBOL      C++
x y          y(x);
x y = repl   y(x) = repl;

and pattern failure is indicated by returning a bool result from the Match function (true for success, false for failure).

2 Example Code

#include <string>
#include <iostream>

#include "Pattern.H"

using namespace PatMat;
using namespace std;


class MaxLen
:
    public BoolGetter
{
    const string& cur_;
    const string& max_;

    public:

        MaxLen(const string& cur, const string& max)
        :
            cur_(cur),
            max_(max)
        {}

        bool get() const
        {
            return cur_.size() > max_.size();
        }
};


int main()
{
    {
        string s("Change brackets around a character (c)");
        string c;
        ('(' & Len(1) % c & ')')(s) = '[' + c + ']';
        cout << s << endl;
    }
    {
        const Pattern digs = Span(CharacterSets::digit);
        const Pattern lNum = Pos(0U) & digs & '.' & Span(' ');
        string line("258. Words etc.");
        lNum(line) = "";
        cout << line << endl;
    }
    {
        string num1, num2;
        const Pattern blank = NSpan(' ');
        const Pattern num = Span("0123456789");
        const Pattern nums = blank & num % num1 & Span(" ,") & num % num2;
        nums(" 124, 257 ");
        cout << "num1 = " << num1 << "; num2 = " << num2 << endl;
    }
    {
        const Pattern digs  = Span(CharacterSets::digit);
        const Pattern uDigs = digs & Arbno('_' & digs);
        const Pattern eDig  = Span(CharacterSets::xdigit);
        const Pattern ueDig = eDig & Arbno('_' & eDig);
        const Pattern bChar = Any("#:");
        string temp;
        const Pattern bNum = uDigs & bChar % temp & ueDig & (+temp);
        const string subject("16#123_abc#");
        if (bNum(subject)) cout << "Matched " << subject << endl;
    }
    {
        Pattern balancedString;

        Pattern element =
            NotAny("[]{}")
          | ('[' & (+balancedString) & ']')
          | ('{' & (+balancedString) & '}');

        balancedString = element & Arbno(element);

        cout << (balancedString % output & Fail())("xy[ab{cd}]") << endl;
    }
    {
        string cur, max;
        Natural loc;

        MaxLen GtS(cur, max);

        const CharacterSet& digit = CharacterSets::digit;
        const Pattern digits = Span(digit);

        const Pattern find =
            "" % max & Fence()  &   // initialize max to null
            BreakX(digit)       &   // scan looking for digits
            ((digits % cur      &   // assign next string to cur
              +GtS              &   // check cur.size() > max.size()
             Setcur(loc))           // if so, save location
                     % max)     &   // and assign to max
            Fail();                 // seek all alternatives

        find("ab123cd4657ef23");
        cout << "max = " << max << "; loc = " << loc << endl;
    }

    return 0;
}

3 Pattern Functions and Operators

// ----------------------------------------------------------------------------
/// Pattern functions and operators
// ----------------------------------------------------------------------------

// ----------------------------------------------------------------------------
///  Abort
// ----------------------------------------------------------------------------
// Constructs a pattern that immediately aborts the entire match
Pattern Abort();

// ----------------------------------------------------------------------------
///  Alternation
// ----------------------------------------------------------------------------
// Creates a pattern that will first try to match l and then on a subsequent
// failure, attempts to match r instead.
// SPITBOL: binary "|"
// C++: binary "|"

Pattern operator|(const std::string& l, const Pattern& r);
Pattern operator|(const Character *l, const Pattern& r);
Pattern operator|(const Pattern& l, const std::string& r);
Pattern operator|(const Pattern& l, const Character *r);
Pattern operator|(const std::string& l, const std::string& r);
Pattern operator|(const Character *l, const std::string& r);
Pattern operator|(const std::string& l, const Character *r);
Pattern Or(const Character *l, const Character *r);
Pattern operator|(const Pattern& l, const Pattern& r);
Pattern operator|(const Character l, const Pattern& r);
Pattern operator|(const Pattern& l, const Character r);
Pattern operator|(const std::string& l, const Character r);
Pattern Or(const Character *l, const Character r);
Pattern operator|(const Character l, const std::string& r);
Pattern Or(const Character l, const Character *r);

// ----------------------------------------------------------------------------
/// Any
// ----------------------------------------------------------------------------
// Constructs a pattern that matches a single character that is one of the
// characters in the given argument. The pattern fails if the current character
// is not in str.

Pattern Any(const Character c);
Pattern Any(const CharacterSet& set);
Pattern Any(const std::string& str);
Pattern Any(const std::string *str);
Pattern Any(const StringGetter&);

// ----------------------------------------------------------------------------
/// Arb
// ----------------------------------------------------------------------------
//
// Constructs a pattern that will match any string. On the first attempt, the
// pattern matches a null string, then on each successive failure, it matches
// one more character, and only fails if matching the entire rest of the string.

Pattern Arb();

// ----------------------------------------------------------------------------
///  Arbno
// ----------------------------------------------------------------------------
//
// Pattern repetition. First matches null, then on a subsequent failure attempts
// to match an additional instance of the given pattern.  Equivalent to (but
// more efficient than) "" | (P & ("" | (P & ("" | ...

Pattern Arbno(const Character c);
Pattern Arbno(const std::string& str);
Pattern Arbno(const Character *str);
Pattern Arbno(const Pattern& p);

// ----------------------------------------------------------------------------
///  Assignment immediately
// ----------------------------------------------------------------------------
//
// Matches P, and if the match succeeds, assigns the matched sub-string to the
// given std::string variable S. This assignment happens as soon as the
// sub-string is matched, and if the pattern P is matched more than once during
// the course of the match, then the assignment will occur more than once.
//
// SPITBOL: binary "$"
// C++: binary "%"

Pattern operator%(const Pattern& p, std::string& var);
Pattern operator%(const Pattern& p, StringGetter& obj);

// ----------------------------------------------------------------------------
///  Assignment on match
// ----------------------------------------------------------------------------
//
// Like "%" above, except that the assignment happens at most once after the
// entire match is completed successfully. If the match fails, then no
// assignment takes place.
//
// SPITBOL: binary "."
// C++: binary "*"

Pattern operator*(const Pattern& p, std::string& var);
Pattern operator*(const Pattern& p, StringGetter& obj);

// ----------------------------------------------------------------------------
///  Bal
// ----------------------------------------------------------------------------
//
// Constructs a pattern that will match any non-empty string that is parentheses
// balanced with respect to the parentheses characters open and close.
// Attempts to extend the string if a subsequent failure occurs.
Pattern Bal(const Character open, const Character close);

// ----------------------------------------------------------------------------
///  Break
// ----------------------------------------------------------------------------
//
// Constructs a pattern that matches a (possibly null) string which is
// immediately followed by a character in the given argument. This character is
// not part of the matched string. The pattern fails if the remaining characters
// to be matched do not include any of the characters in str.
Pattern Break(const Character c);
Pattern Break(const CharacterSet& set);
Pattern Break(const std::string& str);
Pattern Break(const std::string *str);
Pattern Break(const StringGetter&);

// ----------------------------------------------------------------------------
///  BreakX
// ----------------------------------------------------------------------------
//
// Like Break, but the pattern attempts to extend on a failure to find the next
// occurrence of a character in str, and only fails when the last such instance
// causes a failure.
Pattern BreakX(const Character c);
Pattern BreakX(const CharacterSet& set);
Pattern BreakX(const std::string& str);
Pattern BreakX(const std::string *str);
Pattern BreakX(const StringGetter&);

// ----------------------------------------------------------------------------
///  Concatenation operators
// ----------------------------------------------------------------------------
//
// Matches l followed by r
// SPITBOL: binary " " operator
// C++: binary "&" operator

Pattern operator&(const std::string& l, const Pattern& r);
Pattern operator&(const Pattern& l, const std::string& r);
Pattern operator&(const Pattern& l, const Pattern& r);
Pattern operator&(const Character l, const Pattern& r);
Pattern operator&(const Pattern& l, const Character r);

// ----------------------------------------------------------------------------
///  Deferred Matching
// ----------------------------------------------------------------------------
// SPITBOL: unary "*"
// C++: unary "+"

//- This function constructs a pattern which at pattern matching time will
//  access the current value of this variable, and match against the pattern
//  value.
//
//  Here p must be a Pattern variable.
//
//  DANGEROUS if Pattern lifetime longer than referenced variable!!
Pattern Defer(const Pattern& p);

//- This function constructs a pattern which at pattern matching time will
//  access the current value of this variable, and match against these
//  characters.
//
//  Here str must be a std::string variable.
Pattern Defer(const std::string& str);

//- Constructs a pattern which at pattern matching time calls the given
//  function, and then matches against the string or character value that is
//  returned by the call.
Pattern Defer(const StringGetter& obj);

//- Constructs a predicate pattern function that at pattern matching time calls
//  the given function. If true is returned, then the pattern matches.  If false
//  is returned, then failure is signalled.
Pattern Defer(const BoolGetter& obj);

inline Pattern operator+(const Pattern& p)
{
    return Defer(p);
}

inline Pattern operator+(const std::string& s)
{
    return Defer(s);
}

inline Pattern operator+(const StringGetter& obj)
{
    return Defer(obj);
}

inline Pattern operator+(const BoolGetter& obj)
{
    return Defer(obj);
}

// ----------------------------------------------------------------------------
///  Fail
// ----------------------------------------------------------------------------
// Constructs a pattern that always fails
Pattern Fail();

// ----------------------------------------------------------------------------
///  Fence
// ----------------------------------------------------------------------------
//
// Constructs a pattern that matches null on the first attempt, and then causes
// the entire match to be aborted if a subsequent failure occurs.
Pattern Fence();

// Constructs a pattern that first matches P. If P fails, then the constructed
// pattern fails. If P succeeds, then the match proceeds, but if subsequent
// failure occurs, alternatives in P are not sought.  The idea of Fence is that
// each time the pattern is matched, just one attempt is made to match P,
// without trying alternatives.
Pattern Fence(const Pattern& p);

// ----------------------------------------------------------------------------
///  Len
// ----------------------------------------------------------------------------
//
// Constructs a pattern that matches exactly the given number of characters. The
// pattern fails if fewer than this number of characters remain to be matched in
// the string.
Pattern Len(const Natural count);
Pattern Len(const UnsignedGetter& count);
Pattern Len(const Natural *count);

// ----------------------------------------------------------------------------
///  NotAny
// ----------------------------------------------------------------------------
//
// Constructs a pattern that matches a single character that is not one of the
// characters in the given argument. The pattern fails if the current character
// is in str.

Pattern NotAny(const Character c);
Pattern NotAny(const CharacterSet& set);
Pattern NotAny(const std::string& str);
Pattern NotAny(const std::string* str);
Pattern NotAny(const StringGetter&);

// ----------------------------------------------------------------------------
///  NSpan
// ----------------------------------------------------------------------------
//
// Constructs a pattern that matches the longest possible string consisting
// entirely of characters from the given argument. The string may be empty, so
// this pattern always succeeds.

// Null or Span (always succeeds)
// [NOT in SPITBOL]

Pattern NSpan(const Character c);
Pattern NSpan(const CharacterSet& set);
Pattern NSpan(const std::string& str);
Pattern NSpan(const std::string *str);
Pattern NSpan(const StringGetter&);

// ----------------------------------------------------------------------------
///  Pos
// ----------------------------------------------------------------------------
//
// Constructs a pattern that matches the null string if exactly count characters
// have already been matched, and otherwise fails.

Pattern Pos(Natural count);
Pattern Pos(const UnsignedGetter&);
Pattern Pos(const Natural *ptr);

// ----------------------------------------------------------------------------
///  Rem
// ----------------------------------------------------------------------------
//
// Constructs a pattern that always succeeds, matching the remaining unmatched
// characters in the subject string.

// SPITBOL: REM
// C++: Rem

Pattern Rem();

// ----------------------------------------------------------------------------
///  Rpos
// ----------------------------------------------------------------------------
//
// Constructs a pattern that matches the null string if exactly count characters
// remain to be matched in the string, and otherwise fails.

Pattern Rpos(Natural count);
Pattern Rpos(const UnsignedGetter&);
Pattern Rpos(const Natural *ptr);

// ----------------------------------------------------------------------------
///  Rtab
// ----------------------------------------------------------------------------
//
// Constructs a pattern that matches from the current location until exactly
// count characters remain to be matched in the string. The pattern fails if
// fewer than count characters remain to be matched.

Pattern Rtab(Natural count);
Pattern Rtab(const UnsignedGetter&);
Pattern Rtab(const Natural *ptr);

// ----------------------------------------------------------------------------
///  Setcur
// ----------------------------------------------------------------------------
//
// Constructs a pattern that matches the null string and assigns the current
// cursor position in the string to var. This value is the number of
// characters matched so far, so it is zero at the start of the match.
//
// SPITBOL: unary "@"
// C++: Setcur

Pattern Setcur(Natural &var);

// ----------------------------------------------------------------------------
///  Span
// ----------------------------------------------------------------------------
//
// Constructs a pattern that matches the longest possible string consisting
// entirely of characters from the given argument. The string cannot be empty,
// so the pattern fails if the current character is not one of the characters in
// the argument.

Pattern Span(const Character c);
Pattern Span(const CharacterSet& set);
Pattern Span(const std::string& str);
Pattern Span(const std::string *str);
Pattern Span(const StringGetter&);
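
// Span and NSpan differ only in how an empty run is treated. The following
// minimal sketch shows that difference on a std::string. It is not the PatMat
// implementation: the namespace sketch and the functions nspanCursor and
// spanCursor are hypothetical, cursor is assumed to be no greater than
// subject.size(), and failure is signalled by std::string::npos.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {

// Cursor after the longest run of characters from set starting at cursor.
// The run may be empty, so this never fails (NSpan-like behaviour).
inline std::size_t nspanCursor(const std::string& subject,
                               std::size_t cursor,
                               const std::string& set)
{
    const std::size_t end = subject.find_first_not_of(set, cursor);
    // npos here means every remaining character belongs to the set.
    return end == std::string::npos ? subject.size() : end;
}

// The same run, but an empty run counts as failure (Span-like behaviour).
inline std::size_t spanCursor(const std::string& subject,
                              std::size_t cursor,
                              const std::string& set)
{
    const std::size_t end = nspanCursor(subject, cursor, set);
    return end == cursor ? std::string::npos : end;
}

} // namespace sketch
```

// On the subject "abc" with the digit set "0123456789", nspanCursor returns
// the unchanged cursor 0 (an empty match), while spanCursor reports failure.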

// ----------------------------------------------------------------------------
///  Succeed
// ----------------------------------------------------------------------------
//
// Constructs a pattern that succeeds matching null, both on the first attempt,
// and on any rematch attempt, i.e. it is equivalent to an infinite alternation
// of null strings.

Pattern Succeed();

// ----------------------------------------------------------------------------
///  Tab
// ----------------------------------------------------------------------------
//
// Constructs a pattern that matches from the current location until count
// characters have been matched. The pattern fails if more than count
// characters have already been matched.

Pattern Tab(const Natural count);
Pattern Tab(const UnsignedGetter&);
Pattern Tab(const Natural *ptr);
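
// Len, Pos, Rpos, Tab and Rtab are all simple functions of the cursor and the
// subject length. The sketch below restates the descriptions above as cursor
// arithmetic. It is not the PatMat implementation: the namespace sketch and
// the function names are hypothetical, cursor is assumed to be no greater
// than size, and failure is signalled by std::string::npos. The check in
// tabCursor that n does not exceed the subject length is an assumption that
// the description above does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {

// Len(n): consume exactly n characters, or fail if fewer remain.
inline std::size_t lenCursor(std::size_t size, std::size_t cursor, std::size_t n)
{
    return (size - cursor >= n) ? cursor + n : std::string::npos;
}

// Pos(n): match null if exactly n characters have already been matched.
inline std::size_t posCursor(std::size_t cursor, std::size_t n)
{
    return (cursor == n) ? cursor : std::string::npos;
}

// Rpos(n): match null if exactly n characters remain to be matched.
inline std::size_t rposCursor(std::size_t size, std::size_t cursor, std::size_t n)
{
    return (size - cursor == n) ? cursor : std::string::npos;
}

// Tab(n): advance until n characters have been matched; fail if the cursor
// is already past n (or, by assumption, if n lies beyond the subject).
inline std::size_t tabCursor(std::size_t size, std::size_t cursor, std::size_t n)
{
    return (cursor <= n && n <= size) ? n : std::string::npos;
}

// Rtab(n): advance until exactly n characters remain; fail if fewer remain.
inline std::size_t rtabCursor(std::size_t size, std::size_t cursor, std::size_t n)
{
    return (size - cursor >= n) ? size - n : std::string::npos;
}

} // namespace sketch
```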

// ----------------------------------------------------------------------------
/// Pattern Matching Operations
// ----------------------------------------------------------------------------
//
// The Match function performs an actual pattern matching operation.  The
// versions with two parameters perform a match without modifying the subject
// string and return a bool result indicating if the match is successful or
// not.
//
// Note that pattern assignment functions in the pattern may generate side
// effects, so these functions are not necessarily pure.
//
// Pattern::anchor
//
//   This flag can be set to cause all subsequent pattern matches to operate in
//   anchored mode. In anchored mode, no attempt is made to move the anchor
//   point, so that if the match succeeds it must succeed starting at the first
//   character. Note that the effect of anchored mode may be achieved in
//   individual pattern matches by using Fence or Pos(0) at the start of the
//   pattern.
//
//   In an unanchored match, which is the default, successive attempts are made
//   to match the given pattern at each character of the subject string until a
//   match succeeds, or until all possibilities have failed.
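//
// The difference between the two modes can be sketched as a loop over
// candidate start positions. This is an illustration only, not the PatMat
// matcher: the namespace sketch and the functions literalAt and findMatch are
// hypothetical, and a literal string stands in for an arbitrary pattern.
// Trying the position just past the last character is an assumption, made so
// that a pattern matching the null string can still succeed there.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {

// Stand-in for "the pattern": true if literal occurs in subject at start.
inline bool literalAt(const std::string& subject, std::size_t start,
                      const std::string& literal)
{
    return subject.compare(start, literal.size(), literal) == 0;
}

// Returns the start position of the first successful match,
// or std::string::npos if every attempt fails.
inline std::size_t findMatch(const std::string& subject,
                             const std::string& literal,
                             bool anchored)
{
    // Anchored: only position 0 is tried. Unanchored: each position in turn.
    const std::size_t last = anchored ? 0 : subject.size();
    for (std::size_t start = 0; start <= last; ++start) {
        if (literalAt(subject, start, literal))
            return start;
    }
    return std::string::npos;
}

} // namespace sketch
```

// For the subject "xxABC" and the literal "ABC", the unanchored search
// succeeds at position 2, while the anchored search fails.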

// ----------------------------------------------------------------------------
/// Debugging Routines
// ----------------------------------------------------------------------------
//
// Debugging pattern matching operations can often be quite complex, since there
// is no obvious way to trace the progress of the match.  The declarations in
// this section provide some debugging assistance.

// ----------------------------------------------------------------------------
///  Pattern::debug flag
// ----------------------------------------------------------------------------
//
// The Pattern::debug flag can be provided to the Match functions to generate
// debugging information. The debugging output is a full trace of the actions of
// the pattern matcher, written to cout. The level of this information is
// intended to be comprehensible at the abstract level of this package
// declaration. However, note that the use of this switch often generates large
// amounts of output.

// ----------------------------------------------------------------------------
///  Write pattern to std::ostream
// ----------------------------------------------------------------------------
//
// This output operator writes a string representation of the pattern that
// corresponds to the syntax needed to create the given pattern using the
// functions in this package. The form of this string is such that it could
// actually be compiled and evaluated to yield the required pattern except for
// references to variables and functions.
std::ostream& operator<<(std::ostream &, const Pattern& p);

// ----------------------------------------------------------------------------
///  Pattern::debugMsg
// ----------------------------------------------------------------------------
inline void PatMat::Pattern::debugMsg(const Character* fmt) const
{
    IDOUT(std::cerr << fmt << long(pat_) << std::endl;);
}

// -----------------------------------------------------------------------------

Created: 2016-08-23 Tue 22:01

Emacs 24.5 (Org mode 8.2.10)
