Documentation and tests update and minor tweak to perltest.sh.
parent bf9c64b444
commit 8bc4dee087
@@ -3227,13 +3227,13 @@ Verbs that act after backtracking
</b><br>
<P>
The following verbs do nothing when they are encountered. Matching continues
with what follows, but if there is no subsequent match, causing a backtrack to
the verb, a failure is forced. That is, backtracking cannot pass to the left of
the verb. However, when one of these verbs appears inside an atomic group or in
an assertion that is true, its effect is confined to that group, because once
the group has been matched, there is never any backtracking into it. In this
situation, backtracking has to jump to the left of the entire atomic group or
assertion.
with what follows, but if there is a subsequent match failure, causing a
backtrack to the verb, a failure is forced. That is, backtracking cannot pass
to the left of the verb. However, when one of these verbs appears inside an
atomic group or in a lookaround assertion that is true, its effect is confined
to that group, because once the group has been matched, there is never any
backtracking into it. Backtracking from beyond an assertion or an atomic group
ignores the entire group, and seeks a preceding backtracking point.
</P>
<P>
These verbs differ in exactly what kind of failure occurs when backtracking
@@ -3321,12 +3321,37 @@ instead of skipping on to "c".
<pre>
(*SKIP:NAME)
</pre>
When (*SKIP) has an associated name, its behaviour is modified. When it is
triggered, the previous path through the pattern is searched for the most
recent (*MARK) that has the same name. If one is found, the "bumpalong" advance
is to the subject position that corresponds to that (*MARK) instead of to where
(*SKIP) was encountered. If no (*MARK) with a matching name is found, the
(*SKIP) is ignored.
When (*SKIP) has an associated name, its behaviour is modified. When such a
(*SKIP) is triggered, the previous path through the pattern is searched for the
most recent (*MARK) that has the same name. If one is found, the "bumpalong"
advance is to the subject position that corresponds to that (*MARK) instead of
to where (*SKIP) was encountered. If no (*MARK) with a matching name is found,
the (*SKIP) is ignored.
</P>
<P>
The search for a (*MARK) name uses the normal backtracking mechanism, which
means that it does not see (*MARK) settings that are inside atomic groups or
assertions, because they are never re-entered by backtracking. Compare the
following <b>pcre2test</b> examples:
<pre>
re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: a
1: a
data:
re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: b
1: b
</pre>
In the first example, the (*MARK) setting is in an atomic group, so it is not
seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows
the second branch of the pattern to be tried at the first character position.
In the second example, the (*MARK) setting is not in an atomic group. This
allows (*SKIP:X) to immediately cause a new matching attempt to start at the
second character. This time, the (*MARK) is never seen because "a" does not
match "b", so the matcher immediately jumps to the second branch of the
pattern.
</P>
<P>
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
@@ -3456,6 +3481,14 @@ a positive assertion and false for a negative one; captured substrings are
retained in both cases.
</P>
<P>
The remaining verbs act only when a later failure causes a backtrack to
reach them. This means that their effect is confined to the assertion,
because lookaround assertions are atomic. A backtrack that occurs after an
assertion is complete does not jump back into the assertion. Note in particular
that a (*MARK) name that is set in an assertion is not "seen" by an instance of
(*SKIP:NAME) later in the pattern.
</P>
<P>
The effect of (*THEN) is not allowed to escape beyond an assertion. If there
are no more branches to try, (*THEN) causes a positive assertion to be false,
and a negative assertion to be true.
@@ -3463,10 +3496,10 @@ and a negative assertion to be true.
<P>
The other backtracking verbs are not treated specially if they appear in a
standalone positive assertion. In a conditional positive assertion,
backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be
false. However, for both standalone and conditional negative assertions,
backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be
true, without considering any further alternative branches.
backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE)
causes the condition to be false. However, for both standalone and conditional
negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes
the assertion to be true, without considering any further alternative branches.
<a name="btsub"></a></P>
<br><b>
Backtracking verbs in subroutines
@@ -3509,7 +3542,7 @@ Cambridge, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
Last updated: 10 July 2018
Last updated: 11 July 2018
<br>
Copyright © 1997-2018 University of Cambridge.
<br>

@@ -8695,14 +8695,14 @@ BACKTRACKING CONTROL
Verbs that act after backtracking

The following verbs do nothing when they are encountered. Matching con-
tinues with what follows, but if there is no subsequent match, causing
a backtrack to the verb, a failure is forced. That is, backtracking
cannot pass to the left of the verb. However, when one of these verbs
appears inside an atomic group or in an assertion that is true, its
effect is confined to that group, because once the group has been
matched, there is never any backtracking into it. In this situation,
backtracking has to jump to the left of the entire atomic group or
assertion.
tinues with what follows, but if there is a subsequent match failure,
causing a backtrack to the verb, a failure is forced. That is, back-
tracking cannot pass to the left of the verb. However, when one of
these verbs appears inside an atomic group or in a lookaround assertion
that is true, its effect is confined to that group, because once the
group has been matched, there is never any backtracking into it. Back-
tracking from beyond an assertion or an atomic group ignores the entire
group, and seeks a preceding backtracking point.

These verbs differ in exactly what kind of failure occurs when back-
tracking reaches them. The behaviour described below is what happens
@@ -8790,12 +8790,36 @@ BACKTRACKING CONTROL

(*SKIP:NAME)

When (*SKIP) has an associated name, its behaviour is modified. When it
is triggered, the previous path through the pattern is searched for the
most recent (*MARK) that has the same name. If one is found, the
"bumpalong" advance is to the subject position that corresponds to that
(*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with
a matching name is found, the (*SKIP) is ignored.
When (*SKIP) has an associated name, its behaviour is modified. When
such a (*SKIP) is triggered, the previous path through the pattern is
searched for the most recent (*MARK) that has the same name. If one is
found, the "bumpalong" advance is to the subject position that corre-
sponds to that (*MARK) instead of to where (*SKIP) was encountered. If
no (*MARK) with a matching name is found, the (*SKIP) is ignored.

The search for a (*MARK) name uses the normal backtracking mechanism,
which means that it does not see (*MARK) settings that are inside
atomic groups or assertions, because they are never re-entered by back-
tracking. Compare the following pcre2test examples:

re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: a
1: a
data:
re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: b
1: b

In the first example, the (*MARK) setting is in an atomic group, so it
is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored.
This allows the second branch of the pattern to be tried at the first
character position. In the second example, the (*MARK) setting is not
in an atomic group. This allows (*SKIP:X) to immediately cause a new
matching attempt to start at the second character. This time, the
(*MARK) is never seen because "a" does not match "b", so the matcher
immediately jumps to the second branch of the pattern.

Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It
ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME).
@@ -8915,41 +8939,48 @@ BACKTRACKING CONTROL
true for a positive assertion and false for a negative one; captured
substrings are retained in both cases.

The effect of (*THEN) is not allowed to escape beyond an assertion. If
there are no more branches to try, (*THEN) causes a positive assertion
The remaining verbs act only when a later failure causes a backtrack to
reach them. This means that their effect is confined to the assertion,
because lookaround assertions are atomic. A backtrack that occurs after
an assertion is complete does not jump back into the assertion. Note in
particular that a (*MARK) name that is set in an assertion is not
"seen" by an instance of (*SKIP:NAME) later in the pattern.

The effect of (*THEN) is not allowed to escape beyond an assertion. If
there are no more branches to try, (*THEN) causes a positive assertion
to be false, and a negative assertion to be true.

The other backtracking verbs are not treated specially if they appear
in a standalone positive assertion. In a conditional positive asser-
tion, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the con-
dition to be false. However, for both standalone and conditional nega-
tive assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE)
causes the assertion to be true, without considering any further alter-
native branches.
The other backtracking verbs are not treated specially if they appear
in a standalone positive assertion. In a conditional positive asser-
tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP),
or (*PRUNE) causes the condition to be false. However, for both stand-
alone and conditional negative assertions, backtracking into (*COMMIT),
(*SKIP), or (*PRUNE) causes the assertion to be true, without consider-
ing any further alternative branches.

Backtracking verbs in subroutines

These behaviours occur whether or not the subpattern is called recur-
These behaviours occur whether or not the subpattern is called recur-
sively. Perl's treatment of subroutines is different in some cases.

(*FAIL) in a subpattern called as a subroutine has its normal effect:
(*FAIL) in a subpattern called as a subroutine has its normal effect:
it forces an immediate backtrack.

(*ACCEPT) in a subpattern called as a subroutine causes the subroutine
match to succeed without any further processing. Matching then contin-
(*ACCEPT) in a subpattern called as a subroutine causes the subroutine
match to succeed without any further processing. Matching then contin-
ues after the subroutine call.

(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine
cause the subroutine match to fail.

(*THEN) skips to the next alternative in the innermost enclosing group
within the subpattern that has alternatives. If there is no such group
(*THEN) skips to the next alternative in the innermost enclosing group
within the subpattern that has alternatives. If there is no such group
within the subpattern, (*THEN) causes the subroutine match to fail.


SEE ALSO

pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2syntax(3),
pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2syntax(3),
pcre2(3).

@@ -8962,7 +8993,7 @@ AUTHOR

REVISION

Last updated: 10 July 2018
Last updated: 11 July 2018
Copyright (c) 1997-2018 University of Cambridge.
------------------------------------------------------------------------------

@@ -1,4 +1,4 @@
.TH PCRE2PATTERN 3 "10 July 2018" "PCRE2 10.32"
.TH PCRE2PATTERN 3 "11 July 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH "PCRE2 REGULAR EXPRESSION DETAILS"
@@ -3262,13 +3262,13 @@ to ensure that the match is always attempted.
.rs
.sp
The following verbs do nothing when they are encountered. Matching continues
with what follows, but if there is no subsequent match, causing a backtrack to
the verb, a failure is forced. That is, backtracking cannot pass to the left of
the verb. However, when one of these verbs appears inside an atomic group or in
an assertion that is true, its effect is confined to that group, because once
the group has been matched, there is never any backtracking into it. In this
situation, backtracking has to jump to the left of the entire atomic group or
assertion.
with what follows, but if there is a subsequent match failure, causing a
backtrack to the verb, a failure is forced. That is, backtracking cannot pass
to the left of the verb. However, when one of these verbs appears inside an
atomic group or in a lookaround assertion that is true, its effect is confined
to that group, because once the group has been matched, there is never any
backtracking into it. Backtracking from beyond an assertion or an atomic group
ignores the entire group, and seeks a preceding backtracking point.
.P
These verbs differ in exactly what kind of failure occurs when backtracking
reaches them. The behaviour described below is what happens when the verb is
@@ -3352,12 +3352,36 @@ instead of skipping on to "c".
.sp
(*SKIP:NAME)
.sp
When (*SKIP) has an associated name, its behaviour is modified. When it is
triggered, the previous path through the pattern is searched for the most
recent (*MARK) that has the same name. If one is found, the "bumpalong" advance
is to the subject position that corresponds to that (*MARK) instead of to where
(*SKIP) was encountered. If no (*MARK) with a matching name is found, the
(*SKIP) is ignored.
When (*SKIP) has an associated name, its behaviour is modified. When such a
(*SKIP) is triggered, the previous path through the pattern is searched for the
most recent (*MARK) that has the same name. If one is found, the "bumpalong"
advance is to the subject position that corresponds to that (*MARK) instead of
to where (*SKIP) was encountered. If no (*MARK) with a matching name is found,
the (*SKIP) is ignored.
.P
The search for a (*MARK) name uses the normal backtracking mechanism, which
means that it does not see (*MARK) settings that are inside atomic groups or
assertions, because they are never re-entered by backtracking. Compare the
following \fBpcre2test\fP examples:
.sp
re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: a
1: a
data:
re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
data: abc
0: b
1: b
.sp
In the first example, the (*MARK) setting is in an atomic group, so it is not
seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows
the second branch of the pattern to be tried at the first character position.
In the second example, the (*MARK) setting is not in an atomic group. This
allows (*SKIP:X) to immediately cause a new matching attempt to start at the
second character. This time, the (*MARK) is never seen because "a" does not
match "b", so the matcher immediately jumps to the second branch of the
pattern.
.P
Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores
names that are set by (*PRUNE:NAME) or (*THEN:NAME).
@@ -3481,16 +3505,23 @@ If the assertion is a condition, (*ACCEPT) causes the condition to be true for
a positive assertion and false for a negative one; captured substrings are
retained in both cases.
.P
The remaining verbs act only when a later failure causes a backtrack to
reach them. This means that their effect is confined to the assertion,
because lookaround assertions are atomic. A backtrack that occurs after an
assertion is complete does not jump back into the assertion. Note in particular
that a (*MARK) name that is set in an assertion is not "seen" by an instance of
(*SKIP:NAME) later in the pattern.
.P
The effect of (*THEN) is not allowed to escape beyond an assertion. If there
are no more branches to try, (*THEN) causes a positive assertion to be false,
and a negative assertion to be true.
.P
The other backtracking verbs are not treated specially if they appear in a
standalone positive assertion. In a conditional positive assertion,
backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the condition to be
false. However, for both standalone and conditional negative assertions,
backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes the assertion to be
true, without considering any further alternative branches.
backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE)
causes the condition to be false. However, for both standalone and conditional
negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes
the assertion to be true, without considering any further alternative branches.
.
.
.\" HTML <a name="btsub"></a>
@@ -3536,6 +3567,6 @@ Cambridge, England.
.rs
.sp
.nf
Last updated: 10 July 2018
Last updated: 11 July 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi

10
perltest.sh
@@ -43,7 +43,7 @@ fi
# afteralltext ignored
# dupnames ignored (Perl always allows)
# jitstack ignored
# mark ignored
# mark show mark information
# no_auto_possess ignored
# no_start_optimize ignored
# subject_literal does not process subjects for escapes
@@ -172,9 +172,9 @@ for (;;)

$mod =~ s/jitstack=\d+,?//;

# Remove "mark" (asks pcre2test to check MARK data) */
# The "mark" modifier requests checking of MARK data */

$mod =~ s/mark,?//;
$show_mark = ($mod =~ s/mark,?//);

# "ucp" asks pcre2test to set PCRE2_UCP; change this to /u for Perl

@@ -279,7 +279,7 @@ for (;;)
elsif (scalar(@subs) == 0)
{
printf $outfile "No match";
if (defined $REGERROR && $REGERROR != 1)
if ($show_mark && defined $REGERROR && $REGERROR != 1)
{ printf $outfile (", mark = %s", &pchars($REGERROR)); }
printf $outfile "\n";
}
@@ -307,7 +307,7 @@ for (;;)
# set and the input pattern was a UTF-8 string. We can, however, force
# it to be so marked.

if (defined $REGMARK && $REGMARK != 1)
if ($show_mark && defined $REGMARK && $REGMARK != 1)
{
$xx = $REGMARK;
$xx = Encode::decode_utf8($xx) if $utf8;

9
testdata/testinput1
vendored
@@ -6202,4 +6202,13 @@ ef) x/x,mark

/(?<=(?=.){4,5}x)/

/a(?=.(*:X))(*SKIP:X)(*F)|(.)/
abc

/a(?>(*:X))(*SKIP:X)(*F)|(.)/
abc

/a(?:(*:X))(*SKIP:X)(*F)|(.)/
abc

# End of testinput1

15
testdata/testoutput1
vendored
@@ -9841,4 +9841,19 @@ No match

/(?<=(?=.){4,5}x)/

/a(?=.(*:X))(*SKIP:X)(*F)|(.)/
abc
0: a
1: a

/a(?>(*:X))(*SKIP:X)(*F)|(.)/
abc
0: a
1: a

/a(?:(*:X))(*SKIP:X)(*F)|(.)/
abc
0: b
1: b

# End of testinput1