Fix problems with new PCRE2_SUBSTITUTE_MATCHED code.
parent 126137b96c
commit 690d2e3ff3
@@ -1,4 +1,4 @@
-.TH PCRE2API 3 "22 January 2020" "PCRE2 10.35"
+.TH PCRE2API 3 "16 February 2020" "PCRE2 10.35"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .sp
@@ -3328,12 +3328,12 @@ can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an
 option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the
 replacement string(s). The default action is to perform just one replacement if
 the pattern matches, but there is an option that requests multiple replacements
-(see PCRE2_SUBSTITUTE_GLOBAL below for details).
+(see PCRE2_SUBSTITUTE_GLOBAL below).
 .P
 If successful, \fBpcre2_substitute()\fP returns the number of substitutions
 that were carried out. This may be zero if no match was found, and is never
 greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
-returned if an error is detected (see below for details).
+returned if an error is detected.
 .P
 Matches in which a \eK item in a lookahead in the pattern causes the match to
 end before it starts are not supported, and give rise to an error return. For
@@ -3348,10 +3348,11 @@ data block is obtained and freed within this function, using memory management
 functions from the match context, if provided, or else those that were used to
 allocate memory for the compiled code.
 .P
-If an external \fImatch_data\fP block is provided, its contents afterwards
-are those set by the final call to \fBpcre2_match()\fP. For global changes,
-this will have ended in a no-match error. The contents of the ovector within
-the match data block may or may not have been changed.
+If \fImatch_data\fP is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the
+provided block is used for all calls to \fBpcre2_match()\fP, and its contents
+afterwards are the result of the final call. For global changes, this will
+always be a no-match error. The contents of the ovector within the match data
+block may or may not have been changed.
 .P
 As well as the usual options for \fBpcre2_match()\fP, a number of additional
 options can be set in the \fIoptions\fP argument of \fBpcre2_substitute()\fP.
@@ -3363,16 +3364,22 @@ calling \fBpcre2_match()\fP from within \fBpcre2_substitute()\fP. This allows
 an application to check for a match before choosing to substitute, without
 having to repeat the match.
 .P
-The \fIcode\fP argument is not used for the first substitution when
-PCRE2_SUBSTITUTE_MATCHED is set, but if PCRE2_SUBSTITUTE_GLOBAL is also set,
-\fBpcre2_match()\fP will be called after the first substitution to check for
-further matches, and the contents of the \fImatch_data\fP block will be
-changed.
+The contents of the externally supplied match data block are not changed when
+PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTITUTE_GLOBAL is also set,
+\fBpcre2_match()\fP is called after the first substitution to check for further
+matches, but this is done using an internally obtained match data block, thus
+always leaving the external block unchanged.
 .P
-The default is to return a copy of the subject string with matched substrings
-replaced. However, if PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the
-replacement substrings are returned. In the global case, multiple replacements
-are concatenated in the output buffer. Substitution callouts (see
+The \fIcode\fP argument is not used for matching before the first substitution
+when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided, even when
+PCRE2_SUBSTITUTE_GLOBAL is not set, because it contains information such as the
+UTF setting and the number of capturing parentheses in the pattern.
+.P
+The default action of \fBpcre2_substitute()\fP is to return a copy of the
+subject string with matched substrings replaced. However, if
+PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are
+returned. In the global case, multiple replacements are concatenated in the
+output buffer. Substitution callouts (see
 .\" HTML <a href="#subcallouts">
 .\" </a>
 below)
@@ -3381,26 +3388,39 @@ can be used to separate them if necessary.
 .P
 The \fIoutlengthptr\fP argument of \fBpcre2_substitute()\fP must point to a
 variable that contains the length, in code units, of the output buffer. If the
-function is successful, the value is updated to contain the length of the new
-string, excluding the trailing zero that is automatically added.
+function is successful, the value is updated to contain the length in code
+units of the new string, excluding the trailing zero that is automatically
+added.
 .P
 If the function is not successful, the value set via \fIoutlengthptr\fP depends
 on the type of error. For syntax errors in the replacement string, the value is
 the offset in the replacement string where the error was detected. For other
 errors, the value is PCRE2_UNSET by default. This includes the case of the
-output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set
-(see below), in which case the value is the minimum length needed, including
-space for the trailing zero. Note that in order to compute the required length,
-\fBpcre2_substitute()\fP has to simulate all the matching and copying, instead
-of giving an error return as soon as the buffer overflows. Note also that the
-length is in code units, not bytes.
+output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set.
 .P
-The replacement string, which is interpreted as a UTF string in UTF mode,
-is checked for UTF validity unless the PCRE2_NO_UTF_CHECK option is set. If the
-PCRE2_SUBSTITUTE_LITERAL option is set, it is not interpreted in any way. By
-default, however, a dollar character is an escape character that can specify
-the insertion of characters from capture groups and names from (*MARK) or other
-control verbs in the pattern. The following forms are always recognized:
+PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
+too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If
+this option is set, however, \fBpcre2_substitute()\fP continues to go through
+the motions of matching and substituting (without, of course, writing anything)
+in order to compute the size of buffer that is needed. This value is passed
+back via the \fIoutlengthptr\fP variable, with the result of the function still
+being PCRE2_ERROR_NOMEMORY.
+.P
+Passing a buffer size of zero is a permitted way of finding out how much memory
+is needed for given substitution. However, this does mean that the entire
+operation is carried out twice. Depending on the application, it may be more
+efficient to allocate a large buffer and free the excess afterwards, instead of
+using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
+.P
+The replacement string, which is interpreted as a UTF string in UTF mode, is
+checked for UTF validity unless PCRE2_NO_UTF_CHECK is set. An invalid UTF
+replacement string causes an immediate return with the relevant UTF error code.
+.P
+If PCRE2_SUBSTITUTE_LITERAL is set, the replacement string is not interpreted
+in any way. By default, however, a dollar character is an escape character that
+can specify the insertion of characters from capture groups and names from
+(*MARK) or other control verbs in the pattern. The following forms are always
+recognized:
 .sp
 $$ insert a dollar character
 $<n> or ${<n>} insert the contents of group <n>
@@ -3445,20 +3465,6 @@ If this is not successful, the offset is advanced by one character except when
 CRLF is a valid newline sequence and the next two characters are CR, LF. In
 this case, the offset is advanced by two characters.
 .P
-PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
-too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If
-this option is set, however, \fBpcre2_substitute()\fP continues to go through
-the motions of matching and substituting (without, of course, writing anything)
-in order to compute the size of buffer that is needed. This value is passed
-back via the \fIoutlengthptr\fP variable, with the result of the function still
-being PCRE2_ERROR_NOMEMORY.
-.P
-Passing a buffer size of zero is a permitted way of finding out how much memory
-is needed for given substitution. However, this does mean that the entire
-operation is carried out twice. Depending on the application, it may be more
-efficient to allocate a large buffer and free the excess afterwards, instead of
-using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.
-.P
 PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that do
 not appear in the pattern to be treated as unset groups. This option should be
 used with care, because it means that a typo in a group name or number no
@@ -3917,6 +3923,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 22 January 2020
+Last updated: 16 February 2020
 Copyright (c) 1997-2020 University of Cambridge.
 .fi

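The two buffer-overflow behaviours described in the manual text above can be illustrated with a toy model. The sketch below does not call PCRE2: `toy_substitute`, `TOY_ERROR_NOMEMORY` and `TOY_UNSET` are hypothetical names invented for this illustration, and the "pattern" is a single character. It only mirrors the documented contract: fail immediately by default, or keep counting without writing so that the required size (including the trailing zero) can be reported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are not PCRE2's values or names. */
#define TOY_ERROR_NOMEMORY (-1)
#define TOY_UNSET ((size_t)-1)

/* Toy model: replace every occurrence of the character `from` in `subject`
   with the string `rep`, writing a zero-terminated result to `out`.
   On entry *outlen is the buffer size. On success the return value is the
   number of replacements and *outlen is the result length without the
   trailing zero. On overflow the return value is TOY_ERROR_NOMEMORY and
   *outlen is TOY_UNSET, or the required size if overflow_length is set. */
int toy_substitute(const char *subject, char from, const char *rep,
                   char *out, size_t *outlen, int overflow_length)
{
  size_t cap = *outlen, need = 0;
  size_t rlen = strlen(rep), slen = strlen(subject);
  int count = 0, overflowed = 0;

  /* The i == slen iteration copies the trailing zero as one more piece. */
  for (size_t i = 0; i <= slen; i++)
    {
    int hit = (i < slen && subject[i] == from);
    const char *piece = hit ? rep : subject + i;
    size_t n = hit ? rlen : 1;

    if (!overflowed && need + n > cap)
      {
      if (!overflow_length) { *outlen = TOY_UNSET; return TOY_ERROR_NOMEMORY; }
      overflowed = 1;   /* keep counting, but stop writing */
      }
    if (!overflowed && n > 0) memcpy(out + need, piece, n);
    need += n;
    count += hit;
    }

  if (overflowed) { *outlen = need; return TOY_ERROR_NOMEMORY; }
  *outlen = need - 1;   /* reported length excludes the trailing zero */
  return count;
}
```

A caller would make a first call with a zero-sized buffer and the overflow flag set, allocate the size reported through `outlen`, and call again. As the manual text notes, this runs the whole operation twice.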
@@ -229,7 +229,7 @@ int forcecasereset = 0;
 uint32_t ovector_count;
 uint32_t goptions = 0;
 uint32_t suboptions;
-BOOL match_data_created = FALSE;
+pcre2_match_data *internal_match_data = NULL;
 BOOL escaped_literal = FALSE;
 BOOL overflowed = FALSE;
 BOOL use_existing_match;
@@ -265,22 +265,42 @@ pointer in the match data may be NULL after a no-match. */
 use_existing_match = ((options & PCRE2_SUBSTITUTE_MATCHED) != 0);
 replacement_only = ((options & PCRE2_SUBSTITUTE_REPLACEMENT_ONLY) != 0);

-if (use_existing_match)
+/* If starting from an existing match, there must be an externally provided
+match data block. We create an internal match_data block in two cases: (a) an
+external one is not supplied (and we are not starting from an existing match);
+(b) an existing match is to be used for the first substitution. In the latter
+case, we copy the existing match into the internal block. This ensures that no
+changes are made to the existing match data block. */
+
+if (match_data == NULL)
 {
-if (match_data == NULL) return PCRE2_ERROR_NULL;
+pcre2_general_context *gcontext;
+if (use_existing_match) return PCRE2_ERROR_NULL;
+gcontext = (mcontext == NULL)?
+(pcre2_general_context *)code :
+(pcre2_general_context *)mcontext;
+match_data = internal_match_data =
+pcre2_match_data_create_from_pattern(code, gcontext);
+if (internal_match_data == NULL) return PCRE2_ERROR_NOMEMORY;
 }

-/* Otherwise, if no match data block is provided, create one. */
-
-else if (match_data == NULL)
+else if (use_existing_match)
 {
 pcre2_general_context *gcontext = (mcontext == NULL)?
 (pcre2_general_context *)code :
 (pcre2_general_context *)mcontext;
-match_data = pcre2_match_data_create_from_pattern(code, gcontext);
-if (match_data == NULL) return PCRE2_ERROR_NOMEMORY;
-match_data_created = TRUE;
+int pairs = (code->top_bracket + 1 < match_data->oveccount)?
+code->top_bracket + 1 : match_data->oveccount;
+internal_match_data = pcre2_match_data_create(match_data->oveccount,
+gcontext);
+if (internal_match_data == NULL) return PCRE2_ERROR_NOMEMORY;
+memcpy(internal_match_data, match_data, offsetof(pcre2_match_data, ovector)
++ 2*pairs*sizeof(PCRE2_SIZE));
+match_data = internal_match_data;
 }
+
+/* Remember ovector details */
+
 ovector = pcre2_get_ovector_pointer(match_data);
 ovector_count = pcre2_get_ovector_count(match_data);

@@ -302,7 +322,7 @@ repend = replacement + rlength;
 #ifdef SUPPORT_UNICODE
 if (utf && (options & PCRE2_NO_UTF_CHECK) == 0)
 {
-rc = PRIV(valid_utf)(replacement, rlength, &(match_data->rightchar));
+rc = PRIV(valid_utf)(replacement, rlength, &(match_data->startchar));
 if (rc != 0)
 {
 match_data->leftchar = 0;
@@ -316,7 +336,7 @@ if (utf && (options & PCRE2_NO_UTF_CHECK) == 0)
 suboptions = options & SUBSTITUTE_OPTIONS;
 options &= ~SUBSTITUTE_OPTIONS;

-/* Error if the start match offset it greater than the length of the subject. */
+/* Error if the start match offset is greater than the length of the subject. */

 if (start_offset > length)
 {
@@ -344,7 +364,7 @@ do
 use_existing_match = FALSE;
 }
 else rc = pcre2_match(code, subject, length, start_offset, options|goptions,
-match_data, mcontext);
+match_data, mcontext);

 #ifdef SUPPORT_UNICODE
 if (utf) options |= PCRE2_NO_UTF_CHECK; /* Only need to check once */
@@ -898,8 +918,9 @@ do
 }

 /* Save the details of this match. See above for how this data is used. If we
-matched an empty string, do the magic for global matches. Finally, update the
-start offset to point to the rest of the subject string. */
+matched an empty string, do the magic for global matches. Update the start
+offset to point to the rest of the subject string. If we re-used an existing
+match for the first match, switch to the internal match data block. */

 ovecsave[0] = ovector[0];
 ovecsave[1] = ovector[1];
@@ -942,7 +963,7 @@ else
 }

 EXIT:
-if (match_data_created) pcre2_match_data_free(match_data);
+if (internal_match_data != NULL) pcre2_match_data_free(internal_match_data);
 else match_data->rc = rc;
 return rc;

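The copy step in the new `else if (use_existing_match)` branch above copies the fixed header of the match data block plus only as many offset pairs as both the pattern and the external block can supply. The sketch below shows that idiom on a mock structure. `mock_match_data`, `mock_create` and `mock_clone` are hypothetical names for this illustration only, and the field layout is an assumption, not the real `pcre2_match_data` definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pcre2_match_data: a fixed header followed by a
   flexible array of offsets. This is not the real PCRE2 layout. */
typedef struct mock_match_data {
  int rc;              /* result of the last match */
  size_t startchar;    /* illustrative header field */
  unsigned oveccount;  /* number of offset pairs the block can hold */
  size_t ovector[];    /* 2 * oveccount offsets */
} mock_match_data;

/* Allocate a zeroed block with room for oveccount offset pairs. */
mock_match_data *mock_create(unsigned oveccount)
{
  mock_match_data *md = calloc(1, offsetof(mock_match_data, ovector)
    + 2 * (size_t)oveccount * sizeof(size_t));
  if (md != NULL) md->oveccount = oveccount;
  return md;
}

/* Copy the header and the first `pairs` offset pairs of an external block
   into a fresh internal block, leaving the external block untouched.
   top_bracket is the highest capture group number in the pattern.
   Returns NULL if the allocation fails. */
mock_match_data *mock_clone(const mock_match_data *ext, unsigned top_bracket)
{
  unsigned pairs = (top_bracket + 1 < ext->oveccount) ?
    top_bracket + 1 : ext->oveccount;
  mock_match_data *copy = mock_create(ext->oveccount);
  if (copy == NULL) return NULL;
  memcpy(copy, ext, offsetof(mock_match_data, ovector)
    + 2 * (size_t)pairs * sizeof(size_t));
  return copy;
}
```

Because later matching writes only to the copy, the caller's block keeps the result of its own earlier match, which is the behaviour the revised manual text promises for PCRE2_SUBSTITUTE_MATCHED.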
27
testdata/testinput2
vendored
@@ -5806,6 +5806,33 @@ a)"xI
 12abc34xyz99abc55\=substitute_skip=1
 12abc34xyz99abc55\=substitute_skip=2

+/a(..)d/replace=>$1<,substitute_matched
+xyzabcdxyzabcdxyz
+xyzabcdxyzabcdxyz\=ovector=2
+\= Expect error
+xyzabcdxyzabcdxyz\=ovector=1
+
+/a(..)d/g,replace=>$1<,substitute_matched
+xyzabcdxyzabcdxyz
+xyzabcdxyzabcdxyz\=ovector=2
+\= Expect error
+xyzabcdxyzabcdxyz\=ovector=1
+xyzabcdxyzabcdxyz\=ovector=1,substitute_unset_empty
+
+/55|a(..)d/g,replace=>$1<,substitute_matched
+xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty
+\= Expect error
+xyz55abcdxyzabcdxyz\=ovector=2
+
+/55|a(..)d/replace=>$1<,substitute_matched
+xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty
+
+/55|a(..)d/replace=>$1<
+xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty
+
+/55|a(..)d/g,replace=>$1<
+xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty
+
 # Expect non-fixed-length error

 "(?<=X(?(DEFINE)(.*))(?1))."

39
testdata/testoutput2
vendored
@@ -17536,6 +17536,45 @@ Callout 0: last capture = 2
 3(2) Old 12 15 "abc" New 5 10 "<abc>"
 3: <abc><abc>

+/a(..)d/replace=>$1<,substitute_matched
+xyzabcdxyzabcdxyz
+1: xyz>bc<xyzabcdxyz
+xyzabcdxyzabcdxyz\=ovector=2
+1: xyz>bc<xyzabcdxyz
+\= Expect error
+xyzabcdxyzabcdxyz\=ovector=1
+Failed: error -54 at offset 3 in replacement: requested value is not available
+
+/a(..)d/g,replace=>$1<,substitute_matched
+xyzabcdxyzabcdxyz
+2: xyz>bc<xyz>bc<xyz
+xyzabcdxyzabcdxyz\=ovector=2
+2: xyz>bc<xyz>bc<xyz
+\= Expect error
+xyzabcdxyzabcdxyz\=ovector=1
+Failed: error -54 at offset 3 in replacement: requested value is not available
+xyzabcdxyzabcdxyz\=ovector=1,substitute_unset_empty
+Failed: error -54 at offset 3 in replacement: requested value is not available
+
+/55|a(..)d/g,replace=>$1<,substitute_matched
+xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty
+3: xyz><>bc<xyz>bc<xyz
+\= Expect error
+xyz55abcdxyzabcdxyz\=ovector=2
+Failed: error -55 at offset 3 in replacement: requested value is not set
+
+/55|a(..)d/replace=>$1<,substitute_matched
+xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty
+1: xyz><abcdxyzabcdxyz
+
+/55|a(..)d/replace=>$1<
+xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty
+1: xyz><abcdxyzabcdxyz
+
+/55|a(..)d/g,replace=>$1<
+xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty
+3: xyz><>bc<xyz>bc<xyz
+
 # Expect non-fixed-length error

 "(?<=X(?(DEFINE)(.*))(?1))."