Change Log for PCRE2
--------------------


Version 10.10 xx-xxx-2015
-------------------------

1. When a pattern is compiled, it remembers the highest back reference so that
when matching, if the ovector is too small, extra memory can be obtained to
use instead. A conditional subpattern whose condition is a check on a capture
having happened, as in the pattern /^(?:(a)|b)(?(1)A|B)/, is another kind of
back reference, but it was not setting the highest back reference number. This
mattered only if pcre2_match() was called with an ovector that was too small
to hold the capture, and there was no other kind of back reference (a
situation which is probably quite rare). The effect of the bug was that the
condition was always treated as FALSE when the capture could not be
consulted, leading to incorrect behaviour by pcre2_match(). This bug has been
fixed.

2. Functions for serialization and deserialization of sets of compiled patterns
have been added.

3. The value that is returned by PCRE2_INFO_SIZE has been corrected to remove
excess code units at the end of the data block that may occasionally occur if
the code for calculating the size over-estimates. This change stops the
serialization code from copying uninitialized data, to which valgrind objects.
The documentation of PCRE2_INFO_SIZE was incorrect in stating that the size
did not include the general overhead. This has been corrected.

4. All code units in every slot in the table of group names are now set, again
in order to avoid accessing uninitialized data when serializing.

5. The (*NO_JIT) feature is implemented.

6. If a bug that caused pcre2_compile() to use more memory than was allocated
was triggered when running under valgrind, the code in (3) above passed a
stupidly large value to valgrind. This caused a crash instead of an "internal
error" return.

7. A reference to a duplicated named group (either a back reference or a test
for being set in a conditional) that occurred in a part of the pattern where
PCRE2_DUPNAMES was not set caused the amount of memory needed for the pattern
to be incorrectly calculated, leading to memory being overwritten.


Version 10.00 05-January-2015
-----------------------------

Version 10.00 is the first release of PCRE2, a revised API for the PCRE
library. Changes prior to 10.00 are logged in the ChangeLog file for the old
API, up to item 20 for release 8.36.

The code of the library was heavily revised as part of the new API
implementation. Details of each and every modification were not individually
logged. In addition to the API changes, the following changes were made. They
are either new functionality, or bug fixes and other noticeable changes of
behaviour that were implemented after the code had been forked.

1. Including Unicode support at build time is now enabled by default, but it
can optionally be disabled. It is not enabled by default at run time (no
change).

2. The test program, now called pcre2test, was re-specified and almost
completely re-written. Its input is not compatible with input for pcretest.

3. Patterns may start with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) to set the
PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART options for every subject line that is
matched by that pattern.
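
A minimal sketch of what the two options mean for a candidate match, written
for this note rather than taken from PCRE2: the SKETCH_* flags and
match_acceptable() are hypothetical, and PCRE2's real option bits and
internals differ. PCRE2_NOTEMPTY rejects any empty match, while
PCRE2_NOTEMPTY_ATSTART rejects an empty match only at the first matching
position (the start of the subject plus the starting offset).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch only (not PCRE2's option bits). */
enum { SKETCH_NOTEMPTY = 1, SKETCH_NOTEMPTY_ATSTART = 2 };

/* Returns true if a candidate match spanning [match_start, match_end) may be
   accepted. start_offset is the first matching position of the attempt. */
static bool match_acceptable(unsigned flags, size_t start_offset,
                             size_t match_start, size_t match_end)
{
  assert(match_end >= match_start);
  if (match_end > match_start) return true;     /* non-empty: always fine */
  if (flags & SKETCH_NOTEMPTY) return false;    /* empty: never allowed */
  if ((flags & SKETCH_NOTEMPTY_ATSTART) && match_start == start_offset)
    return false;                               /* empty at first position */
  return true;                                  /* empty, but allowed here */
}
```

With (*NOTEMPTY_ATSTART), an empty match is still possible further along the
subject; only the first matching position is excluded.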

4. For the benefit of those who use PCRE2 via some other application, that is,
who do not write the function calls themselves, it is possible to check the
PCRE2 version by matching a pattern such as /(?(VERSION>=10)yes|no)/ against a
string such as "yesno".

5. There are case-equivalent Unicode characters whose encodings use different
numbers of code units in UTF-8. U+023A and U+2C65 are one example. (It is
theoretically possible for this to happen in UTF-16 too.) If a backreference to
a group containing one of these characters was greedily repeated, and during
the match a backtrack occurred, the subject might be backtracked by the wrong
number of code units. For example, if /^(\x{23a})\1*(.)/ is matched caselessly
(and in UTF-8 mode) against "\x{23a}\x{2c65}\x{2c65}\x{2c65}", group 2 should
capture the final character, which is the three bytes E2, B1, and A5 in UTF-8.
Incorrect backtracking meant that group 2 captured only the last two bytes.
This bug has been fixed; the new code is slower, but it is used only when the
strings matched by the repetition are not all the same length.
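
The byte arithmetic behind this item can be checked with a simplified model,
assuming UTF-8 and the example above: U+023A encodes as two bytes (C8 BA) and
U+2C65 as three (E2 B1 A5), so the subject is 11 bytes long. utf8_len() and
group2_start() are hypothetical helpers that model only the final backtrack;
they are not PCRE2's matching code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of UTF-8 code units (bytes) needed to encode code point cp. */
static size_t utf8_len(uint32_t cp)
{
  assert(cp <= 0x10FFFF);
  if (cp < 0x80) return 1;
  if (cp < 0x800) return 2;
  if (cp < 0x10000) return 3;
  return 4;
}

/* Simplified model of /^(\x{23a})\1*(.)/ matched caselessly against
   "\x{23a}\x{2c65}\x{2c65}\x{2c65}". Group 1 captures the first character,
   \1* greedily consumes the three case-equivalent U+2C65 characters, then (.)
   fails at the end of the subject, so one repetition is given back. Returns
   the byte offset at which group 2 then starts. With fixed_len_bug nonzero,
   the repetition is given back by the length of group 1's capture instead of
   the length that the repetition actually matched. */
static size_t group2_start(int fixed_len_bug)
{
  static const uint32_t subject[] = { 0x23A, 0x2C65, 0x2C65, 0x2C65 };
  const size_t count = sizeof(subject) / sizeof(subject[0]);
  const size_t group1_len = utf8_len(subject[0]);   /* 2 bytes */
  size_t offset = group1_len;
  size_t last_rep_start = offset;

  for (size_t i = 1; i < count; i++)
  {
    last_rep_start = offset;          /* where this repetition began */
    offset += utf8_len(subject[i]);   /* 3 bytes each, not 2 */
  }

  /* offset is now 11, the end of the subject: back up one repetition. */
  return fixed_len_bug ? offset - group1_len : last_rep_start;
}
```

Giving back the length of the last repetition actually matched (3 bytes)
leaves group 2 starting at offset 8, covering E2 B1 A5; giving back the length
of group 1's capture (2 bytes) leaves it at offset 9, which is why only the
last two bytes were captured.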

6. A pattern such as /()a/ was not setting the "first character must be 'a'"
information. This applied to any pattern with a group that matched no
characters, for example: /(?:(?=.)|(?<!x))a/.
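
A rough sketch, under stated assumptions, of why the "first character must be
'a'" information is worth having: an unanchored search can jump straight to
the next occurrence of that code unit instead of attempting a match at every
offset. next_candidate() is a hypothetical helper built on memchr(); it
illustrates the idea and is not necessarily how PCRE2 implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset of the next position at or after
   start_offset where the known first code unit occurs, or (size_t)-1 if
   there is none. Only such positions need a full match attempt. */
static size_t next_candidate(const char *subject, size_t length,
                             size_t start_offset, char first_cu)
{
  assert(start_offset <= length);
  const char *p = memchr(subject + start_offset, first_cu,
                         length - start_offset);
  return p == NULL ? (size_t)-1 : (size_t)(p - subject);
}
```

Without the information, as for /()a/ before this fix, a search of this kind
has no choice but to try every starting position.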

7. When an (*ACCEPT) is triggered inside capturing parentheses, it arranges for
those parentheses to be closed with whatever has been captured so far. However,
it was failing to mark any other groups between the highest capture so far and
the current group as "unset". Thus, the ovector for those groups contained
whatever was previously there. An example is the pattern /(x)|((*ACCEPT))/ when
matched against "abcd".

8. The pcre2_substitute() function has been implemented.

9. If an assertion used as a condition was quantified with a minimum of zero
(an odd thing to do, but it happened), SIGSEGV or other misbehaviour could
occur.

10. The PCRE2_NO_DOTSTAR_ANCHOR option has been implemented.

****