Change Log for PCRE2
--------------------

Version 10.0 xx-xxxx-2014
-------------------------

Version 10.0 is the first release of PCRE2, a revised API for the PCRE library.
Changes prior to 10.0 are logged in the ChangeLog file for the old API, up to
item 20 for release 8.36.

The code of the library was heavily revised as part of the new API
implementation. Details of each and every modification were not individually
logged. In addition to the API changes, the following changes were made. They
are either new functionality, or bug fixes and other noticeable changes of
behaviour that were implemented after the code had been forked.

1. Unicode support is now enabled by default.

2. The test program, now called pcre2test, was re-specified and almost
completely re-written. Its input is not compatible with input for pcretest.

3. Patterns may start with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) to set the
PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART options for every subject line that is
matched by that pattern.

4. For the benefit of those who use PCRE2 via some other application, that is,
not writing the function calls themselves, it is possible to check the PCRE2
version by matching a pattern such as /(?(VERSION>=10.0)yes|no)/ against a
string such as "yesno".

5. There are case-equivalent Unicode characters whose encodings use different
numbers of code units in UTF-8. U+023A and U+2C65 are one example. (It is
theoretically possible for this to happen in UTF-16 too.) If a backreference to
a group containing one of these characters was greedily repeated, and during
the match a backtrack occurred, the subject might be backtracked by the wrong
number of code units. For example, if /^(\x{23a})\1*(.)/ is matched caselessly
(and in UTF-8 mode) against "\x{23a}\x{2c65}\x{2c65}\x{2c65}", group 2 should
capture the final character, which is the three bytes E2, B1, and A5 in UTF-8.
Incorrect backtracking meant that group 2 captured only the last two bytes.
This bug has been fixed; the new code is slower, but it is used only when the
strings matched by the repetition are not all the same length.

6. A pattern such as /()a/ was not setting the "first character must be 'a'"
information. This applied to any pattern with a group that matched no
characters, for example: /(?:(?=.)|(?<!x))a/.

7. When an (*ACCEPT) is triggered inside capturing parentheses, it arranges for
those parentheses to be closed with whatever has been captured so far. However,
it was failing to mark any other groups between the highest capture so far and
the current group as "unset". Thus, the ovector for those groups contained
whatever was previously there. An example is the pattern /(x)|((*ACCEPT))/ when
matched against "abcd".

8. The pcre2_substitute() function has been implemented.

9. If an assertion condition was quantified with a minimum of zero (an odd
thing to do, but it happened), SIGSEGV or other misbehaviour could occur.