Change Log for PCRE2
--------------------

Version 10.0 xx-xxxx-2014
-------------------------

Version 10.0 is the first release of PCRE2, a revised API for the PCRE library.
Changes prior to 10.0 are logged in the ChangeLog file for the old API, up to 
item 20 for release 8.36.

The code of the library was heavily revised as part of the new API
implementation, and not every modification was logged individually. In
addition to the API changes, the following changes were made. Each is either
new functionality or a bug that was fixed after the code had been forked.

1. The test program, now called pcre2test, was re-specified and almost 
completely re-written. Its input is not compatible with input for pcretest.

2. Patterns may start with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) to set the
PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART option for every subject string that
is matched against that pattern.

3. For the benefit of those who use PCRE2 via some other application, rather
than writing the function calls themselves, it is possible to check the PCRE2
version by matching a pattern such as /(?(VERSION>=10.0)yes|no)/ against a
string such as "yesno". The match is "yes" if the version is 10.0 or later,
and "no" otherwise.

4. There are case-equivalent Unicode characters whose encodings use different 
numbers of code units in UTF-8. U+023A and U+2C65 are one example. (It is 
theoretically possible for this to happen in UTF-16 too.) If a backreference to 
a group containing one of these characters was greedily repeated, and during 
the match a backtrack occurred, the subject might be backtracked by the wrong
number of code units. For example, if /^(\x{23a})\1*(.)/ is matched caselessly 
(and in UTF-8 mode) against "\x{23a}\x{2c65}\x{2c65}\x{2c65}", group 2 should 
capture the final character, which is the three bytes E2, B1, and A5 in UTF-8.
Incorrect backtracking meant that group 2 captured only the last two bytes. 
This bug has been fixed; the new code is slower, but it is used only when the 
strings matched by the repetition are not all the same length.
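
The code unit counts in item 4 can be checked with a small stand-alone sketch.
It is not code from the library: utf8_encode() and utf8_encode_string() are
hypothetical helpers written for this illustration, and they do no validation
beyond a range check. The sketch encodes the two case-equivalent characters
and works out where the final character of the example subject begins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into out, which must
   have room for 4 bytes. Returns the number of code units (bytes) written.
   Surrogates are not rejected; this is an illustration, not a validator. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: encode n code points into out and return the total
   byte length. If last_start is not NULL, it receives the byte offset at
   which the final character begins. */
size_t utf8_encode_string(const uint32_t *cps, size_t n,
                          unsigned char *out, size_t *last_start)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        if (last_start != NULL)
            *last_start = len;
        len += utf8_encode(cps[i], out + len);
    }
    return len;
}
```

U+023A encodes as the two bytes C8 BA and U+2C65 as the three bytes E2 B1 A5,
so the example subject "\x{23a}\x{2c65}\x{2c65}\x{2c65}" is 11 bytes long and
its final character starts at byte offset 8. A correct match therefore sets
group 2 to offsets 8 to 11, whereas a capture of only the last two bytes
starts at offset 9, in the middle of that character.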

****