From c6d93dd7eaa7cf0f5cd08961db1f7d5e1c30f602 Mon Sep 17 00:00:00 2001
From: Francesco Montorsi
Date: Mon, 19 Jan 2009 00:21:31 +0000
Subject: [PATCH] fix some wording and a typo

git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@58217 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
---
 docs/doxygen/overviews/string.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/doxygen/overviews/string.h b/docs/doxygen/overviews/string.h
index 3829548e3c..f74c6e3a2e 100644
--- a/docs/doxygen/overviews/string.h
+++ b/docs/doxygen/overviews/string.h
@@ -56,7 +56,7 @@ see the @ref overview_unicode_encodings paragraph.
 For simplicity of implementation, wxString when wxUSE_UNICODE_WCHAR==1
 (e.g. on Windows) uses per code unit indexing instead of per code point indexing
 and doesn't know anything about surrogate pairs;
-in other words it always considers code points to be composed by 1 code point,
+in other words it always considers code points to be composed by 1 code unit,
 while this is really true only for characters in the @e BMP (Basic Multilingual Plane).
 Thus when iterating over a UTF-16 string stored in a wxString under Windows, the user
 code has to take care of surrogate pairs himself.
@@ -66,7 +66,9 @@ such as for drawing strings on screen.)
 @remarks
 Note that while the behaviour of wxString when wxUSE_UNICODE_WCHAR==1 resembles UCS-2
 encoding, it's not completely correct to refer to wxString as
-UCS-2 encoded since you can encode characters outside the @e BMP in a wxString.
+UCS-2 encoded since you can encode code points outside the @e BMP in a wxString
+as two code units (i.e. as a surrogate pair; as already mentioned however wxString
+will "see" them as two different code points)
 
 When instead wxUSE_UNICODE_UTF8==1 (e.g. on Linux and Mac OS X) wxString handles UTF8
 multi-bytes sequences just fine also for characters outside
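
The patched paragraph says that user code iterating over UTF-16 data has to handle surrogate pairs itself. Below is a minimal sketch of what that could look like. It is written in plain standard C++ on a std::u16string rather than against the wxString API, and `DecodeUtf16` is a hypothetical helper name, not a wxWidgets function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (an assumption for illustration, not wxWidgets API):
// walks a sequence of UTF-16 code units and returns the Unicode code points,
// combining each high/low surrogate pair into a single code point.
std::vector<char32_t> DecodeUtf16(const std::u16string& units)
{
    std::vector<char32_t> codePoints;

    for (std::size_t i = 0; i < units.size(); ++i)
    {
        const char16_t unit = units[i];
        const bool isHighSurrogate = unit >= 0xD800 && unit <= 0xDBFF;

        if (isHighSurrogate && i + 1 < units.size())
        {
            const char16_t next = units[i + 1];
            const bool isLowSurrogate = next >= 0xDC00 && next <= 0xDFFF;

            if (isLowSurrogate)
            {
                // Two code units encode one code point outside the BMP.
                const char32_t high = static_cast<char32_t>(unit) - 0xD800;
                const char32_t low = static_cast<char32_t>(next) - 0xDC00;
                codePoints.push_back(0x10000 + (high << 10) + low);
                ++i; // skip the low surrogate that was just consumed
                continue;
            }
        }

        // A BMP character occupies exactly one code unit. An unpaired
        // surrogate is passed through unchanged in this sketch.
        codePoints.push_back(unit);
    }

    return codePoints;
}
```

For the input u"a\U0001D11E" the string holds three code units (0x0061, 0xD834, 0xDD1E), which is what per code unit indexing sees, while the helper returns the two code points U+0061 and U+1D11E.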