1
0
forked from cheng/wallet
wallet/docs/normalizing_unicode_strings.html
reaction.la 5238cda077
cleanup, and just do not like pdfs
Also, needed to understand Byzantine fault tolerant paxos better.

Still do not.
2022-02-20 18:26:44 +10:00

37 lines
1.6 KiB
HTML

<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<style>
body {
max-width: 30em;
margin-left: 2em;
}
p.center {
text-align:center;
}
</style>
<link rel="shortcut icon" href="../rho.ico">
<title>Normalizing unicode strings</title>
</head>
<body>
<p><a href="./index.html"> To Home page</a> </p>
<h1>Normalizing unicode strings</h1><p>
I would like strings that look similar to humans to map to the same item. Obviously trailing and leading whitespace needs to go, and whitespace map a single space.</p><p>
The hard part, however is that unicode has an enormous number of near duplicate symbols.</p><p>
Have you already read<br/>
<a href="https://www.unicode.org/reports/tr15/tr15-45.html">https://www.unicode.org/reports/tr15/tr15-45.html</a> ?</p><p>
Our normalization code is in<br/>
<a href="http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=tree;f=libraries/liblunicode;h=4896a6dc9ee5d3e78c15ed6c2e2ed2f21be70247;hb=HEAD">http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=tree;f=libraries/liblunicode;h=4896a6dc9ee5d3e78c15ed6c2e2ed2f21be70247;hb=HEAD</a></p><p>
I am going to have to use NFKC canonical form for the key, and NFC canonical form for the display of the key.</p><p>
Which once in a blue moon will drive someone crazy. "Its broken" he will say </p>
<p style="background-color : #ccffcc; font-size:80%">These documents are licensed under the <a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution-Share Alike 3.0 License</a></p>
</body>
</html>