<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<style>
body {
max-width: 30em;
margin-left: 2em;
}
p.center {
text-align:center;
}
</style>
<link rel="shortcut icon" href="../rho.ico">
<title>Normalizing Unicode strings</title>
</head>
<body>
<p><a href="./index.html"> To Home page</a> </p>
<h1>Normalizing Unicode strings</h1><p>

I would like strings that look the same to a human to map to the same item. Obviously leading and trailing whitespace has to go, and every run of internal whitespace has to map to a single space.</p><p>
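A minimal Python sketch of that whitespace rule, using only the standard library. The function name <code>collapse_whitespace</code> is hypothetical, and treating every character that Python regards as whitespace (the no-break space included) as collapsible is an assumption rather than something decided on this page.</p>
<pre>
```python
def collapse_whitespace(text):
    # str.split() with no arguments splits on runs of Unicode whitespace
    # and discards leading and trailing whitespace, so joining the pieces
    # with one space trims and collapses in a single step.
    return " ".join(text.split())


print(repr(collapse_whitespace("  Alice \t\n  Smith\u00a0 ")))
```
</pre>
<p>This only handles the easy part. It does nothing about characters that merely look alike.</p><p>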

The hard part, however, is that Unicode has an enormous number of near-duplicate symbols.</p><p>
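To make the near-duplicate problem concrete, here is a small Python sketch built on the standard <code>unicodedata</code> module. The list of pairs and the helper name <code>same_under</code> are chosen purely for illustration.</p>
<pre>
```python
import unicodedata

# Pairs of strings that render alike but are different code point sequences.
LOOKALIKES = [
    ("\ufb01", "fi"),        # LATIN SMALL LIGATURE FI vs. two plain letters
    ("\uff21", "A"),         # FULLWIDTH LATIN CAPITAL LETTER A vs. ASCII A
    ("e\u0301", "\u00e9"),   # e + COMBINING ACUTE ACCENT vs. precomposed form
    ("\u0430", "a"),         # CYRILLIC SMALL LETTER A vs. Latin a
]


def same_under(form, left, right):
    """Return True if both strings normalize to the same code points."""
    return unicodedata.normalize(form, left) == unicodedata.normalize(form, right)


for left, right in LOOKALIKES:
    print(ascii(left), ascii(right),
          "NFC:", same_under("NFC", left, right),
          "NFKC:", same_under("NFKC", left, right))
```
</pre>
<p>The ligature and the fullwidth letter are unified only by NFKC, and the combining sequence is unified by both forms. The Cyrillic letter is unified by neither, so normalization alone will not catch every pair that looks the same to a human.</p><p>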

Have you already read<br/>
<a href="https://www.unicode.org/reports/tr15/tr15-45.html">https://www.unicode.org/reports/tr15/tr15-45.html</a> ?</p><p>

Our normalization code is in<br/>
<a href="http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=tree;f=libraries/liblunicode;h=4896a6dc9ee5d3e78c15ed6c2e2ed2f21be70247;hb=HEAD">http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=tree;f=libraries/liblunicode;h=4896a6dc9ee5d3e78c15ed6c2e2ed2f21be70247;hb=HEAD</a></p><p>

I am going to have to use normalization form NFKC (compatibility composition) for the key, and normalization form NFC (canonical composition) for the display of the key.</p><p>

Once in a blue moon this will drive someone crazy, because two names that display differently will turn out to be the same key. "It's broken," he will say.</p>
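<p>A rough Python sketch of the key and display split, and of the surprise it can cause. The names <code>key_form</code> and <code>display_form</code> and the sample strings are hypothetical. Collapsing whitespace only in the key is also an assumption made for this example.</p>
<pre>
```python
import unicodedata


def display_form(text):
    # NFC: canonical composition only, so compatibility characters such as
    # ligatures and fullwidth letters survive for display.
    return unicodedata.normalize("NFC", text)


def key_form(text):
    # NFKC: compatibility composition, then trim and collapse whitespace.
    folded = unicodedata.normalize("NFKC", text)
    return " ".join(folded.split())


typed_by_alice = "\ufb01le  server"   # begins with LATIN SMALL LIGATURE FI
typed_by_bob = "file server"

print(ascii(display_form(typed_by_alice)))
print(ascii(display_form(typed_by_bob)))
print(key_form(typed_by_alice) == key_form(typed_by_bob))
```
</pre>
<p>The two display strings differ, yet both map to the same key, so whoever registers second is told the name is already taken.</p>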
<p style="background-color : #ccffcc; font-size:80%">These documents are licensed under the <a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution-Share Alike 3.0 License</a></p>
</body>
</html>