<HTML>
|
||
|
<HEAD>
|
||
|
<!-- This HTML file has been created by texi2html 1.54
|
||
|
from gettext.texi on 25 January 1999 -->
|
||
|
|
||
|
<TITLE>GNU gettext utilities - The Programmer's View</TITLE>
|
||
|
<link href="gettext_9.html" rel=Next>
|
||
|
<link href="gettext_7.html" rel=Previous>
|
||
|
<link href="gettext_toc.html" rel=ToC>
|
||
|
|
||
|
</HEAD>
|
||
|
<BODY>
|
||
|
<p>Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_7.html">previous</A>, <A HREF="gettext_9.html">next</A>, <A HREF="gettext_12.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
|
||
|
<P><HR><P>
|
||
|
|
||
|
|
||
|
<H1><A NAME="SEC39" HREF="gettext_toc.html#TOC39">The Programmer's View</A></H1>
|
||
|
|
||
|
<P>
|
||
|
One aim of the current message catalog implementation provided by
GNU <CODE>gettext</CODE> was to use the system's message catalog handling, if the
installer wishes to do so. So we should perhaps first take a look at
the solutions we know about. The people on the POSIX committee did not
manage to agree on one of the semi-official standards which we'll
describe below. In fact they couldn't agree on anything, so they
decided only to include an example of an interface. The major Unix vendors
are split in their usage of the two most important specifications: X/Open's
catgets vs. Uniforum's gettext interface. We'll describe them both and
later explain our solution to this dilemma.
|
||
|
|
||
|
</P>
|
||
|
|
||
|
|
||
|
|
||
|
<H2><A NAME="SEC40" HREF="gettext_toc.html#TOC40">About <CODE>catgets</CODE></A></H2>
|
||
|
|
||
|
<P>
|
||
|
The <CODE>catgets</CODE> implementation is defined in the X/Open Portability
Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the
process of creating this standard seemed to be too slow for some of
the Unix vendors, so they based their implementations on preliminary
versions of the standard. Of course this again leads to problems when
writing platform independent programs: even the usage of <CODE>catgets</CODE>
does not guarantee a uniform interface.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Another, personal comment on this is that only a bunch of committee members
could have designed this interface. They never really tried to program
using it. It is a fast, memory-saving implementation, and a
user can happily live with it. But programmers hate it (at least I and
some others do...).
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
But we must not forget one point: after all the trouble with transferring
the rights to Unix(tm), they at last came to X/Open, the very same body that
published this specification. This leads me to predict
that this interface will be in future Unix standards (e.g. Spec1170) and
therefore part of all Unix implementations (that is, implementations which are
<EM>allowed</EM> to bear this name).
|
||
|
|
||
|
</P>
|
||
|
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC41" HREF="gettext_toc.html#TOC41">The Interface</A></H3>
|
||
|
|
||
|
<P>
|
||
|
The interface to the <CODE>catgets</CODE> implementation consists of three
functions which correspond to those used in file access: <CODE>catopen</CODE>
to open the catalog for use, <CODE>catgets</CODE> to access the message
tables, and <CODE>catclose</CODE> to close the catalog after work is done. Prototypes
for the functions and the needed definitions are in the
<CODE><nl_types.h></CODE> header file.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
<CODE>catopen</CODE> is used like this:
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
nl_catd catd = catopen ("catalog_name", 0);
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
The function takes as its first argument the name of the catalog. This usually
refers to the name of the program or the package. The second parameter
is not further specified in the standard. I don't even know whether it
is implemented consistently among various systems. So the common advice
is to use <CODE>0</CODE> as the value. The return value is a handle to the
message catalog, analogous to the file handles returned by <CODE>open</CODE>.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
This handle is of course used in the <CODE>catgets</CODE> function which can
|
||
|
be used like this:
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
char *translation = catgets (catd, set_no, msg_id, "original string");
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
The first parameter is this catalog descriptor. The second parameter
specifies the set of messages in this catalog from which the message
identified by <CODE>msg_id</CODE> is obtained. <CODE>catgets</CODE> therefore uses
three-stage addressing:
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
catalog name => set number => message ID => translation
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
The fourth argument is not used to address the translation. It is given
as a default value in case one of the addressing stages fails. One
important thing to remember is that although the return type of catgets
is <CODE>char *</CODE> the resulting string <EM>must not</EM> be changed. It
would better be <CODE>const char *</CODE>, but the standard was published in
1988, one year before ANSI C.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
The last of these functions is used, and behaves, as expected:
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
catclose (catd);
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
After this no <CODE>catgets</CODE> call using the descriptor is legal anymore.
|
||
|
|
||
|
</P>
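<P>
The three calls above can be combined as in the following minimal sketch.
The helper name <CODE>fetch_message</CODE> and its buffer-based signature are
our own assumptions for illustration, not part of the X/Open interface. The
text is copied out before <CODE>catclose</CODE> is called, because the pointer
returned by <CODE>catgets</CODE> may refer to storage owned by the catalog.
</P>

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical helper: copy message (set_no, msg_no) from the named
   catalog into buf, falling back to default_text when any addressing
   stage fails.  Returns 1 if the catalog could be opened, else 0.  */
int
fetch_message (const char *catalog_name, int set_no, int msg_no,
               const char *default_text, char *buf, size_t buflen)
{
  nl_catd catd = catopen (catalog_name, 0);
  int opened = (catd != (nl_catd) -1);
  const char *text = default_text;

  if (opened)
    text = catgets (catd, set_no, msg_no, default_text);

  /* Copy before closing: the result must not be modified and may
     become invalid once the catalog is closed.  */
  snprintf (buf, buflen, "%s", text);

  if (opened)
    catclose (catd);
  return opened;
}
```

<P>
A caller passes the default string at every call site, so a program built
this way still produces readable output when no catalog is installed.
</P>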
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC42" HREF="gettext_toc.html#TOC42">Problems with the <CODE>catgets</CODE> Interface?!</A></H3>
|
||
|
|
||
|
<P>
|
||
|
Now that this description has seemed really easy, where are the
problems we speak of? In fact the interface could be used in a
reasonable way, but constructing the message catalogs is a pain. The
reason for this lies in the third argument of <CODE>catgets</CODE>: the unique
message ID. This has to be a numeric value, unique among all messages in a single
set. Perhaps you can imagine the problems of keeping such a list while
changing the source code. Add a new message here, remove one there. Of
course a lot of tools have been developed to help organize this
chaos, but each of them fails in one aspect or another. We don't
want to say that the other approach has no problems, but they are far
easier to manage.
|
||
|
|
||
|
</P>
|
||
|
|
||
|
|
||
|
<H2><A NAME="SEC43" HREF="gettext_toc.html#TOC43">About <CODE>gettext</CODE></A></H2>
|
||
|
|
||
|
<P>
|
||
|
The definition of the <CODE>gettext</CODE> interface comes from a Uniforum
proposal and it is followed by at least one major Unix vendor
(Sun) in its latest developments. It is not specified in any official
standard, though.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
The main points about this solution are that it does not follow the
method of normal file handling (open-use-close) and that it does not
burden the programmer with so many tasks, especially the unique key handling.
Of course a unique key is needed here too, but this key is the
message itself (however long or short it is). See section <A HREF="gettext_8.html#SEC48">Comparing the Two Interfaces</A> for a
more detailed comparison of the two methods.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
The following section contains a rather detailed description of the
|
||
|
interface. We make it that detailed because this is the interface
|
||
|
we chose for the GNU <CODE>gettext</CODE> Library. Programmers interested
|
||
|
in using this library will be interested in this description.
|
||
|
|
||
|
</P>
|
||
|
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC44" HREF="gettext_toc.html#TOC44">The Interface</A></H3>
|
||
|
|
||
|
<P>
|
||
|
The minimal functionality an interface must have is a) to select a
|
||
|
domain the strings are coming from (a single domain for all programs is
|
||
|
not reasonable because its construction and maintenance is difficult,
|
||
|
perhaps impossible) and b) to access a string in a selected domain.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
This is, in principle, the description of the <CODE>gettext</CODE> interface. It
has a global domain which unqualified usages reference. Of course this
domain is selectable by the user.
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
char *textdomain (const char *domain_name);
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
This provides the possibility to change or query the
current global domain of the <CODE>LC_MESSAGES</CODE> category. The
argument is a null-terminated string, whose characters must be legal
for use in file names. If the <VAR>domain_name</VAR> argument is <CODE>NULL</CODE>,
the function returns the current value. If no value has been set
before, the name of the default domain is returned: <EM>messages</EM>.
Please note that although the return value of <CODE>textdomain</CODE> is of
type <CODE>char *</CODE> it must not be modified. It is also important to know
that no check is made that the domain is available. If the name is not
available you will notice this by the fact that no translations are provided.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
To use a domain set by <CODE>textdomain</CODE> the function
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
char *gettext (const char *msgid);
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
is to be used. This is the simplest reasonable form one can imagine.
|
||
|
The translation of the string <VAR>msgid</VAR> is returned if it is available
|
||
|
in the current domain. If not available the argument itself is
|
||
|
returned. If the argument is <CODE>NULL</CODE> the result is undefined.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
One thing which should come to mind is that no explicit dependency on
the domain used is given. The current value of the domain for the
<CODE>LC_MESSAGES</CODE> locale is used. If this changes between two
executions of the same <CODE>gettext</CODE> call in the program, the two calls
reference different message catalogs.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
For the easiest case, which is normally used in internationalized
packages, a call to <CODE>textdomain</CODE> is issued once at the beginning
of execution, setting the domain to a unique name, normally the package
name. In the rest of the program all strings which have to be translated are
passed through the gettext function. That's all, the package speaks
your language.
|
||
|
|
||
|
</P>
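<P>
The easiest case might look like the following sketch. The domain name
<CODE>my_package</CODE> and the two function names are hypothetical. The call to
<CODE>setlocale</CODE> is our addition: it is not described in this section, but
without it the program stays in the default locale and no translation is
looked up.
</P>

```c
#include <libintl.h>
#include <locale.h>

/* Call once at program start.  "my_package" is a hypothetical
   domain name; normally the package name is used.  */
void
init_messages (void)
{
  setlocale (LC_ALL, "");      /* adopt the user's locale settings */
  textdomain ("my_package");   /* select the global domain */
}

/* Return the translated greeting, or the original string when no
   catalog for the current domain and language is available.  */
const char *
greeting (void)
{
  return gettext ("Hello world");
}
```

<P>
Because <CODE>gettext</CODE> returns its argument when no translation is found,
<CODE>greeting</CODE> yields a usable string whether or not a catalog is installed.
</P>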
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC45" HREF="gettext_toc.html#TOC45">Solving Ambiguities</A></H3>
|
||
|
|
||
|
<P>
|
||
|
While this single-name domain works well for most applications, there
might be a need to get translations from more than one domain. Of
course one could switch between different domains with calls to
<CODE>textdomain</CODE>, but this is neither convenient nor fast. One
possible situation is a case under discussion at the time of this writing: all
error messages of functions in the set of commonly used functions should
go into a separate domain <CODE>error</CODE>. By this means we would only need
to translate them once.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
For this reason there are two more functions to retrieve strings:
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
char *dgettext (const char *domain_name, const char *msgid);
char *dcgettext (const char *domain_name, const char *msgid,
                 int category);
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Both take an additional argument in the first position, which corresponds
to the argument of <CODE>textdomain</CODE>. The third argument of
<CODE>dcgettext</CODE> allows the use of a locale category other than <CODE>LC_MESSAGES</CODE>.
But I really don't know where this can be useful. If the
<VAR>domain_name</VAR> is <CODE>NULL</CODE> or <VAR>category</VAR> has a value other than
the known ones, the result is undefined. It should also be noted that
this function is not part of the second known implementation of this
function family, the one found in Solaris.
|
||
|
|
||
|
</P>
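<P>
As a rough sketch of the situation described above, shared error messages
could be fetched from the separate <CODE>error</CODE> domain without touching the
global domain. The function names here are hypothetical; the second one only
shows that <CODE>dgettext</CODE> corresponds to <CODE>dcgettext</CODE> with the
<CODE>LC_MESSAGES</CODE> category.
</P>

```c
#include <libintl.h>
#include <locale.h>

/* Look up a shared error message in the "error" domain.  The global
   domain set by textdomain is left unchanged.  */
const char *
error_text (const char *msgid)
{
  return dgettext ("error", msgid);
}

/* The same lookup with the category spelled out explicitly.  */
const char *
error_text_explicit (const char *msgid)
{
  return dcgettext ("error", msgid, LC_MESSAGES);
}
```

<P>
Both functions fall back to returning <VAR>msgid</VAR> itself when the
<CODE>error</CODE> domain has no catalog for the current language.
</P>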
|
||
|
<P>
|
||
|
A second ambiguity can arise from the fact that more than one
domain may have the same name. This can be solved by specifying where the
needed message catalog files can be found.
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
char *bindtextdomain (const char *domain_name,
                      const char *dir_name);
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Calling this function binds the given domain to a file in the specified
directory (how this file is determined follows below). In particular, a
file in the system's default place is no longer favored over the specified
file (as it would be by solely using <CODE>textdomain</CODE>). A
<CODE>NULL</CODE> pointer for the <VAR>dir_name</VAR> parameter returns the binding
associated with <VAR>domain_name</VAR>. If <VAR>domain_name</VAR> itself is
<CODE>NULL</CODE> nothing happens and a <CODE>NULL</CODE> pointer is returned. Here
again, as for all the other functions, none of the return
values may be modified!
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
It is important to remember that relative path names for the
<VAR>dir_name</VAR> parameter can be trouble. Since the path is always
computed relative to the current directory, different results will be
obtained after the program executes a <CODE>chdir</CODE> call. Relative
paths should therefore always be avoided.
|
||
|
|
||
|
</P>
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC46" HREF="gettext_toc.html#TOC46">Locating Message Catalog Files</A></H3>
|
||
|
|
||
|
<P>
|
||
|
Because many different languages for many different packages have to be
stored, we need some way to attach this information to the message catalog
files. The usual way in Unix environments is to encode it
in the file name. This is also done here. The directory name given in
<CODE>bindtextdomain</CODE>'s second argument (or the default directory),
the value and the name of the locale category, and the domain name are
concatenated:
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
<VAR>dir_name</VAR>/<VAR>locale</VAR>/LC_<VAR>category</VAR>/<VAR>domain_name</VAR>.mo
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
The default value for <VAR>dir_name</VAR> is system specific. For the GNU
|
||
|
library, and for packages adhering to its conventions, it's:
|
||
|
|
||
|
<PRE>
|
||
|
/usr/local/share/locale
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
<VAR>locale</VAR> is the value of the locale whose name is this
|
||
|
<CODE>LC_<VAR>category</VAR></CODE>. For <CODE>gettext</CODE> and <CODE>dgettext</CODE> this
|
||
|
locale is always <CODE>LC_MESSAGES</CODE>. <CODE>dcgettext</CODE> specifies the
|
||
|
locale by the third argument.<A NAME="DOCF2" HREF="gettext_foot.html#FOOT2">(2)</A> <A NAME="DOCF3" HREF="gettext_foot.html#FOOT3">(3)</A>
|
||
|
|
||
|
</P>
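<P>
The concatenation rule can be illustrated with plain string formatting. The
following sketch is not the library's own code; the function name
<CODE>catalog_path</CODE> and its parameters are assumptions chosen to mirror the
pattern shown above.
</P>

```c
#include <stdio.h>

/* Build dir_name/locale/category/domain_name.mo in buf, where
   category is a name such as "LC_MESSAGES".  Returns 1 on success
   and 0 if the result did not fit into buflen bytes.  */
int
catalog_path (char *buf, size_t buflen, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  int n = snprintf (buf, buflen, "%s/%s/%s/%s.mo",
                    dir_name, locale, category, domain_name);
  return n >= 0 && (size_t) n < buflen;
}
```

<P>
For the default directory, the locale <CODE>de</CODE>, and a hypothetical domain
<CODE>my_package</CODE>, this produces
<TT>`/usr/local/share/locale/de/LC_MESSAGES/my_package.mo'</TT>.
</P>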
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC47" HREF="gettext_toc.html#TOC47">Optimization of the *gettext functions</A></H3>
|
||
|
|
||
|
<P>
|
||
|
At this point of the discussion we should talk about an advantage of the
GNU <CODE>gettext</CODE> implementation. Some readers might have pointed out
that an internationalized program might perform poorly if some
string has to be translated in an inner loop. While this is unavoidable
when the string varies from one run of the loop to the next, it is
simply a waste of time when the string is always the same. Take the
following example:
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
{
  while (...)
    {
      puts (gettext ("Hello world"));
    }
}
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
When the locale selection does not change between two runs the resulting
string is always the same. One way to exploit this is:
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
{
  str = gettext ("Hello world");
  while (...)
    {
      puts (str);
    }
}
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
But this solution is not usable in all situations (e.g. when the locale
selection changes) nor is it very readable.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
The GNU C compiler, version 2.7 and above, provides another solution for
this. To describe this we show here some lines of the
<TT>`intl/libgettext.h'</TT> file. For an explanation of the expression
command block see section `Statements and Declarations in Expressions' in <CITE>The GNU CC Manual</CITE>.
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
# if defined __GNUC__ && __GNUC__ == 2 && __GNUC_MINOR__ >= 7
extern int _nl_msg_cat_cntr;
# define dcgettext(domainname, msgid, category)                        \
  (__extension__                                                       \
    ({                                                                 \
      char *result;                                                    \
      if (__builtin_constant_p (msgid))                                \
        {                                                              \
          static char *__translation__;                                \
          static int __catalog_counter__;                              \
          if (! __translation__                                        \
              || __catalog_counter__ != _nl_msg_cat_cntr)              \
            {                                                          \
              __translation__ =                                        \
                dcgettext__ ((domainname), (msgid), (category));       \
              __catalog_counter__ = _nl_msg_cat_cntr;                  \
            }                                                          \
          result = __translation__;                                    \
        }                                                              \
      else                                                             \
        result = dcgettext__ ((domainname), (msgid), (category));      \
      result;                                                          \
    }))
# endif
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
The interesting thing here is the <CODE>__builtin_constant_p</CODE> predicate.
This is evaluated at compile time and so optimization can take place
immediately. Two cases are distinguished here. If the argument to
<CODE>gettext</CODE> is not a constant value, the function
<CODE>dcgettext__</CODE>, the real implementation of the
<CODE>dcgettext</CODE> function, is simply called.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
If the string argument <EM>is</EM> constant we can reuse the translation
once obtained, as long as the locale selection has not changed. This is exactly
what is done here. The <CODE>_nl_msg_cat_cntr</CODE> variable is defined in
the file <TT>`loadmsgcat.c'</TT> which is available in <TT>`libintl.a'</TT> and is
changed whenever a new message catalog is loaded.
|
||
|
|
||
|
</P>
|
||
|
|
||
|
|
||
|
<H2><A NAME="SEC48" HREF="gettext_toc.html#TOC48">Comparing the Two Interfaces</A></H2>
|
||
|
|
||
|
<P>
|
||
|
The following discussion is perhaps a little bit colored. As said
|
||
|
above we implemented GNU <CODE>gettext</CODE> following the Uniforum
|
||
|
proposal and this surely has its reasons. But it should show how we
|
||
|
came to this decision.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
First we take a look at the development process. When we write an
|
||
|
application using NLS provided by <CODE>gettext</CODE> we proceed as always.
|
||
|
Only when we come to a string which might be seen by the users and thus
|
||
|
has to be translated we use <CODE>gettext("...")</CODE> instead of
|
||
|
<CODE>"..."</CODE>. At the beginning of each source file (or in a central
|
||
|
header file) we define
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
#define gettext(String) (String)
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Even this definition can be avoided when the system supports the
|
||
|
<CODE>gettext</CODE> function in its C library. When we compile this code the
|
||
|
result is the same as if no NLS code is used. When you take a look at
|
||
|
the GNU <CODE>gettext</CODE> code you will see that we use <CODE>_("...")</CODE>
|
||
|
instead of <CODE>gettext("...")</CODE>. This reduces the number of
|
||
|
additional characters per translatable string to <EM>3</EM> (in words:
|
||
|
three).
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
When now a production version of the program is needed we simply replace
|
||
|
the definition
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
#define _(String) (String)
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
by
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
#include <libintl.h>
|
||
|
#define _(String) gettext (String)
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Additionally we run the program <TT>`xgettext'</TT> on all source code files
which contain translatable strings and that's it: we have a running
program which does not depend on translations being available, but which
can use any that become available.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
The same procedure can be done for the <CODE>gettext_noop</CODE> invocations
|
||
|
(see section <A HREF="gettext_3.html#SEC18">Special Cases of Translatable Strings</A>). First you can define <CODE>gettext_noop</CODE> to a
|
||
|
no-op macro and later use the definition from <TT>`libintl.h'</TT>. Because
|
||
|
this name is not used in Sun's implementation of <TT>`libintl.h'</TT>,
|
||
|
you should consider the following code for your project:
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
#ifdef gettext_noop
|
||
|
# define N_(String) gettext_noop (String)
|
||
|
#else
|
||
|
# define N_(String) (String)
|
||
|
#endif
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
<CODE>N_</CODE> is a short form similar to <CODE>_</CODE>. The <TT>`Makefile'</TT> in
|
||
|
the <TT>`po/'</TT> directory of GNU gettext knows by default both of the
|
||
|
mentioned short forms so you are invited to follow this proposal for
|
||
|
your own ease.
|
||
|
|
||
|
</P>
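<P>
A small sketch of how the two short forms work together: <CODE>N_</CODE> marks
strings in a static initializer, where a function call is not allowed, and
<CODE>_</CODE> translates them at the point of use. The table and the function
<CODE>weekday_label</CODE> are hypothetical examples.
</P>

```c
#include <libintl.h>

#define _(String) gettext (String)

#ifdef gettext_noop
# define N_(String) gettext_noop (String)
#else
# define N_(String) (String)
#endif

/* Marked for extraction, but stored untranslated.  */
static const char *const weekday_names[] =
{
  N_("Monday"), N_("Tuesday"), N_("Wednesday")
};

/* Translate the stored string only when it is needed.  Returns an
   empty string for an index outside the table.  */
const char *
weekday_label (unsigned int index)
{
  if (index >= sizeof weekday_names / sizeof weekday_names[0])
    return "";
  return _(weekday_names[index]);
}
```

<P>
When no catalog is available the original English names are returned
unchanged.
</P>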
|
||
|
<P>
|
||
|
Now to <CODE>catgets</CODE>. The main problem is the work for the
programmer. Every time he comes to a translatable string he has to
define a number (or a symbolic constant) which also has to be defined in
the message catalog file. He also has to take care of duplicate
entries, duplicate message IDs etc. If he wants to have the same
quality in the message catalog as the GNU <CODE>gettext</CODE> program
provides he also has to put the descriptive comments for the strings and
the location in all source code files in the message catalog. This is
nearly a Mission: Impossible.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
But there are also some points people might call advantages speaking for
|
||
|
<CODE>catgets</CODE>. If you have a single word in a string and this string
|
||
|
is used in different contexts it is likely that in one or the other
|
||
|
language the word has different translations. Example:
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
printf ("%s: %d", gettext ("number"), number_of_errors);

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"));
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
Here we have to translate the string <CODE>"number"</CODE> twice. Even
if you do not speak a language besides English it might be possible to
recognize that the two words have a different meaning. In German the
first appearance has to be translated to <CODE>"Anzahl"</CODE> and the second
to <CODE>"Zahl"</CODE>.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Now you can say that this example is really esoteric. And you are
right! This is exactly how we felt about this problem, and we decided that
it does not weigh that much. The solution for the above problem could
be very easy:
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
printf ("%s %d", gettext ("number:"), number_of_errors);

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count);
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
We believe that we can solve all conflicts with this method. If it is
difficult one can also consider changing one of the conflicting strings a
little bit. But it is not impossible to overcome.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Translator note: It is perhaps appropriate here to tell those English
|
||
|
speaking programmers that the plural form of a noun cannot be formed by
|
||
|
appending a single `s'. Most other languages use different methods.
|
||
|
Even the above form is not general enough to cope with all languages.
|
||
|
Rafal Maszkowski <rzm@mat.uni.torun.pl> reports:
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<BLOCKQUOTE>
|
||
|
<P>
|
||
|
In Polish we use e.g. plik (file) this way:
|
||
|
|
||
|
<PRE>
|
||
|
1 plik
|
||
|
2,3,4 pliki
|
||
|
5-21 pliko'w
|
||
|
22-24 pliki
|
||
|
25-31 pliko'w
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
and so on (o' means 8859-2 oacute which should be rather okreska,
|
||
|
similar to aogonek).
|
||
|
</BLOCKQUOTE>
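<P>
To make the pattern in the Polish example concrete, the choice among the
three forms can be written as a small function. This is our reading of the
table above, extended by the assumption that the pattern repeats for larger
numbers; the function name is hypothetical and nothing like it is part of the
interface described in this chapter.
</P>

```c
/* Select the Polish form of "plik" for the count n:
   0 selects "plik", 1 selects "pliki", 2 selects "pliko'w".
   Assumes the pattern of the table continues beyond 31.  */
int
polish_plural_index (unsigned long n)
{
  if (n == 1)
    return 0;
  if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 12 || n % 100 > 14))
    return 1;
  return 2;
}
```

<P>
A two-way test such as <CODE>number_count == 1</CODE> clearly cannot express
this rule, which is why the form shown earlier is not general enough.
</P>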
|
||
|
|
||
|
<P>
|
||
|
A workable approach might be to consider methods like the one used for
|
||
|
<CODE>LC_TIME</CODE> in the POSIX.2 standard. The value of the
|
||
|
<CODE>alt_digits</CODE> field can be up to 100 strings which represent the
|
||
|
numbers 1 to 100. Using this in an internationalized
program means that an array of translatable strings should be indexed by
the number it should represent. A small example:
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<PRE>
|
||
|
void
print_month_info (int month)
{
  const char *month_pos[12] =
  { N_("first"), N_("second"), N_("third"),    N_("fourth"),
    N_("fifth"), N_("sixth"),  N_("seventh"),  N_("eighth"),
    N_("ninth"), N_("tenth"),  N_("eleventh"), N_("twelfth") };
  printf (_("%s is the %s month\n"), nl_langinfo (MON_1 + month),
          _(month_pos[month]));
}
|
||
|
</PRE>
|
||
|
|
||
|
<P>
|
||
|
It should be obvious that this method is only reasonable for small
|
||
|
ranges of numbers.
|
||
|
|
||
|
</P>
|
||
|
|
||
|
|
||
|
|
||
|
<H2><A NAME="SEC49" HREF="gettext_toc.html#TOC49">Using libintl.a in own programs</A></H2>
|
||
|
|
||
|
<P>
|
||
|
Starting with version 0.9.4 the library <CODE>libintl.a</CODE> should be
self-contained. I.e., you can use it in your own programs without
providing additional functions. The <TT>`Makefile'</TT> will put the header
and the library in directories selected using the <CODE>$(prefix)</CODE>.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
One exception to the above is found on HP-UX systems. Here the C library
does not contain the <CODE>alloca</CODE> function (and the HP compiler does
not generate it inlined). But it is not intended to rewrite the whole
library just because of this dumb system. Instead include the
<CODE>alloca</CODE> function in every package in which you use <CODE>libintl.a</CODE>.
|
||
|
|
||
|
</P>
|
||
|
|
||
|
|
||
|
<H2><A NAME="SEC50" HREF="gettext_toc.html#TOC50">Being a <CODE>gettext</CODE> grok</A></H2>
|
||
|
|
||
|
<P>
|
||
|
To fully exploit the functionality of the GNU <CODE>gettext</CODE> library it
|
||
|
is surely helpful to read the source code. But for those who don't want
|
||
|
to spend that much time in reading the (sometimes complicated) code here
|
||
|
is a list of comments:
|
||
|
|
||
|
</P>
|
||
|
|
||
|
<UL>
|
||
|
<LI>Changing the language at runtime
|
||
|
|
||
|
For interactive programs it might be useful to offer a selection of the
language used at runtime. To understand how to do this one needs to know
how the language used is determined while executing the <CODE>gettext</CODE>
function. The method which is presented here only works correctly
with the GNU implementation of the <CODE>gettext</CODE> functions. It is not
possible with underlying <CODE>catgets</CODE> functions or <CODE>gettext</CODE>
functions from the system's C library. The exception is of course the
GNU C Library which uses the GNU <CODE>gettext</CODE> Library for message handling.
|
||
|
|
||
|
In the function <CODE>dcgettext</CODE> at every call the current setting of
|
||
|
the highest priority environment variable is determined and used.
|
||
|
Highest priority means here the following list with decreasing
|
||
|
priority:
|
||
|
|
||
|
|
||
|
<OL>
|
||
|
<LI><CODE>LANGUAGE</CODE>
|
||
|
|
||
|
<LI><CODE>LC_ALL</CODE>
|
||
|
|
||
|
<LI><CODE>LC_xxx</CODE>, according to selected locale
|
||
|
|
||
|
<LI><CODE>LANG</CODE>
|
||
|
|
||
|
</OL>
|
||
|
|
||
|
Afterwards the path is constructed using the found value and the
|
||
|
translation file is loaded if available.
|
||
|
|
||
|
What happens now when the value of, say, <CODE>LANGUAGE</CODE> changes? According
to the process explained above the new value of this variable is found
as soon as the <CODE>dcgettext</CODE> function is called. But this also means
that the (perhaps) different message catalog file is loaded. In other
words: the language used is changed.
|
||
|
|
||
|
But there is one little catch. The code for gcc-2.7.0 and up provides
some optimization. This optimization normally prevents the calling of
the <CODE>dcgettext</CODE> function as long as no new catalog is loaded. But
if <CODE>dcgettext</CODE> is not called the program also cannot notice that the
<CODE>LANGUAGE</CODE> variable has changed (see section <A HREF="gettext_8.html#SEC47">Optimization of the *gettext functions</A>). A
solution for this is very easy. Include the following code in the
language switching function.
|
||
|
|
||
|
|
||
|
<PRE>
|
||
|
  /* Change language.  */
  setenv ("LANGUAGE", "fr", 1);

  /* Make change known.  */
  {
    extern int _nl_msg_cat_cntr;
    ++_nl_msg_cat_cntr;
  }
|
||
|
</PRE>
|
||
|
|
||
|
The variable <CODE>_nl_msg_cat_cntr</CODE> is defined in <TT>`loadmsgcat.c'</TT>.
The programmer will find himself in need of a construct like this only
when developing programs which run for a longer time and allow the user to
select the language at runtime. Non-interactive programs (like all
these little Unix tools) should never need this.
|
||
|
|
||
|
</UL>
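<P>
The priority list above can be mimicked with a few calls to
<CODE>getenv</CODE>. The sketch below is a simplified illustration for the
<CODE>LC_MESSAGES</CODE> category only, not the library's actual code: the
function name is hypothetical, and returning <CODE>"C"</CODE> when no variable is
set is our assumption.
</P>

```c
#include <stdlib.h>

/* Return the value of the highest priority environment variable
   that is set and non-empty, following the list LANGUAGE, LC_ALL,
   LC_MESSAGES, LANG.  Falls back to "C" (an assumption).  */
const char *
effective_message_language (void)
{
  static const char *const variables[] =
    { "LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG" };
  size_t i;

  for (i = 0; i < sizeof variables / sizeof variables[0]; ++i)
    {
      const char *value = getenv (variables[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    }
  return "C";
}
```

<P>
Because the variables are read again at every call, a change made with
<CODE>setenv</CODE> is seen by the next lookup, which is the behavior the
language switching code relies on.
</P>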
|
||
|
|
||
|
|
||
|
|
||
|
<H2><A NAME="SEC51" HREF="gettext_toc.html#TOC51">Temporary Notes for the Programmers Chapter</A></H2>
|
||
|
|
||
|
|
||
|
|
||
|
<H3><A NAME="SEC52" HREF="gettext_toc.html#TOC52">Temporary - Two Possible Implementations</A></H3>
|
||
|
|
||
|
<P>
|
||
|
There are two competing methods for language independent messages:
|
||
|
the X/Open <CODE>catgets</CODE> method, and the Uniforum <CODE>gettext</CODE>
|
||
|
method. The <CODE>catgets</CODE> method indexes messages by integers; the
|
||
|
<CODE>gettext</CODE> method indexes them by their English translations.
|
||
|
The <CODE>catgets</CODE> method has been around longer and is supported
|
||
|
by more vendors. The <CODE>gettext</CODE> method is supported by Sun,
|
||
|
and it has been heard that the COSE multi-vendor initiative is
|
||
|
supporting it. Neither method is a POSIX standard; the POSIX.1
|
||
|
committee had a lot of disagreement in this area.
|
||
|
|
||
|
</P>
|
||
|
<P>
|
||
|
Neither one is in the POSIX standard. There was much disagreement
|
||
|
in the POSIX.1 committee about using the <CODE>gettext</CODE> routines
|
||
|
vs. <CODE>catgets</CODE> (XPG). In the end the committee couldn't
|
||
|
agree on anything, so no messaging system was included as part
|
||
|
of the standard. I believe the informative annex of the standard
|
||
|
includes the XPG3 messaging interfaces, "...as an example of
|
||
|
a messaging system that has been implemented..."
|
||
|
|
||
|
</P>
|
||
|
<P>
The committee was very careful not to say anywhere that you should use
one set of interfaces over the other. For more on this topic, please
see the Programming for Internationalization FAQ.

</P>

<H3><A NAME="SEC53" HREF="gettext_toc.html#TOC53">Temporary - About <CODE>catgets</CODE></A></H3>

<P>
There have been a few discussions of late on the use of
<CODE>catgets</CODE> as a base. I think it is important to present both
sides of the argument, and hence I am opting to play devil's advocate
for a little bit.

</P>
<P>
I won't deny that <CODE>catgets</CODE> could have been designed
a lot better. It currently has quite a number of limitations, and
these have already been pointed out.

</P>
<P>
However, there is a great deal to be said for consistency and
standardization. A recurring problem when writing Unix
software is the myriad of portability differences across Unix platforms.
It seems as if every Unix vendor had a look at the operating system
and found parts they could improve upon. No doubt many of these
modifications are innovative and solve real problems.
However, software developers have a hard time keeping up with all
these changes across so many platforms.

</P>
<P>
This has prompted the Unix vendors to begin to standardize their
systems, hence the impetus for Spec1170. Every major Unix vendor
has committed to supporting this standard, and every Unix software
developer awaits with glee the day they can write software to this
standard and simply recompile (without having to use autoconf)
across different platforms.

</P>
<P>
As I understand it, Spec1170 is roughly based upon version 4 of the
X/Open Portability Guide (XPG4). Because <CODE>catgets</CODE> and
friends are defined in XPG4, I'm led to believe that <CODE>catgets</CODE>
is a part of Spec1170 and hence will become a standardized component
of all Unix systems.

</P>

<H3><A NAME="SEC54" HREF="gettext_toc.html#TOC54">Temporary - Why a single implementation</A></H3>

<P>
Now it seems kind of wasteful to me to have two different systems
installed for accessing message catalogs. If we do want to remedy
the deficiencies of <CODE>catgets</CODE>, why don't we try to extend <CODE>catgets</CODE>
(in a compatible manner) rather than implement an entirely new system?
Otherwise, we'll end up with two message catalog access systems installed
with an operating system: one set of routines for packages using GNU
<CODE>gettext</CODE> for their internationalization, and another set of routines
(<CODE>catgets</CODE>) for all other software. Bloated?

</P>
<P>
Suppose another catalog access system is implemented: which do
we recommend? At least for Linux, we need to attract as many
software developers as possible. Hence we need to make it as easy
as possible for them to port their software, which means supporting
<CODE>catgets</CODE>. We will be implementing the <CODE>glocale</CODE> code
within our <CODE>libc</CODE>, but does this mean we also have to incorporate
another message catalog access scheme within our <CODE>libc</CODE> as well?
And what about people who are going to be using the <CODE>glocale</CODE>
+ non-<CODE>catgets</CODE> routines? When they port their software to
other platforms, they're now going to have to include the front-end
(<CODE>glocale</CODE>) code plus the back-end code (the non-<CODE>catgets</CODE>
access routines) with their software instead of just including the
<CODE>glocale</CODE> code with their software.

</P>
<P>
Message catalog support is, however, only the tip of the iceberg.
What about the data for the other locale categories? They also have
a number of deficiencies. Are we going to abandon them as well and
develop another duplicate set of routines (should <CODE>glocale</CODE>
expand beyond message catalog support)?

</P>
<P>
As with many parts of Unix that can be improved upon, we're stuck balancing
compatibility with the past against useful improvements and innovations for
the future.

</P>

<H3><A NAME="SEC55" HREF="gettext_toc.html#TOC55">Temporary - Notes</A></H3>

<P>
X/Open agreed very late on the standard form, so many
implementations differ from the final form. Both of my systems (old
Linux catgets and Ultrix-4) have a strange variation.

</P>
<P>
OK. After incorporating the latest changes, I have to spend some time on
writing the <CODE>gettext</CODE> functions for the GNU/Linux <CODE>libc</CODE>, so that in the future
Solaris will not be the only system having <CODE>gettext</CODE>.

</P>
<P><HR><P>
<p>Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_7.html">previous</A>, <A HREF="gettext_9.html">next</A>, <A HREF="gettext_12.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
</BODY>
</HTML>