<!-- This HTML file has been created by texi2html 1.54
from gettext.texi on 25 January 1999 -->
<TITLE>GNU gettext utilities - Preparing Program Sources</TITLE>
<link href="gettext_4.html" rel=Next>
<link href="gettext_2.html" rel=Previous>
<link href="gettext_toc.html" rel=ToC>
</HEAD>
<BODY>
<p>Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_2.html">previous</A>, <A HREF="gettext_4.html">next</A>, <A HREF="gettext_12.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
<P><HR><P>
<H1><A NAME="SEC13" HREF="gettext_toc.html#TOC13">Preparing Program Sources</A></H1>
<P>
For the programmer, changes to the C source code fall into three
categories. First, you have to make the localization functions
known to all modules needing message translation. Second, you should
properly trigger the operation of GNU <CODE>gettext</CODE> when the program
initializes, usually from the <CODE>main</CODE> function. Last, you should
identify and especially mark all constant strings in your program
needing translation.
</P>
<P>
Presuming that your set of programs, or package, has been adjusted
so all needed GNU <CODE>gettext</CODE> files are available, and your
<TT>`Makefile'</TT> files are adjusted (see section <A HREF="gettext_10.html#SEC67">The Maintainer's View</A>), each C module
having translated C strings should contain the line:
</P>
<PRE>
#include &lt;libintl.h&gt;
</PRE>
<P>
The remaining changes to your C sources are discussed in the further
sections.
</P>
<P>
In C programs, strings are often used within calls of functions from the
<CODE>printf</CODE> family. The special thing about these format strings is
that they can contain format specifiers introduced with <KBD>%</KBD>. Assume
we have the code
</P>
<PRE>
printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));
</PRE>
<P>
A possible German translation for the above string might be:
</P>
<PRE>
"%d Zeichen lang ist die Zeichenkette `%s'"
</PRE>
<P>
A C programmer, even one who cannot speak German, will recognize that
there is something wrong here. The order of the two format specifiers
has changed, but of course the arguments to <CODE>printf</CODE> have not.
This will most probably lead to problems, because now the length of the
string is interpreted as the address of the string to print.
</P>
<P>
To prevent runtime errors caused by translations, the <CODE>msgfmt</CODE>
tool can check statically whether the format specifiers in the original and
the translated string match in type and number. If they do not, a warning
is given, so the error is caught before it can cause problems at runtime.
</P>
<P>
For the word order of the above German translation to be correct, one
would have to write
</P>
<PRE>
"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
</PRE>
<P>
The format checks in <CODE>msgfmt</CODE> know about this special notation, in which <KBD>%2$d</KBD> refers to the second argument.
</P>
<P>
Because not all strings in a program are format strings, it is not
useful for <CODE>msgfmt</CODE> to test all the strings in the <TT>`.po'</TT> file.
Doing so could report spurious errors: a string might contain what looks
like a format specifier even though it is never used as a <CODE>printf</CODE> format.
</P>
<P>
Therefore <CODE>xgettext</CODE> adds a special tag to those messages it
thinks might be format strings. There is no absolute rule for this,
only a heuristic. In the <TT>`.po'</TT> file the entry is marked using the
<CODE>c-format</CODE> flag in the <KBD>#,</KBD> comment line (see section <A HREF="gettext_2.html#SEC9">The Format of PO Files</A>).
</P>
<P>
The careful reader might now say that this again can cause problems:
the heuristic might guess wrong. This is true, and therefore
<CODE>xgettext</CODE> knows about a special kind of comment which lets
the programmer take over the decision. If <CODE>xgettext</CODE> finds a
comment containing the words <KBD>xgettext:c-format</KBD> on the same line
as the <CODE>gettext</CODE> keyword or on the line immediately preceding it,
it marks the string with the <KBD>c-format</KBD> flag in any case. This kind
of comment should be used when <CODE>xgettext</CODE> does not recognize the
string as a format string but it really is one and should be tested. Please
note that when the comment is on the same line as the <CODE>gettext</CODE>
keyword, it must come before the string to be translated.
</P>
<P>
This situation happens quite often. The <CODE>printf</CODE> function is often
called with strings which do not contain a format specifier. Of course
one would normally use <CODE>fputs</CODE>, but it does happen. In this case
<CODE>xgettext</CODE> does not recognize the string as a format string, but what
happens if the translation introduces a valid format specifier? The
<CODE>printf</CODE> function will try to access one of the parameters, but none
exists because the original code does not pass any.
</P>
<P>
Of course, <CODE>xgettext</CODE> could also make the wrong decision the other
way round: a string marked as a format string might not really be a format
string. In this case <CODE>msgfmt</CODE> might give too many warnings and
could prevent the <TT>`.po'</TT> file from being translated. The method to
prevent this wrong decision is similar to the one used above, only the comment
must contain the string <KBD>xgettext:no-c-format</KBD>.
</P>
<P>
If a string is marked with <KBD>c-format</KBD> and this is not correct, the
user can find out who is responsible for the decision. See section <A HREF="gettext_4.html#SEC20">Invoking the <CODE>xgettext</CODE> Program</A> to see how the <KBD>--debug</KBD> option can be used for solving
this problem.
</P>
<H2><A NAME="SEC18" HREF="gettext_toc.html#TOC18">Special Cases of Translatable Strings</A></H2>
<P>
The attentive reader might now point out that it is not always possible
to mark translatable strings with <CODE>gettext</CODE> or something similar.
Consider the following case:
</P>
<PRE>
{
  static const char *messages[] = {
    "some very meaningful message",
    "and another one"
  };
  const char *string;
  ...
  string
    = index > 1 ? "a default message" : messages[index];
  fputs (string, stdout);
  ...
}
</PRE>
<P>
While it is no problem to mark the string <CODE>"a default message"</CODE> it
is not possible to mark the string initializers for <CODE>messages</CODE>.
But this has some drawbacks. First the programmer has to take care that
he uses <CODE>gettext_noop</CODE> for the string <CODE>"a default message"</CODE>.
A use of <CODE>gettext</CODE> could have in rare cases unpredictable results.
The second reason is found in the internals of the GNU <CODE>gettext</CODE>
Library which will make this solution less efficient.
</P>
<P>
One advantage of the second method is that you need not analyze the control
flow to make sure the output is really translated in every case. But this
analysis is generally not very difficult. Should it be difficult in some
situation, you can use the second method there.
</P>