<!-- This HTML file has been created by texi2html 1.54
from gettext.texi on 25 January 1999 -->
<TITLE>GNU gettext utilities - PO Files and PO Mode Basics</TITLE>
<link href="gettext_3.html" rel=Next>
<link href="gettext_1.html" rel=Previous>
<link href="gettext_toc.html" rel=ToC>
</HEAD>
<BODY>
<p>Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_1.html">previous</A>, <A HREF="gettext_3.html">next</A>, <A HREF="gettext_12.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
<P><HR><P>
<H1><A NAME="SEC7" HREF="gettext_toc.html#TOC7">PO Files and PO Mode Basics</A></H1>
<P>
The GNU <CODE>gettext</CODE> toolset helps programmers and translators
produce, update and use translation files, mainly PO files, which
are textual, editable files. This chapter describes
the format of PO files and contains a PO mode starter. The description
of PO mode is spread throughout this manual instead of being concentrated
in one place. Here we present only the basics of PO mode.
</P>
<H2><A NAME="SEC8" HREF="gettext_toc.html#TOC8">Completing GNU <CODE>gettext</CODE> Installation</A></H2>
<P>
Once you have received, unpacked, configured and compiled the GNU
<CODE>gettext</CODE> distribution, the <SAMP>`make install'</SAMP> command puts in
place the programs <CODE>xgettext</CODE>, <CODE>msgfmt</CODE>, <CODE>gettext</CODE>, and
<CODE>msgmerge</CODE>, as well as their available message catalogs. To
top off a comfortable installation, you might also want to make the
PO mode available to your GNU Emacs users.
</P>
<P>
During the installation of the PO mode, you might want to modify your
file <TT>`.emacs'</TT>, once and for all, so it contains a few lines
that make Emacs enter PO mode for PO files.
</P>
<P>
To write multi-line strings in a PO file, one should not use escaped newlines.
Instead, a closing quote should follow the last character on the
line to be continued, and an opening quote should resume the string
at the beginning of the following PO file line. For example:
</P>
<PRE>
msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"
</PRE>
<P>
In this example, the <CODE>msgid</CODE> keyword is followed by three
strings, which are meant to be concatenated. The empty string on the
first line allows better alignment of the <KBD>H</KBD> from the word
<SAMP>`Here'</SAMP> over the <KBD>f</KBD> from the word <SAMP>`for'</SAMP>.
Concatenating the empty string does not change
the resulting overall string, but it lets us comply with
the requirement that <CODE>msgid</CODE> be followed by a string on the same
line, while keeping the multi-line presentation left-justified, which
we find cleaner. The empty string could have
been omitted, but only if the string starting with <SAMP>`Here'</SAMP> was
moved up to the first line, right after <CODE>msgid</CODE>.<A NAME="DOCF1" HREF="gettext_foot.html#FOOT1">(1)</A> Nor was it really necessary
to switch between the last two quoted strings immediately after
the newline <SAMP>`\n'</SAMP>; the switch could have occurred after <EM>any</EM>
other character. We just did it this way because it is neater.
</P>
<P>
One should carefully distinguish between ends of lines marked as
<SAMP>`\n'</SAMP> <EM>inside</EM> quotes, which are part of the represented
string, and ends of lines in the PO file itself, outside string quotes,
which have no effect on the represented string.
</P>
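<P>
The concatenation rule above can be illustrated with a small Python sketch.
It is not part of GNU <CODE>gettext</CODE>; the names <CODE>read_msgid</CODE>,
<CODE>unescape</CODE> and <CODE>EXAMPLE</CODE> are hypothetical, and only a
handful of escape sequences are handled. It shows that only the
<SAMP>`\n'</SAMP> sequences inside quotes end up in the resulting string.
</P>

```python
import re

# Only a few escapes are handled here; this is an illustrative subset.
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}

# The example entry from the text, kept raw so that \n stays a two-character escape.
EXAMPLE = r'''
msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"
'''

def unescape(body):
    """Turn backslash escapes found inside the quotes into real characters."""
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(1)), body)

def read_msgid(po_text):
    """Concatenate every quoted string of a single msgid field."""
    parts = []
    for line in po_text.splitlines():
        # A line is either: msgid "..."  or a continuation: "..."
        match = re.fullmatch(r'(?:msgid\s+)?"(.*)"', line.strip())
        if match:
            parts.append(unescape(match.group(1)))
    return "".join(parts)

message = read_msgid(EXAMPLE)
# Line breaks of the PO file itself are gone; only the two \n escapes remain.
print(message.count("\n"))
```

<P>
Moving the <SAMP>`Here'</SAMP> string up to the <CODE>msgid</CODE> line and
dropping the empty string yields the same result from this sketch.
</P>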
<P>
Outside strings, white lines and comments may be used freely.
Comments start at the beginning of a line with <SAMP>`#'</SAMP> and extend
until the end of the PO file line. Comments written by translators
should have the initial <SAMP>`#'</SAMP> immediately followed by some white
space. If the <SAMP>`#'</SAMP> is not immediately followed by white space,
this comment is most likely generated and managed by specialized GNU
tools, and might disappear or be replaced unexpectedly when the PO
file is given to <CODE>msgmerge</CODE>.
</P>
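<P>
As a rough illustration of the comment convention just described, the
following Python sketch sorts PO file lines by looking at the character
after the initial <SAMP>`#'</SAMP>. The function name
<CODE>classify_comment</CODE> and its return labels are hypothetical, and
treating a bare <SAMP>`#'</SAMP> as a translator comment is an assumption.
</P>

```python
def classify_comment(line):
    """Classify one PO file line by the character following the initial '#'."""
    if not line.startswith("#"):
        return "not a comment"
    rest = line[1:]
    # Assumption: a bare '#' with nothing after it counts as a translator comment.
    if rest == "" or rest[0].isspace():
        return "translator"
    # Anything else is most likely produced and managed by the tools.
    return "generated"

for sample in ["# checked with the author", "#: src/main.c:42", 'msgid "x"']:
    print(sample, "->", classify_comment(sample))
```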
<H2><A NAME="SEC10" HREF="gettext_toc.html#TOC10">Main PO mode Commands</A></H2>
<P>
After setting up Emacs with something similar to the lines in
section <A HREF="gettext_2.html#SEC8">Completing GNU <CODE>gettext</CODE> Installation</A>, PO mode is activated for a window when Emacs finds a
PO file in that window. This makes the window read-only and establishes a
po-mode-map. PO mode is a genuine Emacs mode, not derived
from text mode in any way. Functions found on <CODE>po-mode-hook</CODE>,
if any, will be executed.
</P>
<P>
When PO mode is active in a window, the letters <SAMP>`PO'</SAMP> appear
in the mode line for that window. The mode line also displays how
many entries of each kind are held in the PO file. For example,
the string <SAMP>`132t+3f+10u+2o'</SAMP> would tell the translator that the
PO file contains 132 translated entries (see section <A HREF="gettext_5.html#SEC25">Translated Entries</A>),
3 fuzzy entries (see section <A HREF="gettext_5.html#SEC26">Fuzzy Entries</A>), 10 untranslated entries
(see section <A HREF="gettext_5.html#SEC27">Untranslated Entries</A>) and 2 obsolete entries (see section <A HREF="gettext_5.html#SEC28">Obsolete Entries</A>). Items with a zero count are not shown. So, in this example, if
the fuzzy entries were unfuzzied, the untranslated entries were translated
and the obsolete entries were deleted, the mode line would merely display
<SAMP>`145t'</SAMP> for the counters.
</P>
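<P>
The counter string can be reproduced with a few lines of Python. This is
only a sketch of the display convention described above, not the PO mode
implementation; the function name <CODE>format_counters</CODE> is hypothetical.
</P>

```python
def format_counters(translated, fuzzy, untranslated, obsolete):
    """Build a counter string such as the one shown in the mode line."""
    items = [(translated, "t"), (fuzzy, "f"), (untranslated, "u"), (obsolete, "o")]
    # Items with a zero count are left out entirely.
    return "+".join(f"{count}{tag}" for count, tag in items if count)

print(format_counters(132, 3, 10, 2))
# Fuzzy and untranslated entries become translated, obsolete ones are deleted.
print(format_counters(132 + 3 + 10, 0, 0, 0))
```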
<P>
The main PO commands are those which do not fit into the other categories of
subsequent sections. These allow for quitting PO mode or for managing windows
in special ways.
</P>
<DL COMPACT>
<DT><KBD>U</KBD>
<DD>
Undo last modification to the PO file.
<DT><KBD>Q</KBD>
<DD>
Quit processing and save the PO file.
<DT><KBD>q</KBD>
<DD>
Quit processing, possibly after confirmation.
<DT><KBD>O</KBD>
<DD>
Temporarily leave the PO file window.
<DT><KBD>?</KBD>
<DD>
<DT><KBD>h</KBD>
<DD>
Show help about PO mode.
<DT><KBD>=</KBD>
<DD>
Give some PO file statistics.
<DT><KBD>V</KBD>
<DD>
Batch validate the format of the whole PO file.
</DL>
<P>
The command <KBD>U</KBD> (<CODE>po-undo</CODE>) interfaces to the GNU Emacs
<EM>undo</EM> facility. See section `Undoing Changes' in <CITE>The Emacs Editor</CITE>. Each time <KBD>U</KBD> is typed, modifications which the translator
made to the PO file are undone a little more. For the purpose of
undoing, each PO mode command is atomic. This is especially true for
the <KBD>RET</KBD> command: all the editing done through a single
use of this command is undone at once, even if the editing itself
involved several actions. However, while in the editing window, one
can undo the editing work in much smaller steps.
</P>
<P>
The commands <KBD>Q</KBD> (<CODE>po-quit</CODE>) and <KBD>q</KBD>
(<CODE>po-confirm-and-quit</CODE>) are used when the translator is done with the
PO file. The former is a bit less verbose than the latter. If the file
has been modified, it is saved to disk first. In both cases, and prior to
all this, the commands check if some untranslated message remains in the
PO file and, if so, the translator is asked if she really wants to stop
working with this PO file. This is the preferred way of getting rid
of an Emacs PO file buffer. Merely killing it through the usual command
<KBD>C-x k</KBD> (<CODE>kill-buffer</CODE>) is not the tidiest way to proceed.
</P>
<P>
The command <KBD>O</KBD> (<CODE>po-other-window</CODE>) is another, softer way
to leave PO mode temporarily. It just moves the cursor to some other
Emacs window, and pops one up if necessary. For example, if the translator
just got PO mode to show some source context in some other window, she might
discover some apparent bug in the program source that needs correction.
This command lets the translator switch roles, become a programmer,
and have the cursor right in the window containing the program she
wants to modify. By later getting the cursor back
in the PO file window, or by asking Emacs to edit this file once again,
PO mode is then recovered.
</P>
<P>
The command <KBD>h</KBD> (<CODE>po-help</CODE>) displays a summary of all available PO
mode commands. The translator should then type any character to resume
normal PO mode operations. The command <KBD>?</KBD> has the same effect
as <KBD>h</KBD>.
</P>
<P>
The command <KBD>=</KBD> (<CODE>po-statistics</CODE>) computes the total number of
entries in the PO file, the ordinal of the current entry (counted from
1), the number of untranslated entries, the number of obsolete entries,
and displays all these numbers.
</P>
<P>
The command <KBD>V</KBD> (<CODE>po-validate</CODE>) launches <CODE>msgfmt</CODE> in verbose
mode over the current PO file. This command first offers to save the
current PO file on disk. The <CODE>msgfmt</CODE> tool, from GNU <CODE>gettext</CODE>,
has the purpose of creating a MO file out of a PO file, and PO mode uses
the features of this program for checking the overall format of a PO file,
as well as all individual entries.
</P>
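<P>
Outside Emacs, a comparable check can be sketched in Python. The exact
arguments PO mode passes are not stated here, so the option set below is an
assumption: <SAMP>`-v'</SAMP> asks for verbose output, <SAMP>`-c'</SAMP> for
format checks, and <SAMP>`-o'</SAMP> sends the resulting MO file to the null
device. The function names are hypothetical.
</P>

```python
import os
import shutil
import subprocess

def validate_command(po_path):
    """Build an msgfmt command line that checks a PO file and discards the MO file."""
    # Assumed option set: -c (check), -v (verbose), -o (output file).
    return ["msgfmt", "-c", "-v", "-o", os.devnull, po_path]

def validate(po_path):
    """Run msgfmt if it is installed; return (exit status, diagnostics) or None."""
    if shutil.which("msgfmt") is None:
        return None  # msgfmt is not available on this system
    result = subprocess.run(validate_command(po_path),
                            capture_output=True, text=True)
    return result.returncode, result.stderr

print(validate_command("fr.po"))
```

<P>
This sketch waits for <CODE>msgfmt</CODE> to finish, whereas PO mode does not
block the translator while the file is checked.
</P>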
<P>
The program <CODE>msgfmt</CODE> runs asynchronously with Emacs, so the
translator regains control immediately while her PO file is being studied.
Error output is collected in the GNU Emacs <SAMP>`*compilation*'</SAMP> buffer,
displayed in another window. The regular GNU Emacs command <KBD>C-x `</KBD>
(<CODE>next-error</CODE>), as well as other usual compile commands, lets the
translator reposition quickly to the offending parts of the PO file.
Once the cursor is on the line in error, the translator may decide on
any PO mode action which would help correct the error.
</P>
<P>
Some PO mode commands rely on
the ability of PO mode to scan an already existing PO file for a
particular string encoded into the <CODE>msgid</CODE> field of some entry.
Even if PO mode has internally all the built-in machinery for
implementing this recognition easily, doing it fast is technically
difficult. To facilitate a solution to this efficiency problem,
we decided on a canonical representation for strings.
</P>
<P>
A conventional representation of strings in a PO file is currently
under discussion, and PO mode experiments with a canonical representation.
Having both <CODE>xgettext</CODE> and PO mode converging towards a uniform
way of representing equivalent strings would be useful, as the internal
normalization needed by PO mode could be automatically satisfied
when using <CODE>xgettext</CODE> from GNU <CODE>gettext</CODE>. An explicit
PO mode normalization should then be only necessary for PO files
imported from elsewhere, or for when the convention itself evolves.
</P>
<P>
To normalize at least those strings of a given PO file that need
a canonical representation, the following PO mode
command is available:
</P>
<DL COMPACT>
<DT><KBD>M-x po-normalize</KBD>
<DD>
Tidy the whole PO file by making entries more uniform.
</DL>
<P>
The special command <KBD>M-x po-normalize</KBD>, which has no associated
key binding, revises all entries, ensuring that strings of both original
and translated entries use uniform internal quoting in the PO file.
It also removes any leftover text after the last entry. This command may be
useful for PO files freshly imported from elsewhere, or if we ever
improve on the canonical quoting format we use. This canonical format
is not only meant for getting cleaner PO files, but also for greatly
speeding up <CODE>msgid</CODE> string lookup for some other PO mode commands.
</P>
<P>
<KBD>M-x po-normalize</KBD> presently makes three passes over the entries.
The first implements heuristics for converting PO files for GNU
<CODE>gettext</CODE> 0.6 and earlier, in which <CODE>msgid</CODE> and <CODE>msgstr</CODE>
fields were using K&amp;R style C string syntax for multi-line strings.
These heuristics may fail for comments not related to obsolete
entries and ending with a backslash; they also depend on subsequent
passes for finalizing the proper commenting of continued lines for
obsolete entries. This first pass might disappear once all older PO
files have been adjusted. The second and third passes normalize
all <CODE>msgid</CODE> and <CODE>msgstr</CODE> strings respectively. They also
clean out those trailing backslashes used by XView's <CODE>msgfmt</CODE>
for continued lines.
</P>
<P>
Having such an explicit normalizing command allows for importing PO
files from other sources, and also eases the evolution of the current
convention, which is for now driven mostly by aesthetic concerns.
It is easy to make suggested adjustments at a later time, as the
normalizing command and, eventually, other GNU <CODE>gettext</CODE> tools
should greatly automate conformance. A description of the canonical
string format is given below, for the particular benefit of those who do
not have GNU Emacs handy and would nevertheless like to handcraft
their PO files in nice ways.
</P>
<P>
Right now, in PO mode, strings are single line or multi-line. A string
goes multi-line if and only if it has <EM>embedded</EM> newlines, that
is, if it matches <SAMP>`[^\n]\n+[^\n]'</SAMP>. So, we would have:
</P>
<PRE>
msgstr "\n\nHello, world!\n\n\n"
</PRE>
<P>
but, replacing the space by a newline, this becomes:
</P>
<PRE>
msgstr ""
"\n"
"\n"
"Hello,\n"
"world!\n"
"\n"
"\n"
</PRE>
<P>
We are deliberately using an exaggerated example here, to make the
point clearer. Usually, multi-line strings do not look that bad.
It is probable that we will implement the following suggestion.
We might lump together all initial newlines into the first, otherwise empty, string,
and also all newlines introducing empty lines (that is, for a run of
<VAR>n</VAR> &gt; 1 newlines, the last <VAR>n</VAR>-1 newlines would go
together on a separate string), so that the previous example would appear as:
</P>
<PRE>
msgstr "\n\n"
"Hello,\n"
"world!\n"
"\n\n"
</PRE>
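<P>
Both the current rule and the suggested variant can be sketched in Python
for those handcrafting PO files without GNU Emacs. This is an illustration
of the layouts shown in the examples above, not the code used by
<KBD>M-x po-normalize</KBD>; the function names are hypothetical, and using
an empty first string when the text has no initial newlines is an
assumption for the suggested variant.
</P>

```python
import re

def escape(text):
    """Quote backslashes, double quotes and newlines for use inside a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def is_multiline(text):
    # A run of newlines with other characters on both sides is "embedded".
    return re.search(r"[^\n]\n+[^\n]", text) is not None

def normalize(keyword, text):
    """Current rule: an empty first string, then one quoted piece per newline."""
    if not is_multiline(text):
        return [f'{keyword} "{escape(text)}"']
    pieces = re.findall(r"[^\n]*\n|[^\n]+", text)
    return [f'{keyword} ""'] + [f'"{escape(piece)}"' for piece in pieces]

def normalize_lumped(keyword, text):
    """Suggested variant: initial newlines and extra newlines are lumped together."""
    if not is_multiline(text):
        return [f'{keyword} "{escape(text)}"']
    # Each piece is either text with its first newline, or a run of extra newlines.
    pieces = re.findall(r"[^\n]+\n?|\n+", text)
    # Assumption: without initial newlines, the first string stays empty.
    head = pieces.pop(0) if pieces[0].startswith("\n") else ""
    return [f'{keyword} "{escape(head)}"'] + [f'"{escape(piece)}"' for piece in pieces]

sample = "\n\nHello,\nworld!\n\n\n"
print("\n".join(normalize("msgstr", sample)))
print("\n".join(normalize_lumped("msgstr", sample)))
```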
<P>
A few minor points about string normalization are still undecided;
they will be documented in this manual once these questions settle.
</P>