GNU gettext utilities

This manual documents the GNU gettext tools and the GNU libintl library, version 0.19.8.



1 Introduction

This chapter explains the goals sought in the creation of GNU gettext and the free Translation Project. Then, it explains a few broad concepts around Native Language Support, and positions message translation with regard to other aspects of national and cultural variance, as they apply to programs. It also surveys those files used to convey the translations. It explains how the various tools interact in the initial generation of these files, and later, how the maintenance cycle should usually operate.

In this manual, we use he when speaking of the programmer or maintainer, she when speaking of the translator, and they when speaking of the installers or end users of the translated program. This is only a convenience for clarifying the documentation. It is absolutely not meant to imply that some roles are more appropriate to males or females. Besides, as you might guess, GNU gettext is meant to be useful for people using computers, whatever their sex, race, religion or nationality!

Please send suggestions and corrections to:

Internet address:
    [email protected]

Please include the manual’s edition number and update date in your messages.


1.1 The Purpose of GNU gettext

Usually, programs are written and documented in English, and use English at execution time to interact with users. This is true not only of GNU software, but also of a great deal of proprietary and free software. Using a common language is quite handy for communication between developers, maintainers and users from all countries. On the other hand, most people are less comfortable with English than with their own native language, and would prefer to use their mother tongue for day-to-day work, as far as possible. Many would simply love to see their computer screen showing a lot less English, and far more of their own language.

However, to many people, this dream might appear so far-fetched that they may believe it is not even worth spending time thinking about it. They have no confidence at all that the dream might ever come true. Yet some have not lost hope, and have organized themselves. The Translation Project is a formalization of this hope into a workable structure, which has a good chance of getting all of us nearer the achievement of a truly multi-lingual set of programs.

GNU gettext is an important step for the Translation Project, as it is an asset on which we may build many other steps. This package offers to programmers, translators and even users, a well integrated set of tools and documentation. Specifically, the GNU gettext utilities are a set of tools that provides a framework within which other free packages may produce multi-lingual messages. These tools include

  • A set of conventions about how programs should be written to support message catalogs.
  • A directory and file naming organization for the message catalogs themselves.
  • A runtime library supporting the retrieval of translated messages.
  • A few stand-alone programs to massage in various ways the sets of translatable strings, or already translated strings.
  • A library supporting the parsing and creation of files containing translated messages.
  • A special mode for Emacs which helps preparing these sets and bringing them up to date.

GNU gettext is designed to minimize the impact of internationalization on program sources, keeping this impact as small and hardly noticeable as possible. Internationalization has better chances of succeeding if it is very lightweight, or at least, appears to be so, when looking at program sources.

The Translation Project also uses the GNU gettext distribution as a vehicle for documenting its structure and methods. This goes beyond the strict technicalities of documenting GNU gettext proper. By so doing, translators will find in a single place, as far as possible, all they need to know for properly doing their translating work. This supplemental documentation might also help programmers, and even curious users, in understanding how GNU gettext is related to the remainder of the Translation Project, and consequently, have a glimpse at the big picture.



1.2 I18n, L10n, and Such

Two long words appear all the time when we discuss support of native language in programs, and these words have a precise meaning, worth explaining here, once and for all in this document. The words are internationalization and localization. Many people, tired of writing these long words over and over again, got into the habit of writing i18n and l10n instead, quoting the first and last letter of each word, and replacing the run of intermediate letters by a number merely telling how many such letters there are. But in this manual, for the sake of clarity, we will patiently write the names in full, each time…
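The abbreviation rule just described is simple arithmetic, and a minimal C sketch may make it concrete. The function name numeronym is a hypothetical helper introduced only for this illustration; it is not part of GNU gettext.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GNU gettext: writes the abbreviation of
   WORD (first letter, count of intermediate letters, last letter) into OUT.
   Returns 0 on success, -1 if WORD is shorter than three letters or OUT is
   too small. */
int numeronym(const char *word, char *out, size_t outsize)
{
    size_t len = strlen(word);
    if (len < 3)
        return -1;

    /* len - 2 is the number of letters between the first and the last. */
    int n = snprintf(out, outsize, "%c%zu%c",
                     word[0], len - 2, word[len - 1]);
    return (n > 0 && (size_t)n < outsize) ? 0 : -1;
}
```

Applied to "internationalization" (20 letters) this gives i18n, and applied to "localization" (12 letters) it gives l10n.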

By internationalization, one refers to the operation by which a program, or a set of programs turned into a package, is made aware of and able to support multiple languages. This is a generalization process, by which the programs are untied from calling only English strings or other English specific habits, and connected to generic ways of doing the same, instead. Program developers may use various techniques to internationalize their programs. Some of these have been standardized. GNU gettext offers one of these standards. See Programmers.

By localization, one means the operation by which, in a set of programs already internationalized, one gives the program all needed information so that it can adapt itself to handle its input and output in a fashion which is correct for some native language and cultural habits. This is a particularisation process, by which generic methods already implemented in an internationalized program are used in specific ways. The programming environment puts several functions at the programmer’s disposal which allow this runtime configuration. The formal description of a specific set of cultural habits for some country, together with all associated translations targeted to the same native language, is called the locale for this language or country. Users achieve localization of programs by setting proper values to special environment variables, prior to executing those programs, identifying which locale should be used.

In fact, locale message support is only one component of the cultural data that makes up a particular locale. There are a whole host of routines and functions provided to aid programmers in developing internationalized software and which allow them to access the data stored in a particular locale. When someone presently refers to a particular locale, they are obviously referring to the data stored within that particular locale. Similarly, if a programmer is referring to “accessing the locale routines”, they are referring to the complete suite of routines that access all of the locale’s information.

One uses the expression Native Language Support, or merely NLS, for speaking of the overall activity or feature encompassing both internationalization and localization, allowing for multi-lingual interactions in a program. In a nutshell, one could say that internationalization is the operation by which further localizations are made possible.

Also, very roughly said, when it comes to multi-lingual messages, internationalization is usually taken care of by programmers, and localization is usually taken care of by translators.



1.3 Aspects in Native Language Support

For a totally multi-lingual distribution, there are many things to translate beyond output messages.

  • As of today, GNU gettext offers a complete toolset for translating messages output by C programs. Perl scripts and shell scripts will also need to be translated. Even if there are today some hooks by which this can be done, these hooks are not integrated as well as they should be.
  • Some programs, like autoconf or bison, are able to produce other programs (or scripts). Even if the generating programs themselves are internationalized, the generated programs they produce may need internationalization on their own, and this indirect internationalization could be automated right from the generating program. In fact, quite usually, generating and generated programs could be internationalized independently, as the effort needed is fairly orthogonal.
  • A few programs include textual tables which might need translation themselves, independently of the strings contained in the program itself. For example, RFC 1345 gives an English description for each character which the recode program is able to reconstruct at execution. Since these descriptions are extracted from the RFC by mechanical means, translating them properly would require a prior translation of the RFC itself.
  • Almost all programs accept options, which are often worded so as to be descriptive for English readers; one might want to consider offering translated versions for program options as well.
  • Many programs read, interpret, compile, or are somewhat driven by input files which are texts containing keywords, identifiers, or replies which are inherently translatable. For example, one may want gcc to allow diacriticized characters in identifiers or use translated keywords; ‘rm -i’ might accept something other than ‘y’ or ‘n’ for replies, etc. Even if the program will eventually make most of its output in the foreign languages, one has to decide whether the input syntax, option values, etc., are to be localized or not.
  • The manual accompanying a package, as well as all documentation files in the distribution, could surely be translated, too. Translating a manual, with the intent of later keeping up with updates, is a major undertaking in itself, generally.

As we already stressed, translation is only one aspect of locales. Other internationalization aspects are system services and are handled in GNU libc. There are many attributes that are needed to define a country’s cultural conventions. These attributes include, besides the country’s native language, the formatting of the date and time, the representation of numbers, the symbols for currency, etc. These local rules are termed the country’s locale. The locale represents the knowledge needed to support the country’s native attributes.

There are a few major areas which may vary between countries and hence, define what a locale must describe. The following list helps putting multi-lingual messages into the proper context of other tasks related to locales. See the GNU libc manual for details.

Characters and Codesets

The codeset most commonly used throughout the USA and most English speaking parts of the world is the ASCII codeset. However, there are many characters needed by various locales that are not found within this codeset. The 8-bit ISO 8859-1 codeset has most of the special characters needed to handle the major European languages. However, in many cases, choosing ISO 8859-1 is nevertheless not adequate: it doesn’t even handle the major European currency. Hence each locale will need to specify which codeset it needs to use and will need to have the appropriate character handling routines to cope with the codeset.

Currency

The symbols used vary from country to country as does the position used by the symbol. Software needs to be able to transparently display currency figures in the native mode for each locale.

Dates

The format of date varies between locales. For example, Christmas day in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia. Other countries might use ISO 8601 dates, etc.

Time of the day may be noted as hh:mm, hh.mm, or otherwise. Some locales require time to be specified in 24-hour mode rather than as AM or PM. Further, the nature and yearly extent of the Daylight Saving correction vary widely between countries.
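As a rough illustration of how a C program can leave the date layout to the locale, the sketch below formats a date with the standard strftime function and its ‘%x’ conversion (the locale’s preferred date representation). The helper name format_local_date is hypothetical; which locale names other than ‘C’ are accepted depends on what is installed on the system.

```c
#include <locale.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: formats YEAR-MONTH-DAY into OUT using the date
   conventions of LOCALE_NAME.  Note that setlocale changes the LC_TIME
   category for the whole process.  Returns 0 on success, -1 if the locale
   is not available or OUT is too small. */
int format_local_date(const char *locale_name, int year, int month, int day,
                      char *out, size_t outsize)
{
    struct tm when;
    memset(&when, 0, sizeof when);
    when.tm_year = year - 1900;   /* struct tm counts years from 1900 */
    when.tm_mon = month - 1;      /* and months from 0 */
    when.tm_mday = day;

    if (setlocale(LC_TIME, locale_name) == NULL)
        return -1;

    /* %x expands to the locale's own date representation. */
    return strftime(out, outsize, "%x", &when) > 0 ? 0 : -1;
}
```

In the ‘C’ locale, ISO C defines ‘%x’ as equivalent to ‘%m/%d/%y’, so Christmas day 1994 comes out as 12/25/94; with another installed locale the same call may yield a different ordering.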

Numbers

Numbers can be represented differently in different locales. For example, the following numbers are all written correctly for their respective locales:

12,345.67       English
12.345,67       German
 12345,67       French
1,2345.67       Asia

Some programs could go further and use different unit systems, like English units or Metric units, or even take into account variants about how numbers are spelled in full.
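The separators shown in the table above are part of the locale data and can be queried through the standard localeconv function. The following is a minimal sketch; describe_number_format is a hypothetical name, and only the ‘C’ locale is guaranteed to exist on every system.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: reports the decimal point and the thousands
   separator of LOCALE_NAME in OUT.  setlocale changes the LC_NUMERIC
   category for the whole process.  Returns 0 on success, -1 if the locale
   is not available or OUT is too small. */
int describe_number_format(const char *locale_name, char *out, size_t outsize)
{
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return -1;

    /* localeconv returns the numeric conventions of the current locale. */
    const struct lconv *conv = localeconv();
    int n = snprintf(out, outsize, "decimal='%s' grouping='%s'",
                     conv->decimal_point, conv->thousands_sep);
    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```

In the ‘C’ locale the decimal point is ‘.’ and the thousands separator is empty; a German locale, if installed, would be expected to report ‘,’ and ‘.’ respectively.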

Messages

The most obvious area is the language support within a locale. This is where GNU gettext provides the means for developers and users to easily change the language that the software uses to communicate to the user.

These areas of cultural conventions are called locale categories. It is an unfortunate term; locale aspects or locale feature categories would be a better term, because each “locale category” describes an area or task that requires localization. The concrete data that describes the cultural conventions for such an area and for a particular culture is also called a locale category. In this sense, a locale is composed of several locale categories: the locale category describing the codeset, the locale category describing the formatting of numbers, the locale category containing the translated messages, and so on.

Components of locale outside of message handling are standardized in the ISO C standard and the POSIX:2001 standard (also known as the SUSV3 specification). GNU libc fully implements this, and most other modern systems provide a more or less reasonable support for at least some of the missing components.



1.4 Files Conveying Translations

The letters PO in .po files mean Portable Object, to distinguish them from .mo files, where MO stands for Machine Object. This paradigm, as well as the PO file format, is inspired by the NLS standard developed by Uniforum, and first implemented by Sun in their Solaris system.

PO files are meant to be read and edited by humans, and associate each original, translatable string of a given package with its translation in a particular target language. A single PO file is dedicated to a single target language. If a package supports many languages, there is one such PO file per language supported, and each package has its own set of PO files. These PO files are best created by the xgettext program, and later updated or refreshed through the msgmerge program. Program xgettext extracts all marked messages from a set of C files and initializes a PO file with empty translations. Program msgmerge takes care of adjusting PO files between releases of the corresponding sources, commenting obsolete entries, initializing new ones, and updating all source line references. Files ending with .pot are a kind of base translation file found in distributions, in PO file format.

MO files are meant to be read by programs, and are binary in nature. A few systems already offer tools for creating and handling MO files as part of the Native Language Support coming with the system, but the format of these MO files is often different from system to system, and non-portable. The tools already provided with these systems don’t support all the features of GNU gettext. Therefore GNU gettext uses its own format for MO files. Files ending with .gmo are really MO files, when it is known that these files use the GNU format.



1.5 Overview of GNU gettext

The following diagram summarizes the relation between the files handled by GNU gettext and the tools acting on these files. It is followed by somewhat detailed explanations, which you should read while keeping an eye on the diagram. Having a clear understanding of these interrelations will surely help programmers, translators and maintainers.

Original C Sources ───> Preparation ───> Marked C Sources ───╮
                                                             │
              ╭─────────<─── GNU gettext Library             │
╭─── make <───┤                                              │
│             ╰─────────<────────────────────┬───────────────╯
│                                            │
│   ╭─────<─── PACKAGE.pot <─── xgettext <───╯   ╭───<─── PO Compendium
│   │                                            │              ↑
│   │                                            ╰───╮          │
│   ╰───╮                                            ├───> PO editor ───╮
│       ├────> msgmerge ──────> LANG.po ────>────────╯                  │
│   ╭───╯                                                               │
│   │                                                                   │
│   ╰─────────────<───────────────╮                                     │
│                                 ├─── New LANG.po <────────────────────╯
│   ╭─── LANG.gmo <─── msgfmt <───╯
│   │
│   ╰───> install ───> /.../LANG/PACKAGE.mo ───╮
│                                              ├───> "Hello world!"
╰───────> install ───> /.../bin/PROGRAM ───────╯

As a programmer, the first step to bringing GNU gettext into your package is identifying, right in the C sources, those strings which are meant to be translatable, and those which are untranslatable. This tedious job can be done a little more comfortably using Emacs PO mode, but you can use any means familiar to you for modifying your C sources. Besides this, some other simple, standard changes are needed to properly initialize the translation library. See Sources, for more information about all this.

For newly written software the strings of course can and should be marked while writing it. The gettext approach makes this very easy. Simply put the following lines at the beginning of each file or in a central header file:

#define _(String) (String)
#define N_(String) String
#define textdomain(Domain)
#define bindtextdomain(Package, Directory)

Doing this allows you to prepare the sources for internationalization. Later, when you feel ready for the step to use the gettext library, simply replace these definitions by the following:

#include <libintl.h>
#define _(String) gettext (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

and link against libintl.a or libintl.so. Note that on GNU systems, you don’t need to link with libintl because the gettext library functions are already contained in GNU libc. That is all you have to change.
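To show how these definitions fit together with the initialization calls, here is a minimal sketch of a marked C source. The domain name "example-package" and the catalog directory "/usr/share/locale" are assumptions made for this illustration; a real package would use its own name and installation directory.

```c
#include <libintl.h>
#include <locale.h>

#define _(String) gettext (String)

/* Both values are hypothetical placeholders for this sketch. */
#define PACKAGE "example-package"
#define LOCALEDIR "/usr/share/locale"

/* Returns the greeting, translated when a catalog for the user's locale is
   installed; otherwise gettext returns the original English string. */
const char *translated_greeting(void)
{
    setlocale(LC_ALL, "");               /* adopt the user's locale settings */
    bindtextdomain(PACKAGE, LOCALEDIR);  /* where the MO files are installed */
    textdomain(PACKAGE);                 /* default domain for gettext calls */
    return _("Hello world!");
}
```

A caller could simply print the result, for example with puts (translated_greeting ()). Since no catalog exists for the placeholder domain, the untranslated string is what one would expect to see.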

Once the C sources have been modified, the xgettext program is used to find and extract all translatable strings, and create a PO template file out of all these. This package.pot file contains all original program strings. It has sets of pointers to exactly where in C sources each string is used. All translations are set to empty. The letter t in .pot marks this as a Template PO file, not yet oriented towards any particular language. See xgettext Invocation, for more details about how one calls the xgettext program. If you are really lazy, you might be interested in working a lot more right away, and preparing the whole distribution setup (see Maintainers). By doing so, you spare yourself typing the xgettext command, as make should now generate the proper things automatically for you!

The first time through, there is no lang.po yet, so the msgmerge step may be skipped and replaced by a mere copy of package.pot to lang.po, where lang represents the target language. See Creating for details.

Then comes the initial translation of messages. Translation in itself is a whole matter, still exclusively meant for humans, and whose complexity far overwhelms the level of this manual. Nevertheless, a few hints are given in some other chapter of this manual (see Translators). You will also find there indications about how to contact translating teams, or become part of them, for sharing your translating concerns with others who target the same native language.

While adding the translated messages into the lang.po PO file, if you are not using one of the dedicated PO file editors (see Editing), you are on your own for ensuring that your efforts fully respect the PO file format, and quoting conventions (see PO Files). This is surely not an impossible task, as this is the way many people handled PO files around 1995. On the other hand, by using a PO file editor, most details of PO file format are taken care of for you, but you have to acquire some familiarity with the PO file editor itself.

If some common translations have already been saved into a compendium PO file, translators may use PO mode for initializing untranslated entries from the compendium, and also save selected translations into the compendium, updating it (see Compendium). Compendium files are meant to be exchanged between members of a given translation team.

Programs, or packages of programs, are dynamic in nature: users write bug reports and suggestions for improvement, maintainers react by modifying programs in various ways. The fact that a package has already been internationalized should not make maintainers shy of adding new strings, or modifying strings already translated. They just do their job the best they can. For the Translation Project to work smoothly, it is important that maintainers do not carry translation concerns on their already loaded shoulders, and that translators be kept as free as possible of programming concerns.

The only concern maintainers should have is carefully marking new strings as translatable, when they should be, and not otherwise worrying about them being translated, as this will come in proper time. Consequently, when programs and their strings are adjusted in various ways by maintainers, and for matters usually unrelated to translation, xgettext would construct package.pot files which are evolving over time, so the translations carried by lang.po are slowly fading out of date.

It is important for translators (and even maintainers) to understand that package translation is a continuous process in the lifetime of a package, and not something which is done once and for all at the start. After an initial burst of translation activity for a given package, interventions are needed once in a while, because here and there, translated entries become obsolete, and new untranslated entries appear, needing translation.

The msgmerge program has the purpose of refreshing an already existing lang.po file, by comparing it with a newer package.pot template file, extracted by xgettext out of recent C sources. The refreshing operation adjusts all references to C source locations for strings, since these strings move as programs are modified. Also, msgmerge comments out as obsolete, in lang.po, those already translated entries which are no longer used in the program sources (see Obsolete Entries). It finally discovers new strings and inserts them in the resulting PO file as untranslated entries (see Untranslated Entries). See msgmerge Invocation, for more information about what msgmerge really does.

Whatever route or means taken, the goal is to obtain an updated lang.po file offering translations for all strings.

The temporal mobility, or fluidity of PO files, is an integral part of the translation game, and should be well understood, and accepted. People resisting it will have a hard time participating in the Translation Project, or will give a hard time to other participants! In particular, maintainers should relax and include all available official PO files in their distributions, even if these have not recently been updated, without exerting pressure on the translator teams to get the job done. The pressure should rather come from the community of users speaking a particular language, and maintainers should consider themselves fairly relieved of any concern about the adequacy of translation files. On the other hand, translators should reasonably try updating the PO files they are responsible for, while the package is undergoing pretest, prior to an official distribution.

Once the PO file is complete and dependable, the msgfmt program is used for turning the PO file into a machine-oriented format, which may yield efficient retrieval of translations by the programs of the package, whenever needed at runtime (see MO Files). See msgfmt Invocation, for more information about all modes of execution for the msgfmt program.

Finally, the modified and marked C sources are compiled and linked with the GNU gettext library, usually through the operation of make, given a suitable Makefile exists for the project, and the resulting executable is installed somewhere users will find it. The MO files themselves should also be properly installed. Given the appropriate environment variables are set (see Setting the POSIX Locale), the program should localize itself automatically, whenever it executes.

The remainder of this manual has the purpose of explaining in depth the various steps outlined above.



2 The User’s View

Nowadays, when users log into a computer, they usually find that all their programs show messages in their native language, at least for users of languages with an active free software community, like French or German; to a lesser extent for languages with a smaller participation in free software and the GNU project, like Hindi and Filipino.

How does this work? How can the user influence the language that is used by the programs? This chapter answers these questions.


2.1 Operating System Installation

The default language is often already specified during operating system installation. When the operating system is installed, the installer typically asks for the language used for the installation process and, separately, for the language to use in the installed system. Some OS installers only ask for the language once.

This determines the system-wide default language for all users. But the installers often give the possibility to install extra localizations for additional languages. For example, the localizations of KDE (the K Desktop Environment) and OpenOffice.org are often bundled separately, as one installable package per language.

At this point it is good to consider the intended use of the machine: If it is a machine designated for personal use, additional localizations are probably not necessary. If, however, the machine is in use in an organization or company that has international relationships, one can consider the needs of guest users. If you have a guest from abroad, for a week, what could be his preferred locales? It may be worth installing these additional localizations ahead of time, since they cost only a bit of disk space at this point.

The system-wide default language is the locale configuration that is used when a new user account is created. But the user can have his own locale configuration that is different from the one of the other users of the same machine. He can specify it, typically after the first login, as described in the next section.


2.2 Setting the Locale Used by GUI Programs

The immediately available programs in a user’s desktop come from a group of programs called a “desktop environment”; it usually includes the window manager, a web browser, a text editor, and more. The most common free desktop environments are KDE, GNOME, and Xfce.

The locale used by GUI programs of the desktop environment can be specified in a configuration screen called “control center”, “language settings” or “country settings”.

Individual GUI programs that are not part of the desktop environment can have their locale specified either in a settings panel, or through environment variables.

For some programs, it is possible to specify the locale through environment variables, possibly even to a different locale than the desktop’s locale. This means, instead of starting a program through a menu or from the file system, you can start it from the command-line, after having set some environment variables. The environment variables can be those specified in the next section (Setting the POSIX Locale); for some versions of KDE, however, the locale is specified through a variable KDE_LANG, rather than LANG or LC_ALL.


2.3 Setting the Locale through Environment Variables

As a user, if your language has been installed for this package, in the simplest case, you only have to set the LANG environment variable to the appropriate ‘ll_CC’ combination. For example, let’s suppose that you speak German and live in Germany. At the shell prompt, merely execute ‘setenv LANG de_DE’ (in csh), ‘export LANG; LANG=de_DE’ (in sh) or ‘export LANG=de_DE’ (in bash). This can be done from your .login or .profile file, once and for all.


2.3.1 Locale Names

A locale name usually has the form ‘ll_CC’. Here ‘ll’ is an ISO 639 two-letter language code, and ‘CC’ is an ISO 3166 two-letter country code. For example, for German in Germany, ll is de, and CC is DE. You find a list of the language codes in appendix Language Codes and a list of the country codes in appendix Country Codes.

You might think that the country code specification is redundant. But in fact, some languages have dialects in different countries. For example, ‘de_AT’ is used for Austria, and ‘pt_BR’ for Brazil. The country code serves to distinguish the dialects.

Many locale names have an extended syntax ‘ll_CC.encoding’ that also specifies the character encoding. These are in use because between 2000 and 2005, most users switched to locales in UTF-8 encoding. For example, the German locale on glibc systems is nowadays ‘de_DE.UTF-8’. The older name ‘de_DE’ still refers to the German locale as of 2000 that stores characters in ISO-8859-1 encoding, a text encoding that cannot even accommodate the Euro currency sign.

Some locale names use ‘ll_CC@variant’ instead of ‘ll_CC’. The ‘@variant’ can denote any kind of characteristic that is not already implied by the language ll and the country CC. It can denote a particular monetary unit. For example, on glibc systems, ‘de_DE@euro’ denotes the locale that uses the Euro currency, in contrast to the older locale ‘de_DE’ which implies the use of the currency before 2002. It can also denote a dialect of the language, or the script used to write text (for example, ‘sr_RS@latin’ uses the Latin script, whereas ‘sr_RS’ uses the Cyrillic script to write Serbian), or the orthography rules, or similar.
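The pieces of a locale name can be separated mechanically. The sketch below splits a name into language, country, encoding and variant, assuming the glibc-style ordering in which an optional ‘.encoding’ comes before an optional ‘@variant’ (as in ‘de_DE.UTF-8’ or ‘sr_RS@latin’). The structure and function names are hypothetical and exist only for this illustration.

```c
#include <string.h>

/* Hypothetical container for the parts of a locale name. */
struct locale_parts {
    char language[16];
    char country[16];
    char encoding[32];
    char variant[32];
};

/* Copies the bytes in [start, end) into DST, truncating if necessary. */
static void copy_span(char *dst, size_t dstsize,
                      const char *start, const char *end)
{
    size_t len = (size_t)(end - start);
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, start, len);
    dst[len] = '\0';
}

/* Splits NAME, assumed to look like ll[_CC][.encoding][@variant].
   Parts that are absent are left as empty strings. */
void parse_locale_name(const char *name, struct locale_parts *out)
{
    const char *end = name + strlen(name);
    const char *at = strchr(name, '@');           /* start of the variant */
    const char *body_end = at ? at : end;
    const char *dot = memchr(name, '.', (size_t)(body_end - name));
    const char *lang_country_end = dot ? dot : body_end;
    const char *underscore =
        memchr(name, '_', (size_t)(lang_country_end - name));

    memset(out, 0, sizeof *out);
    copy_span(out->language, sizeof out->language,
              name, underscore ? underscore : lang_country_end);
    if (underscore)
        copy_span(out->country, sizeof out->country,
                  underscore + 1, lang_country_end);
    if (dot)
        copy_span(out->encoding, sizeof out->encoding, dot + 1, body_end);
    if (at)
        copy_span(out->variant, sizeof out->variant, at + 1, end);
}
```

For ‘de_DE.UTF-8’ this yields the language de, the country DE and the encoding UTF-8; for ‘sr_RS@latin’ it yields sr, RS and the variant latin.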

On other systems, some variations of this scheme are used, such as ‘ll’. You can get the list of locales supported by your system for your language by running the command ‘locale -a | grep '^ll'’.

There is also a special locale, called ‘C’. When it is used, it disables all localization: in this locale, all programs standardized by POSIX use English messages and an unspecified character encoding (often US-ASCII, but sometimes also ISO-8859-1 or UTF-8, depending on the operating system).


2.3.2 Locale Environment Variables

A locale is composed of several locale categories, see Aspects. When a program looks up locale dependent values, it does this according to the following environment variables, in priority order:

  1. LANGUAGE
  2. LC_ALL
  3. LC_xxx, according to selected locale category: LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, LC_MONETARY, LC_MESSAGES, ...
  4. LANG

Variables whose value is set but is empty are ignored in this lookup.
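The lookup order above can be sketched as a small C function. This is only an illustration of the rule, not the code used by the C library or by GNU gettext; the function names are hypothetical. It makes two assumptions taken from this chapter: LANGUAGE matters only for messages, and it is ignored when the locale is ‘C’.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the value of environment variable NAME, or NULL when it is unset
   or set to the empty string (empty values are ignored in the lookup). */
static const char *nonempty_env(const char *name)
{
    const char *value = getenv(name);
    return (value != NULL && value[0] != '\0') ? value : NULL;
}

/* Hypothetical helper: returns the setting that governs the category named
   by CATEGORY_VAR, for example "LC_NUMERIC" or "LC_MESSAGES". */
const char *effective_locale_setting(const char *category_var)
{
    /* LC_ALL overrides the per-category variable, which overrides LANG. */
    const char *locale = nonempty_env("LC_ALL");
    if (locale == NULL)
        locale = nonempty_env(category_var);
    if (locale == NULL)
        locale = nonempty_env("LANG");
    if (locale == NULL)
        locale = "C";

    /* LANGUAGE takes precedence for messages only, and only when
       localization is enabled, that is, when the locale is not "C". */
    if (strcmp(category_var, "LC_MESSAGES") == 0
        && strcmp(locale, "C") != 0) {
        const char *language = nonempty_env("LANGUAGE");
        if (language != NULL)
            return language;
    }
    return locale;
}
```

With LANG set to es_ES.UTF-8 and LC_MESSAGES set to sv_SE.UTF-8, the sketch reports the Spanish locale for LC_NUMERIC and the Swedish one for LC_MESSAGES; setting LC_ALL to C makes it report C for every category.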

LANG is the normal environment variable for specifying a locale.As a user, you normally set this variable (unless some of the other variableshave already been set by the system, in /etc/profile or similarinitialization files).

LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE,LC_MONETARY, LC_MESSAGES, and so on, are the environmentvariables meant to override LANG and affecting a single localecategory only. For example, assume you are a Swedish user in Spain, and youwant your programs to handle numbers and dates according to Spanishconventions, and only the messages should be in Swedish. Then you couldcreate a locale named ‘sv_ES’ or ‘sv_ES.UTF-8’ by use of thelocaledef program. But it is simpler, and achieves the same effect,to set the LANG variable to es_ES.UTF-8 and theLC_MESSAGES variable to sv_SE.UTF-8; these two locales comealready preinstalled with the operating system.

LC_ALL is an environment variable that overrides all of these. It is typically used in scripts that run particular programs. For example, configure scripts generated by GNU autoconf use LC_ALL to make sure that the configuration tests don’t operate in locale dependent ways.

Some systems, unfortunately, set LC_ALL in /etc/profile or in similar initialization files. As a user, you therefore have to unset this variable if you want to set LANG and optionally some of the other LC_xxx variables.

The LANGUAGE variable is described in the next subsection.


2.3.3 Specifying a Priority List of Languages

Not all programs have translations for all languages. By default, an English message is shown in place of a nonexistent translation. If you understand other languages, you can set up a priority list of languages. This is done through a different environment variable, called LANGUAGE. GNU gettext gives preference to LANGUAGE over LC_ALL and LANG for the purpose of message handling, but you still need to have LANG (or LC_ALL) set to the primary language; this is required by other parts of the system libraries. For example, some Swedish users who would rather read translations in German than English when Swedish is not available set LANGUAGE to ‘sv:de’ while leaving LANG set to ‘sv_SE’.

Special advice for Norwegian users: The language code for Norwegian bokmål changed from ‘no’ to ‘nb’ recently (in 2003). During the transition period, while some message catalogs for this language are installed under ‘nb’ and some older ones under ‘no’, it is recommended for Norwegian users to set LANGUAGE to ‘nb:no’ so that both newer and older translations are used.

In the LANGUAGE environment variable, but not in the other environment variables, ‘ll_CC’ combinations can be abbreviated as ‘ll’ to denote the language’s main dialect. For example, ‘de’ is equivalent to ‘de_DE’ (German as spoken in Germany), and ‘pt’ to ‘pt_PT’ (Portuguese as spoken in Portugal) in this context.

Note: The variable LANGUAGE is ignored if the locale is set to ‘C’. In other words, you have to first enable localization, by setting LANG (or LC_ALL) to a value other than ‘C’, before you can use a language priority list through the LANGUAGE variable.
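To make the shape of a LANGUAGE value concrete, here is a small C sketch that walks a colon-separated priority list and applies the rule about the ‘C’ locale. Both function names are hypothetical, and the sketch deliberately ignores details a real implementation has to handle (such as the ‘ll’ abbreviations described above).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the INDEX-th entry (0-based) of a colon-separated LANGUAGE value
   into BUF.  Returns 1 on success, 0 if there is no such entry or if it
   does not fit.  Empty entries are skipped.  */
int
language_entry (const char *language, size_t index, char *buf, size_t bufsize)
{
  const char *p = language;

  assert (buf != NULL && bufsize > 0);
  if (p == NULL)
    return 0;
  while (*p != '\0')
    {
      size_t len = strcspn (p, ":");

      if (len > 0)
        {
          if (index == 0)
            {
              if (len >= bufsize)
                return 0;
              memcpy (buf, p, len);
              buf[len] = '\0';
              return 1;
            }
          index--;
        }
      p += len;
      if (*p == ':')
        p++;
    }
  return 0;
}

/* The priority list only applies once localization is enabled, that is,
   when the effective locale is something other than "C".  */
int
language_list_applies (const char *effective_locale)
{
  return effective_locale != NULL
         && effective_locale[0] != '\0'
         && strcmp (effective_locale, "C") != 0;
}
```

For the value ‘sv:de’, entry 0 is ‘sv’ and entry 1 is ‘de’; with the locale set to ‘C’ the second function reports that the list must not be consulted at all.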


2.4 Installing Translations for Particular Programs

Languages are not equally well supported in all packages using GNU gettext, and more translations are added over time. Usually, you use the translations that are shipped with the operating system or with particular packages that you install afterwards. But you can also install newer localizations directly. To do this, you will need to understand where each localization file is stored on the file system.

For programs that participate in the Translation Project, you can start looking for translations here: http://translationproject.org/team/index.html. A snapshot of this information is also found in the ABOUT-NLS file that is shipped with GNU gettext.

For programs that are part of the KDE project, the starting point is: http://i18n.kde.org/.

For programs that are part of the GNOME project, the starting point is: http://www.gnome.org/i18n/.

For other programs, you may check whether the program’s source code package contains some ll.po files; often they are kept together in a directory called po/. Each ll.po file contains the message translations for the language whose abbreviation is ll.



3 The Format of PO Files

The GNU gettext toolset helps programmers and translators at producing, updating and using translation files, mainly those PO files which are textual, editable files. This chapter explains the format of PO files.

A PO file is made up of many entries, each entry holding the relation between an original untranslated string and its corresponding translation. All entries in a given PO file usually pertain to a single project, and all translations are expressed in a single target language. One PO file entry has the following schematic structure:

white-space
#  translator-comments
#. extracted-comments
#: reference…
#, flag…
#| msgid previous-untranslated-string
msgid untranslated-string
msgstr translated-string

The general structure of a PO file should be well understood by the translator. When using PO mode, very little has to be known about the format details, as PO mode takes care of them for her.

A simple entry can look like this:

#: lib/error.c:116
msgid "Unknown system error"
msgstr "Error desconegut del sistema"

Entries begin with some optional white space. Usually, when generated through GNU gettext tools, there is exactly one blank line between entries. Then comments follow, on lines all starting with the character #. There are two kinds of comments: those which have some white space immediately following the # - the translator comments -, which comments are created and maintained exclusively by the translator, and those which have some non-white character just after the # - the automatic comments -, which comments are created and maintained automatically by GNU gettext tools. Comment lines starting with #. contain comments given by the programmer, directed at the translator; these comments are called extracted comments because the xgettext program extracts them from the program’s source code. Comment lines starting with #: contain references to the program’s source code. Comment lines starting with #, contain flags; more about these below. Comment lines starting with #| contain the previous untranslated string for which the translator gave a translation.

All comments, of either kind, are optional.

After white space and comments, entries show two strings, namely first the untranslated string as it appears in the original program sources, and then, the translation of this string. The original string is introduced by the keyword msgid, and the translation, by msgstr. The two strings, untranslated and translated, are quoted in various ways in the PO file, using " delimiters and \ escapes, but the translator does not really have to pay attention to the precise quoting format, as PO mode fully takes care of quoting for her.

The msgid strings, as well as automatic comments, are produced and managed by other GNU gettext tools, and PO mode does not provide means for the translator to alter these. The most she can do is merely deleting them, and only by deleting the whole entry. On the other hand, the msgstr string, as well as translator comments, are really meant for the translator, and PO mode gives her the full control she needs.

The comment lines beginning with #, are special because they are not completely ignored by the programs as comments generally are. The comma separated list of flags is used by the msgfmt program to give the user some better diagnostic messages. Currently there are two forms of flags defined:

fuzzy

This flag can be generated by the msgmerge program or it can be inserted by the translator herself. It shows that the msgstr string might not be a correct translation (anymore). Only the translator can judge if the translation requires further modification, or is acceptable as is. Once satisfied with the translation, she then removes this fuzzy attribute. The msgmerge program inserts this when it combined the msgid and msgstr entries after fuzzy search only. See Fuzzy Entries.

c-format
no-c-format

These flags should not be added by a human. Instead only the xgettext program adds them. In an automated PO file processing system as proposed here, the user’s changes would be thrown away again as soon as the xgettext program generates a new template file.

The c-format flag indicates that the untranslated string and the translation are supposed to be C format strings. The no-c-format flag indicates that they are not C format strings, even though the untranslated string happens to look like a C format string (with ‘%’ directives).

When the c-format flag is given for a string the msgfmt program does some more tests to check the validity of the translation. See msgfmt Invocation, c-format Flag and c-format.

objc-format
no-objc-format

Likewise for Objective C, see objc-format.

sh-format
no-sh-format

Likewise for Shell, see sh-format.

python-format
no-python-format

Likewise for Python, see python-format.

python-brace-format
no-python-brace-format

Likewise for Python brace, see python-format.

lisp-format
no-lisp-format

Likewise for Lisp, see lisp-format.

elisp-format
no-elisp-format

Likewise for Emacs Lisp, see elisp-format.

librep-format
no-librep-format

Likewise for librep, see librep-format.

scheme-format
no-scheme-format

Likewise for Scheme, see scheme-format.

smalltalk-format
no-smalltalk-format

Likewise for Smalltalk, see smalltalk-format.

java-format
no-java-format

Likewise for Java, see java-format.

csharp-format
no-csharp-format

Likewise for C#, see csharp-format.

awk-format
no-awk-format

Likewise for awk, see awk-format.

object-pascal-format
no-object-pascal-format

Likewise for Object Pascal, see object-pascal-format.

ycp-format
no-ycp-format

Likewise for YCP, see ycp-format.

tcl-format
no-tcl-format

Likewise for Tcl, see tcl-format.

perl-format
no-perl-format

Likewise for Perl, see perl-format.

perl-brace-format
no-perl-brace-format

Likewise for Perl brace, see perl-format.

php-format
no-php-format

Likewise for PHP, see php-format.

gcc-internal-format
no-gcc-internal-format

Likewise for the GCC sources, see gcc-internal-format.

gfc-internal-format
no-gfc-internal-format

Likewise for the GNU Fortran Compiler sources, see gfc-internal-format.

qt-format
no-qt-format

Likewise for Qt, see qt-format.

qt-plural-format
no-qt-plural-format

Likewise for Qt plural forms, see qt-plural-format.

kde-format
no-kde-format

Likewise for KDE, see kde-format.

boost-format
no-boost-format

Likewise for Boost, see boost-format.

lua-format
no-lua-format

Likewise for Lua, see lua-format.

javascript-format
no-javascript-format

Likewise for JavaScript, see javascript-format.
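Since every flag in the list above has a ‘no-’ counterpart, a program that inspects a #, line has to compare whole flags rather than substrings. The following C sketch shows one way to do that; the function name is hypothetical and the code is not taken from the GNU gettext sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the "#," line LINE lists FLAG as one of its
   comma-separated flags, 0 otherwise.  Whole flags are compared, so
   "no-c-format" does not count as "c-format".  */
int
po_line_has_flag (const char *line, const char *flag)
{
  size_t flag_len;
  const char *p;

  assert (line != NULL && flag != NULL);
  if (strncmp (line, "#,", 2) != 0)
    return 0;
  flag_len = strlen (flag);
  p = line + 2;
  while (*p != '\0')
    {
      size_t len;

      /* Skip separators and surrounding blanks.  */
      p += strspn (p, ", \t\r\n");
      len = strcspn (p, ", \t\r\n");
      if (len > 0 && len == flag_len && strncmp (p, flag, len) == 0)
        return 1;
      p += len;
    }
  return 0;
}
```

For the line ‘#, fuzzy, c-format’ the function reports both fuzzy and c-format as present; for ‘#, no-c-format’ it reports c-format as absent.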

It is also possible to have entries with a context specifier. They look like this:

white-space
#  translator-comments
#. extracted-comments
#: reference…
#, flag…
#| msgctxt previous-context
#| msgid previous-untranslated-string
msgctxt context
msgid untranslated-string
msgstr translated-string

The context serves to disambiguate messages with the same untranslated-string. It is possible to have several entries with the same untranslated-string in a PO file, provided that they each have a different context. Note that an empty context string and an absent msgctxt line do not mean the same thing.

A different kind of entries is used for translations which involve plural forms.

white-space
#  translator-comments
#. extracted-comments
#: reference…
#, flag…
#| msgid previous-untranslated-string-singular
#| msgid_plural previous-untranslated-string-plural
msgid untranslated-string-singular
msgid_plural untranslated-string-plural
msgstr[0] translated-string-case-0
...
msgstr[N] translated-string-case-n

Such an entry can look like this:

#: src/msgcmp.c:338 src/po-lex.c:699
#, c-format
msgid "found %d fatal error"
msgid_plural "found %d fatal errors"
msgstr[0] "s'ha trobat %d error fatal"
msgstr[1] "s'han trobat %d errors fatals"

Here also, a msgctxt context can be specified before msgid, like above.
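On the program side, a plural entry like the one above is typically produced by a call to ngettext, which is declared in <libintl.h>. The sketch below uses the message from the example; the helper name format_error_count is hypothetical, and the code assumes a system (such as one using glibc) where <libintl.h> is available without extra setup.

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>
#include <stdio.h>

/* Format the "fatal error" message for COUNT errors into BUF.
   ngettext selects the singular or plural msgid (or the matching
   msgstr[N] when a translation catalog is installed).  Returns what
   snprintf returns.  */
int
format_error_count (char *buf, size_t bufsize, int count)
{
  assert (buf != NULL && bufsize > 0 && count >= 0);
  return snprintf (buf, bufsize,
                   ngettext ("found %d fatal error",
                             "found %d fatal errors",
                             (unsigned long) count),
                   count);
}
```

When no translation is found, ngettext falls back to the English rule: the first string for a count of 1, the second string otherwise.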

Here, additional kinds of flags can be used:

range:

This flag is followed by a range of non-negative numbers, using the syntax range: minimum-value..maximum-value. It designates the possible values that the numeric parameter of the message can take. In some languages, translators may produce slightly better translations if they know that the value can only take on values between 0 and 10, for example.
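As an illustration of the range: syntax, the following C sketch extracts the two bounds from such a flag. The function name is hypothetical, and rejecting a range whose minimum exceeds its maximum is a choice made for this sketch rather than documented tool behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Parse a flag of the form "range: MIN..MAX".  On success, store both
   bounds and return 1; otherwise return 0 and leave *MIN and *MAX
   untouched.  */
int
parse_range_flag (const char *flag, unsigned long *min, unsigned long *max)
{
  unsigned long lo, hi;

  assert (flag != NULL && min != NULL && max != NULL);
  if (sscanf (flag, " range: %lu..%lu", &lo, &hi) != 2 || lo > hi)
    return 0;
  *min = lo;
  *max = hi;
  return 1;
}
```

For the flag ‘range: 0..10’ the sketch stores 0 and 10; a flag such as ‘c-format’ is simply not recognized.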

The previous-untranslated-string is optionally inserted by the msgmerge program, at the same time when it marks a message fuzzy. It helps the translator to see which changes were done by the developers on the untranslated-string.

It happens that some lines, usually whitespace or comments, follow the very last entry of a PO file. Such lines are not part of any entry, and will be dropped when the PO file is processed by the tools, or may disturb some PO file editors.

The remainder of this section may be safely skipped by those using a PO file editor, yet it may be interesting for everybody to have a better idea of the precise format of a PO file. On the other hand, those wishing to modify PO files by hand should carefully continue reading on.

An empty untranslated-string is reserved to contain the header entry with the meta information (see Header Entry). This header entry should be the first entry of the file. The empty untranslated-string is reserved for this purpose and must not be used anywhere else.

Each of untranslated-string and translated-string respects the C syntax for a character string, including the surrounding quotes and embedded backslashed escape sequences. When the time comes to write multi-line strings, one should not use escaped newlines. Instead, a closing quote should follow the last character on the line to be continued, and an opening quote should resume the string at the beginning of the following PO file line. For example:

msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"

In this example, the empty string is used on the first line, to allow better alignment of the H from the word ‘Here’ over the f from the word ‘for’. In this example, the msgid keyword is followed by three strings, which are meant to be concatenated. Concatenating the empty string does not change the resulting overall string, but it is a way for us to comply with the necessity of msgid to be followed by a string on the same line, while keeping the multi-line presentation left-justified, as we find this to be a cleaner disposition. The empty string could have been omitted, but only if the string starting with ‘Here’ was promoted on the first line, right after msgid. It was not really necessary either to switch between the two last quoted strings immediately after the newline ‘\n’; the switch could have occurred after any other character. We just did it this way because it is neater.

One should carefully distinguish between end of lines marked as ‘\n’ inside quotes, which are part of the represented string, and end of lines in the PO file itself, outside string quotes, which have no effect on the represented string.
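Because PO strings follow the C syntax, the same distinction can be checked with a C compiler. The sketch below writes the example string once as three adjacent literals and once as a single literal; the identifiers are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Three adjacent literals, laid out like the PO example: an empty
   string first, then one literal per output line.  The line ends of
   the source file between the quotes are not part of the string.  */
static const char split_form[] =
  ""
  "Here is an example of how one might continue a very long string\n"
  "for the common case the string represents multi-line output.\n";

/* The same text written as one literal.  */
static const char joined_form[] =
  "Here is an example of how one might continue a very long string\nfor the common case the string represents multi-line output.\n";

/* Count the '\n' characters that are part of string S.  */
size_t
count_newlines (const char *s)
{
  size_t n = 0;

  assert (s != NULL);
  for (; *s != '\0'; s++)
    if (*s == '\n')
      n++;
  return n;
}

/* Return 1 if both spellings denote the same string.  */
int
split_equals_joined (void)
{
  return strcmp (split_form, joined_form) == 0;
}
```

Both spellings compare equal, and each contains exactly the two newlines written as ‘\n’ inside the quotes.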

Outside strings, white lines and comments may be used freely. Comments start at the beginning of a line with ‘#’ and extend until the end of the PO file line. Comments written by translators should have the initial ‘#’ immediately followed by some white space. If the ‘#’ is not immediately followed by white space, this comment is most likely generated and managed by specialized GNU tools, and might disappear or be replaced unexpectedly when the PO file is given to msgmerge.



4 Preparing Program Sources

For the programmer, changes to the C source code fall into three categories. First, you have to make the localization functions known to all modules needing message translation. Second, you should properly trigger the operation of GNU gettext when the program initializes, usually from the main function. Last, you should identify, adjust and mark all constant strings in your program needing translation.



4.1 Importing the gettext declaration

Presuming that your set of programs, or package, has been adjusted so all needed GNU gettext files are available, and your Makefile files are adjusted (see Maintainers), each C module having translated C strings should contain the line:

#include <libintl.h>

Similarly, each C module containing printf()/fprintf()/... calls with a format string that could be a translated C string (even if the C string comes from a different C module) should contain the line:

#include <libintl.h>

4.2 Triggering gettext Operations

The initialization of locale data should be done with more or less the same code in every program, as demonstrated below:

int
main (int argc, char *argv[])
{
  …
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  …
}

PACKAGE and LOCALEDIR should be provided either by config.h or by the Makefile. For now consult the gettext or hello sources for more information.
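For experimenting outside a configured package, the initialization can be wrapped in a function with placeholder definitions. In the sketch below, the function name init_i18n and the fallback values "hello" and "/usr/local/share/locale" are assumptions made for the illustration; a real package takes both macros from config.h or the Makefile as described above.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical fallbacks, used only when the build system did not
   define these macros.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Perform the usual three initialization calls.  Returns the name of
   the selected text domain, or NULL if it could not be set up.  */
const char *
init_i18n (void)
{
  assert (PACKAGE[0] != '\0');

  /* If the environment names a locale that is not installed, this call
     fails and the program simply stays in the "C" locale.  */
  (void) setlocale (LC_ALL, "");

  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return NULL;
  return textdomain (PACKAGE);
}
```

Calling init_i18n () first thing in main has the same effect as the three lines shown in the example above.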

The use of LC_ALL might not be appropriate for you. LC_ALL includes all locale categories and especially LC_CTYPE. This latter category is responsible for determining character classes with the isalnum etc. functions from ctype.h, which could be wrong, especially for programs that process some kind of input language. For example this would mean that a source code using the ç (c-cedilla character) is runnable in France but not in the U.S.

Some systems also have problems with parsing numbers using the scanf functions if a locale category other than LC_ALL is used. The standards say that additional formats besides the one known in the "C" locale might be recognized. But some systems seem to reject numbers in the "C" locale format. In some situations, it might also be a problem with the notation itself which makes it impossible to recognize whether the number is in the "C" locale or the local format. This can happen if thousands separator characters are used. Some locales define this character according to the national conventions to '.' which is the same character used in the "C" locale to denote the decimal point.

So it is sometimes necessary to replace the LC_ALL line in the code above by a sequence of setlocale lines

{
  …
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
  …
}

On all POSIX conformant systems the locale categories LC_CTYPE, LC_MESSAGES, LC_COLLATE, LC_MONETARY, LC_NUMERIC, and LC_TIME are available. On some systems which are only ISO C compliant, LC_MESSAGES is missing, but a substitute for it is defined in GNU gettext’s <libintl.h> and in GNU gnulib’s <locale.h>.
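The effect of setting only selected categories can be observed by querying a category that was left alone. The following sketch (the function name is hypothetical) assumes that it runs before any other setlocale call in the program, so that all categories still have their startup value.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Enable only the character-type and message categories from the
   environment, then report whether LC_NUMERIC is still the "C" locale,
   so that number parsing and printing keep their "C" behaviour.  */
int
numeric_stays_in_c_locale (void)
{
  const char *numeric;

  setlocale (LC_CTYPE, "");
#ifdef LC_MESSAGES
  /* LC_MESSAGES may be missing on systems that are only ISO C compliant.  */
  setlocale (LC_MESSAGES, "");
#endif

  /* A NULL second argument queries the category without changing it.  */
  numeric = setlocale (LC_NUMERIC, NULL);
  assert (numeric != NULL);
  return strcmp (numeric, "C") == 0;
}
```

The decimal point used by scanf and printf therefore remains '.', whatever the user's environment says.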

Note that changing the LC_CTYPE also affects the functions declared in the <ctype.h> standard header and some functions declared in the <string.h> and <stdlib.h> standard headers. If this is not desirable in your application (for example in a compiler’s parser), you can use a set of substitute functions which hardwire the C locale, such as found in the modules ‘c-ctype’, ‘c-strcase’, ‘c-strcasestr’, ‘c-strtod’, ‘c-strtold’ in the GNU gnulib source distribution.
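To show what "hardwiring the C locale" means, here is a small sketch of a locale-independent character test and of a parser helper built on it. This is not the gnulib code; the names ascii_isalnum and identifier_length are made up, and the range comparisons assume an ASCII-compatible character set.

```c
#include <assert.h>
#include <stddef.h>

/* Locale-independent counterpart of isalnum: only the ASCII letters
   and digits qualify, whatever LC_CTYPE is set to.  */
int
ascii_isalnum (int c)
{
  return (c >= '0' && c <= '9')
         || (c >= 'a' && c <= 'z')
         || (c >= 'A' && c <= 'Z');
}

/* Length of the identifier at the start of S, as a compiler's scanner
   might compute it: ASCII letters, digits and underscores.  */
size_t
identifier_length (const char *s)
{
  size_t n = 0;

  assert (s != NULL);
  while (ascii_isalnum ((unsigned char) s[n]) || s[n] == '_')
    n++;
  return n;
}
```

With these helpers, a byte such as 0xE7 (the ç of ISO-8859-1) is never accepted as part of an identifier, so the same source text is parsed identically in France and in the U.S.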

It is also possible to switch the locale forth and back between the environment dependent locale and the C locale, but this approach is normally avoided because a setlocale call is expensive, because it is tedious to determine the places where a locale switch is needed in a large program’s source, and because switching a locale is not multithread-safe.



4.3 Preparing Translatable Strings

Before strings can be marked for translations, they sometimes need to be adjusted. Usually preparing a string for translation is done right before marking it, during the marking phase which is described in the next sections. What you have to keep in mind while doing that is the following.

  • Decent English style.
  • Entire sentences.
  • Split at paragraphs.
  • Use format strings instead of string concatenation.
  • Avoid unusual markup and unusual control characters.

Let’s look at some examples of these guidelines.

Translatable strings should be in good English style. If slang language with abbreviations and shortcuts is used, often translators will not understand the message and will produce very inappropriate translations.

"%s: is parameter\n"

This is nearly untranslatable: Is the displayed item a parameter or the parameter?

"No match"

The ambiguity in this message makes it unintelligible: Is the program attempting to set something on fire? Does it mean "The given object does not match the template"? Does it mean "The template does not fit for any of the objects"?

In both cases, adding more words to the message will help both the translator and the English speaking user.

Translatable strings should be entire sentences. It is often not possible to translate single verbs or adjectives in a substitutable way.

printf ("File %s is %s protected", filename, rw ? "write" : "read");

Most translators will not look at the source and will thus only see the string "File %s is %s protected", which is unintelligible. Change this to

printf (rw ? "File %s is write protected" : "File %s is read protected",
        filename);

This way the translator will not only understand the message, she will also be able to find the appropriate grammatical construction. A French translator for example translates "write protected" like "protected against writing".

Entire sentences are also important because in many languages, the declension of some word in a sentence depends on the gender or the number (singular/plural) of another part of the sentence. There are usually more interdependencies between words than in English. The consequence is that asking a translator to translate two half-sentences and then combining these two half-sentences through dumb string concatenation will not work, for many languages, even though it would work for English. That’s why translators need to handle entire sentences.

Often sentences don’t fit into a single line. If a sentence is output using two subsequent printf statements, like this

printf ("Locale charset \"%s\" is different from\n", lcharset);
printf ("input file charset \"%s\".\n", fcharset);

the translator would have to translate two half sentences, but nothing in the POT file would tell her that the two half sentences belong together. It is necessary to merge the two printf statements so that the translator can handle the entire sentence at once and decide at which place to insert a line break in the translation (if at all):

printf ("Locale charset \"%s\" is different from\n\
input file charset \"%s\".\n", lcharset, fcharset);

You may now ask: how about two or more adjacent sentences? Like in this case:

puts ("Apollo 13 scenario: Stack overflow handling failed.");
puts ("On the next stack overflow we will crash!!!");

Should these two statements be merged into a single one? I would recommend merging them if the two sentences are related to each other, because then it makes it easier for the translator to understand and translate both. On the other hand, if one of the two messages is a stereotypic one, occurring in other places as well, you will do a favour to the translator by not merging the two. (Identical messages occurring in several places are combined by xgettext, so the translator has to handle them once only.)

Translatable strings should be limited to one paragraph; don’t let a single message be longer than ten lines. The reason is that when the translatable string changes, the translator is faced with the task of updating the entire translated string. Maybe only a single word will have changed in the English string, but the translator doesn’t see that (with the current translation tools), therefore she has to proofread the entire message.

Many GNU programs have a ‘--help’ output that extends over several screen pages. It is a courtesy towards the translators to split such a message into several ones of five to ten lines each. While doing that, you can also attempt to split the documented options into groups, such as the input options, the output options, and the informative output options. This will help every user to find the option he is looking for.

Hardcoded string concatenation is sometimes used to construct English strings:

strcpy (s, "Replace ");
strcat (s, object1);
strcat (s, " with ");
strcat (s, object2);
strcat (s, "?");

In order to present to the translator only entire sentences, and also because in some languages the translator might want to swap the order of object1 and object2, it is necessary to change this to use a format string:

sprintf (s, "Replace %s with %s?", object1, object2);
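One way a translated format string can swap the two objects is through numbered argument references such as ‘%1$s’ and ‘%2$s’, which POSIX specifies for the printf family (they are not part of ISO C). The sketch below is an illustration only: the function name is hypothetical, and the second format string stands in for a translation that happens to mention object2 first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build the prompt into BUF.  When SWAPPED is nonzero, use a format
   string that refers to the second argument before the first one, as a
   translation might do; the argument list itself never changes.
   Returns what snprintf returns.  */
int
format_replace_prompt (char *buf, size_t bufsize, int swapped,
                       const char *object1, const char *object2)
{
  assert (buf != NULL && bufsize > 0);
  if (swapped)
    /* Hypothetical translation that mentions object2 first.  */
    return snprintf (buf, bufsize, "Put %2$s in place of %1$s?",
                     object1, object2);
  return snprintf (buf, bufsize, "Replace %1$s with %2$s?",
                   object1, object2);
}
```

With hardcoded concatenation no such reordering is possible, because the order of the pieces is fixed in the program rather than in the string.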

A similar case is compile time concatenation of strings. The ISO C 99 include file <inttypes.h> contains a macro PRId64 that can be used as a formatting directive for outputting an ‘int64_t’ integer through printf. It expands to a constant string, usually "d" or "ld" or "lld" or something like this, depending on the platform. Assume you have code like

printf ("The amount is %0" PRId64 "\n", number);

The gettext tools and library have special support for these <inttypes.h> macros. You can therefore simply write

printf (gettext ("The amount is %0" PRId64 "\n"), number);

The PO file will contain the string "The amount is %0<PRId64>\n". The translators will provide a translation containing "%0<PRId64>" as well, and at runtime the gettext function’s result will contain the appropriate constant string, "d" or "ld" or "lld".

This works only for the predefined <inttypes.h> macros. If you have defined your own similar macros, let’s say ‘MYPRId64’, that are not known to xgettext, the solution for this problem is to change the code like this:
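The compile time half of this mechanism can be seen in isolation, without gettext. In the sketch below, the function name format_amount is hypothetical and a plain ‘%’ is used in front of the macro; only the concatenation of the literal pieces is demonstrated, not the substitution that the gettext library performs at run time.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Format NUMBER into BUF.  The compiler pastes the three literal pieces
   together, so the format string becomes "The amount is %ld\n",
   "The amount is %lld\n" or similar, depending on the platform.
   Returns what snprintf returns.  */
int
format_amount (char *buf, size_t bufsize, int64_t number)
{
  assert (buf != NULL && bufsize > 0);
  return snprintf (buf, bufsize, "The amount is %" PRId64 "\n", number);
}
```

The output is the same on every platform even though the format string seen by snprintf is not, which is exactly why the PO file records the platform independent spelling <PRId64>.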

char buf1[100];
sprintf (buf1, "%0" MYPRId64, number);
printf (gettext ("The amount is %s\n"), buf1);

This means, you put the platform dependent code in one statement, and the internationalization code in a different statement. Note that a buffer length of 100 is safe, because all available hardware integer types are limited to 128 bits, and to print a 128 bit integer one needs at most 54 characters, regardless whether in decimal, octal or hexadecimal.

All this applies to other programming languages as well. For example, in Java and C#, string concatenation is very frequently used, because it is a compiler built-in operator. Like in C, in Java, you would change

System.out.println("Replace "+object1+" with "+object2+"?");

into a statement involving a format string:

System.out.println(
    MessageFormat.format("Replace {0} with {1}?",
                         new Object[] { object1, object2 }));

Similarly, in C#, you would change

Console.WriteLine("Replace "+object1+" with "+object2+"?");

into a statement involving a format string:

Console.WriteLine(
    String.Format("Replace {0} with {1}?", object1, object2));

Unusual markup or control characters should not be used in translatable strings. Translators will likely not understand the particular meaning of the markup or control characters.

For example, if you have a convention that ‘|’ delimits the left-hand and right-hand part of some GUI elements, translators will often not understand it without specific comments. It might be better to have the translator translate the left-hand and right-hand part separately.

Another example is the ‘argp’ convention to use a single ‘\v’ (vertical tab) control character to delimit two sections inside a string. This is flawed. Some translators may convert it to a simple newline, some to blank lines. With some PO file editors it may not be easy to even enter a vertical tab control character. So, you cannot be sure that the translation will contain a ‘\v’ character, at the corresponding position. The solution is, again, to let the translator translate two separate strings and combine at run-time the two translated strings with the ‘\v’ required by the convention.

HTML markup, however, is common enough that it’s probably ok to use in translatable strings. But please bear in mind that the GNU gettext tools don’t verify that the translations are well-formed HTML.


4.4 How Marks Appear in Sources

All strings requiring translation should be marked in the C sources. Marking is done in such a way that each translatable string appears to be the sole argument of some function or preprocessor macro. There are only a few such possible functions or macros meant for translation, and their names are said to be marking keywords. The marking is attached to strings themselves, rather than to what we do with them. This approach has more uses. A blatant example is an error message produced by formatting. The format string needs translation, as well as some strings inserted through some ‘%s’ specification in the format, while the result from sprintf may have so many different instances that it is impractical to list them all in some ‘error_string_out()’ routine, say.

This marking operation has two goals. The first goal of marking is for triggering the retrieval of the translation, at run time. The keyword is possibly resolved into a routine able to dynamically return the proper translation, as far as possible or wanted, for the argument string. Most localizable strings are found in executable positions, that is, attached to variables or given as parameters to functions. But this is not universal usage, and some translatable strings appear in structured initializations. See Special cases.

The second goal of the marking operation is to help xgettext at properly extracting all translatable strings when it scans a set of program sources and produces PO file templates.

The canonical keyword for marking translatable strings is ‘gettext’, it gave its name to the whole GNU gettext package. For packages making only light use of the ‘gettext’ keyword, macro or function, it is easily used as is. However, for packages using the gettext interface more heavily, it is usually more convenient to give the main keyword a shorter, less obtrusive name. Indeed, the keyword might appear on a lot of strings all over the package, and programmers usually do not want nor need their program sources to remind them forcefully, all the time, that they are internationalized. Further, a long keyword has the disadvantage of using more horizontal space, forcing more indentation work on sources for those trying to keep them within 79 or 80 columns.

Many packages use ‘_’ (a simple underline) as a keyword, and write ‘_("Translatable string")’ instead of ‘gettext ("Translatable string")’. Further, the coding rule, from GNU standards, wanting that there is a space between the keyword and the opening parenthesis is relaxed, in practice, for this particular usage. So, the textual overhead per translatable string is reduced to only three characters: the underline and the two parentheses. However, even if GNU gettext uses this convention internally, it does not offer it officially. The real, genuine keyword is truly ‘gettext’ indeed. It is fairly easy for those wanting to use ‘_’ instead of ‘gettext’ to declare:

#include <libintl.h>
#define _(String) gettext (String)

instead of merely using ‘#include <libintl.h>’.
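A complete, if tiny, module using this convention could look like the sketch below. The function greeting and its message are invented for the illustration, and the code assumes a system where <libintl.h> is available (for example one using glibc).

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>

#define _(String) gettext (String)

/* Return the greeting in the user's language.  When no translation
   catalog provides the message, gettext returns its argument
   unchanged, so the English text is used.  */
const char *
greeting (void)
{
  const char *text = _("Hello, world!");

  assert (text != NULL);
  return text;
}
```

The macro costs nothing at run time: after preprocessing, the call is an ordinary call to gettext.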

The marking keywords ‘gettext’ and ‘_’ take the translatable string as sole argument. It is also possible to define marking functions that take it at another argument position. It is even possible to make the marked argument position depend on the total number of arguments of the function call; this is useful in C++. All this is achieved using xgettext’s ‘--keyword’ option. How to pass such an option to xgettext, assuming that gettextize is used, is described in po/Makevars and AM_XGETTEXT_OPTION.

Note also that long strings can be split across lines, into multiple adjacent string tokens. Automatic string concatenation is performed at compile time according to ISO C and ISO C++; xgettext also supports this syntax.

Later on, the maintenance is relatively easy. If, as a programmer, you add or modify a string, you will have to ask yourself if the new or altered string requires translation, and include it within ‘_()’ if you think it should be translated. For example, ‘"%s"’ is an example of string not requiring translation. But ‘"%s: %d"’ does require translation, because in French, unlike in English, it’s customary to put a space before a colon.


4.5 Marking Translatable Strings

In PO mode, one set of features is meant more for the programmer than for the translator, and allows him to interactively mark which strings, in a set of program sources, are translatable, and which are not. Even if it is a fairly easy job for a programmer to find and mark such strings by other means, using any editor of his choice, PO mode makes this work more comfortable. Further, this gives translators who feel a little like programmers, or programmers who feel a little like translators, a tool letting them work at marking translatable strings in the program sources, while simultaneously producing a set of translations in some language, for the package being internationalized.

The set of program sources, targeted by the PO mode commands described here, should have an Emacs tags table constructed for your project, prior to using these PO file commands. This is easy to do. In any shell window, change the directory to the root of your project, then execute a command resembling:

etags src/*.[hc] lib/*.[hc]

presuming here you want to process all .h and .c files from the src/ and lib/ directories. This command will explore all said files and create a TAGS file in your root directory, somewhat summarizing the contents using a special file format Emacs can understand.

For packages following the GNU coding standards, there is a make goal tags or TAGS which constructs the tag files in all directories and for all files containing source code.

Once your TAGS file is ready, the following commands assist the programmer at marking translatable strings in his set of sources. But these commands are necessarily driven from within a PO file window, and it is likely that you do not even have such a PO file yet. This is not a problem at all, as you may safely open a new, empty PO file, mainly for using these commands. This empty PO file will slowly fill in while you mark strings as translatable in your program sources.

,

Search through program sources for a string which looks like a candidate for translation (po-tags-search).

M-,

Mark the last string found with ‘_()’ (po-mark-translatable).

M-.

Mark the last string found with a keyword taken from a set of possible keywords. This command with a prefix allows some management of these keywords (po-select-mark-and-mark).

The , (po-tags-search) command searches for the next occurrence of a string which looks like a possible candidate for translation, and displays the program source in another Emacs window, positioned in such a way that the string is near the top of this other window. If the string is too big to fit whole in this window, it is positioned so only its end is shown. In any case, the cursor is left in the PO file window. If the shown string would be better presented differently in different native languages, you may mark it using M-, or M-.. Otherwise, you might rather ignore it and skip to the next string by merely repeating the , command.

A string is a good candidate for translation if it contains a sequence of three or more letters. A string containing at most two letters in a row will be considered as a candidate if it has more letters than non-letters. The command disregards strings containing no letters, or isolated letters only. It also disregards strings within comments, or strings already marked with some keyword PO mode knows (see below).

If you have never told Emacs about some TAGS file to use, the command will request that you specify one from the minibuffer, the first time you use the command. You may later change your TAGS file by using the regular Emacs command M-x visit-tags-table, which will ask you to name the precise TAGS file you want to use. See Tag Tables in The Emacs Editor.

Each time you use the , command, the search resumes from where it was left by the previous search, and goes through all program sources, obeying the TAGS file, until all sources have been processed. However, by giving a prefix argument to the command (C-u ,), you may request that the search be restarted all over again from the first program source; but in this case, strings that you recently marked as translatable will be automatically skipped.

Using this , command does not prevent using other regular Emacs tags commands. For example, regular tags-search or tags-query-replace commands may be used without disrupting the independent , search sequence. However, as implemented, the initial , command (or the , command used with a prefix) might also reinitialize the regular Emacs tags searching to the first tags file; this reinitialization might be considered spurious.

The M-, (po-mark-translatable) command will mark the recently found string with the ‘_’ keyword. The M-. (po-select-mark-and-mark) command will request that you type one keyword from the minibuffer and use that keyword for marking the string. Both commands will automatically create a new PO file untranslated entry for the string being marked, and make it the current entry (making it easy for you to immediately proceed to its translation, if you feel like doing it right away). It is possible that the modifications made to the program source by M-, or M-. render some source line longer than 80 columns, forcing you to break and re-indent this line differently. You may use the O command from PO mode, or any other window changing command from Emacs, to break out into the program source window, and do any needed adjustments. You will have to use some regular Emacs command to return the cursor to the PO file window, if you want to use the , command for the next string, say.

The M-. command has a few built-in speedups, so you do not have to explicitly type all keywords all the time. The first such speedup is that you are presented with a preferred keyword, which you may accept by merely typing RET at the prompt. The second speedup is that you may type any non-ambiguous prefix of the keyword you really mean, and the command will complete it automatically for you. This also means that PO mode has to know all your possible keywords, and that it will not accept mistyped keywords.

If you reply ? to the keyword request, the command gives a list of all known keywords, from which you may choose. When the command is prefixed by an argument (C-u M-.), it inhibits updating any program source or PO file buffer, and does some simple keyword management instead. In this case, the command asks for a keyword, written in full, which becomes a new allowed keyword for later M-. commands. Moreover, this new keyword automatically becomes the preferred keyword for later commands. By typing an already known keyword in response to C-u M-., one merely changes the preferred keyword and does nothing more.

All keywords known for M-. are recognized by the , command when scanning for strings, and strings already marked by any of those known keywords are automatically skipped. If many PO files are opened simultaneously, each one has its own independent set of known keywords. There is no provision in PO mode, currently, for deleting a known keyword; you have to quit the file (maybe using q) and reopen it afresh. When a PO file is newly brought up in an Emacs window, only ‘gettext’ and ‘_’ are known as keywords, and ‘gettext’ is preferred for the M-. command. In fact, it is not useful to prefer ‘_’, as this one is already built in the M-, command.



4.6 Special Comments preceding Keywords

In C programs strings are often used within calls of functions from the printf family. The special thing about these format strings is that they can contain format specifiers introduced with %. Assume we have the code

printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));

A possible German translation for the above string might be:

"%d Zeichen lang ist die Zeichenkette `%s'"

A C programmer, even if he cannot speak German, will recognize that there is something wrong here. The order of the two format specifiers is changed, but of course the order of the arguments in the printf call is not. This will most probably lead to problems, because now the length of the string is regarded as the address.

To prevent errors at runtime caused by translations, the msgfmt tool can check statically whether the arguments in the original and the translation string match in type and number. If this is not the case and the ‘-c’ option has been passed to msgfmt, msgfmt will give an error and refuse to produce a MO file. Thus consistent use of ‘msgfmt -c’ will catch the error, so that it cannot cause problems at runtime.

For the word order in the above German translation to be correct, one would have to write

"%2$d Zeichen lang ist die Zeichenkette `%1$s'"

The routines in msgfmt know about this special notation.
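The effect of the numbered notation can be sketched in C as follows. The helper name format_length_message is hypothetical, and the format string is written out literally here rather than fetched through gettext. Note that ‘%1$s’ style directives are a POSIX extension to the ISO C printf family, so this assumes a POSIX-conforming C library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the German message into BUF.  The
   arguments are passed in the original order (string, then length);
   the `%1$s' and `%2$d' directives select them by position.  */
static int
format_length_message (char *buf, size_t size, const char *s)
{
  return snprintf (buf, size,
                   "%2$d Zeichen lang ist die Zeichenkette `%1$s'",
                   s, (int) strlen (s));
}
```

The argument list of the call stays exactly as the programmer wrote it; only the translated format string decides in which order the values appear.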

Because not all strings in a program will be format strings, it is not useful for msgfmt to test all the strings in the .po file. This might cause problems because the string might contain what looks like a format specifier, but the string is not used in printf.

Therefore xgettext adds a special tag to those messages it thinks might be a format string. There is no absolute rule for this, only a heuristic. In the .po file the entry is marked using the c-format flag in the #, comment line (see PO Files).

The careful reader now might say that this again can cause problems. The heuristic might guess it wrong. This is true and therefore xgettext knows about a special kind of comment which lets the programmer take over the decision. If, on the same line as the gettext keyword or on the immediately preceding line, the xgettext program finds a comment containing the words xgettext:c-format, it will mark the string in any case with the c-format flag. This kind of comment should be used when xgettext does not recognize the string as a format string but it really is one and it should be tested. Please note that when the comment is in the same line as the gettext keyword, it must be before the string to be translated.

This situation happens quite often. The printf function is often called with strings which do not contain a format specifier. Of course one would normally use fputs but it does happen. In this case xgettext does not recognize this as a format string, but what happens if the translation introduces a valid format specifier? The printf function will try to access one of the parameters but none exists because the original code does not pass any parameters.

xgettext of course could make a wrong decision the other way round, i.e. a string marked as a format string actually is not a format string. In this case msgfmt might give too many warnings and would prevent translating the .po file. The method to prevent this wrong decision is similar to the one used above, only the comment to use must contain the string xgettext:no-c-format.
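Both kinds of special comment might be used as in the sketch below. The function print_status and its messages are invented for illustration; whether the heuristic would actually misjudge these particular strings is an assumption, and the comments simply make the decision explicit.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical function illustrating both special comments.  Returns
   the number of characters written by the first call, or a negative
   value on error.  */
static int
print_status (FILE *out)
{
  int written;

  /* The string has no directive, so xgettext would not flag it by
     itself; the comment on the preceding line forces the c-format
     flag, so that msgfmt -c checks its translations.  */
  /* xgettext:c-format */
  written = fprintf (out, gettext ("Operation finished\n"));

  /* "% d" may look like a directive, but the string is never used as
     a format string; the comment keeps the c-format flag off.  */
  /* xgettext:no-c-format */
  fputs (gettext ("Progress: 100% done\n"), out);

  return written;
}
```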

If a string is marked with c-format and this is not correct the user can find out who is responsible for the decision. See xgettext Invocation to see how the --debug option can be used for solving this problem.


4.7 Special Cases of Translatable Strings

The attentive reader might now point out that it is not always possible to mark a translatable string with gettext or something like this. Consider the following case:

{
  static const char *messages[] = {
    "some very meaningful message",
    "and another one"
  };
  const char *string;
  …
  string
    = index > 1 ? "a default message" : messages[index];

  fputs (string, stdout);
  …
}

While it is no problem to mark the string "a default message", it is not possible to mark the string initializers for messages. What is to be done? We have to fulfill two tasks. First we have to mark the strings so that the xgettext program (see xgettext Invocation) can find them, and second we have to translate the strings at runtime before printing them.

The first task can be fulfilled by creating a new keyword, which names a no-op. For the second we have to mark all access points to a string from the array. So one solution can look like this:

#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  …
  string
    = index > 1 ? gettext ("a default message") : gettext (messages[index]);

  fputs (string, stdout);
  …
}

Please convince yourself that the string which is written by fputs is translated in any case. How to make xgettext recognize the additional keyword gettext_noop is explained in xgettext Invocation.

The above is of course not the only solution. You could also come up with the following one:

#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  …
  string
    = index > 1 ? gettext_noop ("a default message") : messages[index];

  fputs (gettext (string), stdout);
  …
}

But this has a drawback. The programmer has to take care that he uses gettext_noop for the string "a default message". A use of gettext could have in rare cases unpredictable results.

One advantage is that you need not perform control flow analysis to make sure the output is really translated in any case. But this analysis is generally not very difficult. If it does turn out to be difficult in some situation, you can use this second method there.



4.8 Letting Users Report Translation Bugs

Code sometimes has bugs, but translations sometimes have bugs too. The users need to be able to report them. Reporting translation bugs to the programmer or maintainer of a package is not very useful, since the maintainer must never change a translation, except on behalf of the translator. Hence the translation bugs must be reported to the translators.

Here is a way to organize this so that the maintainer does not need to forward translation bug reports, nor even keep a list of the addresses of the translators or their translation teams.

Every program has a place where it shows the bug report address. For GNU programs, it is the code which handles the ‘--help’ option, typically in a function called ‘usage’. In this place, instruct the translator to add her own bug reporting address. For example, if that code has a statement

printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);

you can add some translator instructions like this:

/* TRANSLATORS: The placeholder indicates the bug-reporting address
   for this package.  Please add _another line_ saying
   "Report translation bugs to <...>\n" with the address for translation
   bugs (typically your translation team's web or email address).  */
printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);

These will be extracted by ‘xgettext’, leading to a .pot file that contains this:

#. TRANSLATORS: The placeholder indicates the bug-reporting address
#. for this package.  Please add _another line_ saying
#. "Report translation bugs to <...>\n" with the address for translation
#. bugs (typically your translation team's web or email address).
#: src/hello.c:178
#, c-format
msgid "Report bugs to <%s>.\n"
msgstr ""

4.9 Marking Proper Names for Translation

Should names of persons, cities, locations etc. be marked for translation or not? People who only know languages that can be written with Latin letters (English, Spanish, French, German, etc.) are tempted to say “no”, because names usually do not change when transported between these languages. However, in general when translating from one script to another, names are translated too, usually phonetically or by transliteration. For example, Russian or Greek names are converted to the Latin alphabet when being translated to English, and English or French names are converted to the Katakana script when being translated to Japanese. This is necessary because the speakers of the target language in general cannot read the script the name is originally written in.

As a programmer, you should therefore make sure that names are marked for translation, with a special comment telling the translators that it is a proper name and how to pronounce it. In its simple form, it looks like this:

printf (_("Written by %s.\n"),
        /* TRANSLATORS: This is a proper name.  See the gettext
           manual, section Names.  Note this is actually a non-ASCII
           name: The first name is (with Unicode escapes)
           "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
           Pronunciation is like "fraa-swa pee-nar".  */
        _("Francois Pinard"));

The GNU gnulib library offers a module ‘propername’ (http://www.gnu.org/software/gnulib/MODULES.html#module=propername) which takes care to automatically append the original name, in parentheses, to the translated name. For names that cannot be written in ASCII, it also frees the translator from the task of entering the appropriate non-ASCII characters if no script change is needed. In this more comfortable form, it looks like this:

printf (_("Written by %s and %s.\n"),
        proper_name ("Ulrich Drepper"),
        /* TRANSLATORS: This is a proper name.  See the gettext
           manual, section Names.  Note this is actually a non-ASCII
           name: The first name is (with Unicode escapes)
           "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
           Pronunciation is like "fraa-swa pee-nar".  */
        proper_name_utf8 ("Francois Pinard", "Fran\303\247ois Pinard"));

You can also write the original name directly in Unicode (rather than with Unicode escapes or HTML entities) and denote the pronunciation using the International Phonetic Alphabet (see http://www.wikipedia.org/wiki/International_Phonetic_Alphabet).

As a translator, you should use some care when translating names, because it is frustrating if people see their names mutilated or distorted.

If your language uses the Latin script, all you need to do is to reproduce the name as perfectly as you can within the usual character set of your language. In this particular case, this means to provide a translation containing the c-cedilla character. If your language uses a different script and the people speaking it don’t usually read Latin words, it means transliteration. If the programmer used the simple case, you should still give, in parentheses, the original writing of the name, for the sake of the people that do read the Latin script. If the programmer used the ‘propername’ module mentioned above, you don’t need to give the original writing of the name in parentheses, because the program will already do so. Here is an example, using Greek as the target script:

#. This is a proper name.  See the gettext
#. manual, section Names.  Note this is actually a non-ASCII
#. name: The first name is (with Unicode escapes)
#. "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
#. Pronunciation is like "fraa-swa pee-nar".
msgid "Francois Pinard"
msgstr "\phi\rho\alpha\sigma\omicron\alpha \pi\iota\nu\alpha\rho"
       " (Francois Pinard)"

Because translation of names is such a sensitive domain, it is a good idea to test your translation before submitting it.



4.10 Preparing Library Sources

When you are preparing a library, not a program, for the use of gettext, only a few details are different. Here we assume that the library has a translation domain and a POT file of its own. (If it uses the translation domain and POT file of the main program, then the previous sections apply without changes.)

  1. The library code doesn’t call setlocale (LC_ALL, ""). It’s the responsibility of the main program to set the locale. The library’s documentation should mention this fact, so that developers of programs using the library are aware of it.
  2. The library code doesn’t call textdomain (PACKAGE), because it would interfere with the text domain set by the main program.
  3. The initialization code for a program was
      setlocale (LC_ALL, "");
      bindtextdomain (PACKAGE, LOCALEDIR);
      textdomain (PACKAGE);
    

    For a library it is reduced to

      bindtextdomain (PACKAGE, LOCALEDIR);
    

    If your library’s API doesn’t already have an initialization function, you need to create one, containing at least the bindtextdomain invocation. However, you usually don’t need to export and document this initialization function: It is sufficient that all entry points of the library call the initialization function if it hasn’t been called before. The typical idiom used to achieve this is a static boolean variable that indicates whether the initialization function has been called. Like this:

    static bool libfoo_initialized;
    
    static void
    libfoo_initialize (void)
    {
      bindtextdomain (PACKAGE, LOCALEDIR);
      libfoo_initialized = true;
    }
    
    /* This function is part of the exported API.  */
    struct foo *
    create_foo (...)
    {
      /* Must ensure the initialization is performed.  */
      if (!libfoo_initialized)
        libfoo_initialize ();
      ...
    }
    
    /* This function is part of the exported API.  The argument must be
       non-NULL and have been created through create_foo().  */
    int
    foo_refcount (struct foo *argument)
    {
      /* No need to invoke the initialization function here, because
         create_foo() must already have been called before.  */
      ...
    }
    
  4. The usual declaration of the ‘_’ macro in each source file was
    #include <libintl.h>
    #define _(String) gettext (String)
    

    for a program. For a library, which has its own translation domain, it reads like this:

    #include <libintl.h>
    #define _(String) dgettext (PACKAGE, String)
    

    In other words, dgettext is used instead of gettext. Similarly, the dngettext function should be used in place of the ngettext function.
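A minimal sketch of library code using its own domain for both plain and plural messages might look as follows. The domain name "libfoo" and the functions foo_empty_message and foo_describe_count are placeholders for illustration; in a real package PACKAGE would normally come from config.h.

```c
#include <libintl.h>
#include <stdio.h>

/* Normally PACKAGE comes from config.h; "libfoo" is a placeholder.  */
#ifndef PACKAGE
# define PACKAGE "libfoo"
#endif

#define _(String) dgettext (PACKAGE, String)

/* Hypothetical library function: a plain message looked up in the
   library's own domain.  */
static const char *
foo_empty_message (void)
{
  return _("no items");
}

/* Hypothetical library function: a plural-aware message.  dngettext
   replaces ngettext so that the library's domain is consulted.
   Returns what snprintf returns.  */
static int
foo_describe_count (char *buf, size_t size, unsigned long n)
{
  return snprintf (buf, size,
                   dngettext (PACKAGE, "%lu item", "%lu items", n), n);
}
```

Neither function touches the current text domain of the main program, which is the point of using the d-prefixed functions in a library.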



5 Making the PO Template File

After preparing the sources, the programmer creates a PO template file.This section explains how to use xgettext for this purpose.

xgettext creates a file named domainname.po. You should then rename it to domainname.pot. (Why doesn’t xgettext create it under the name domainname.pot right away? The answer is: for historical reasons. When xgettext was specified, the distinction between a PO file and PO file template was fuzzy, and the suffix ‘.pot’ wasn’t in use at that time.)



5.1 Invoking the xgettext Program

xgettext [option] [inputfile] …

The xgettext program extracts translatable strings from given input files.

5.1.1 Input file location
inputfile

Input files.

-f file
--files-from=file

Read the names of the input files from file instead of getting them from the command line.

-D directory
--directory=directory

Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.

If inputfile is ‘-’, standard input is read.

5.1.2 Output file location
-d name
--default-domain=name

Use name.po for output (instead of messages.po).

-o file
--output=file

Write output to specified file (instead of name.po or messages.po).

-p dir
--output-dir=dir

Output files will be placed in directory dir.

If the output file is ‘-’ or ‘/dev/stdout’, the output is written to standard output.

5.1.3 Choice of input file language
-L name
--language=name

Specifies the language of the input files. The supported languages are C, C++, ObjectiveC, PO, Shell, Python, Lisp, EmacsLisp, librep, Scheme, Smalltalk, Java, JavaProperties, C#, awk, YCP, Tcl, Perl, PHP, GCC-source, NXStringTable, RST, Glade, Lua, JavaScript, Vala, GSettings, Desktop.

-C
--c++

This is a shorthand for --language=C++.

By default the language is guessed depending on the input file name extension.

5.1.4 Input file interpretation
--from-code=name

Specifies the encoding of the input files. This option is needed only if some untranslated message strings or their corresponding comments contain non-ASCII characters. Note that Tcl and Glade input files are always assumed to be in UTF-8, regardless of this option.

By default the input files are assumed to be in ASCII.

5.1.5 Operation mode
-j
--join-existing

Join messages with existing file.

-x file
--exclude-file=file

Entries from file are not extracted. file should be a PO or POT file.

-c[tag]
--add-comments[=tag]

Place comment blocks starting with tag and preceding keyword lines in the output file. Without a tag, the option means to put all comment blocks preceding keyword lines in the output file.

Note that comment blocks supposed to be extracted must be adjacent to keyword lines. For example, in the following C source code:

/* This is the first comment.  */
gettext ("foo");

/* This is the second comment: not extracted  */
gettext (
  "bar");

gettext (
  /* This is the third comment.  */
  "baz");

The second comment line will not be extracted, because there is one blank line between the comment line and the keyword.

--check[=CHECK]

Perform a syntax check on msgid and msgid_plural. The supported checks are:

ellipsis-unicode

Prefer Unicode ellipsis character over ASCII ...

space-ellipsis

Prohibit whitespace before an ellipsis character

quote-unicode

Prefer Unicode quotation marks over ASCII "'`

bullet-unicode

Prefer Unicode bullet character over ASCII * or -

The option has an effect on all input files. To enable or disable checks for a certain string, you can mark it with an xgettext: special comment in the source file. For example, if you specify the --check=space-ellipsis option, but want to suppress the check on a particular string, add the following comment:

/* xgettext: no-space-ellipsis-check */
gettext ("We really want a space before ellipsis here ...");

The xgettext: comment can be followed by flags separated with a comma. The possible flags are of the form ‘[no-]name-check’, where name is the name of a valid syntax check. If a flag is prefixed by no-, the meaning is negated.

Some tests apply the checks to each sentence within the msgid, rather than the whole string. xgettext detects the end of sentence by performing a pattern match, which usually looks for a period followed by a certain number of spaces. The number is specified with the --sentence-end option.

--sentence-end[=TYPE]

The supported values are:

single-space

Expect at least one whitespace after a period

double-space

Expect at least two whitespaces after a period

5.1.6 Language specific options
-a
--extract-all

Extract all strings.

This option has an effect with most languages, namely C, C++, ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Java, C#, awk, Tcl, Perl, PHP, GCC-source, Glade, Lua, JavaScript, Vala, GSettings.

-k[keywordspec]
--keyword[=keywordspec]

Specify keywordspec as an additional keyword to be looked for. Without a keywordspec, the option means to not use default keywords.

If keywordspec is a C identifier id, xgettext looks for strings in the first argument of each call to the function or macro id. If keywordspec is of the form ‘id:argnum’, xgettext looks for strings in the argnum-th argument of the call. If keywordspec is of the form ‘id:argnum1,argnum2’, xgettext looks for strings in the argnum1-th argument and in the argnum2-th argument of the call, and treats them as singular/plural variants for a message with plural handling. Also, if keywordspec is of the form ‘id:contextargnumc,argnum’ or ‘id:argnum,contextargnumc’ (the number contextargnum followed by the letter ‘c’), xgettext treats strings in the contextargnum-th argument as a context specifier. And, as a special-purpose support for GNOME, if keywordspec is of the form ‘id:argnumg’ (the number argnum followed by the letter ‘g’), xgettext recognizes the argnum-th argument as a string with context, using the GNOME glib syntax ‘"msgctxt|msgid"’.
Furthermore, if keywordspec is of the form ‘id:…,totalnumargst’ (the number totalnumargs followed by the letter ‘t’), xgettext recognizes this argument specification only if the number of actual arguments is equal to totalnumargs. This is useful for disambiguating overloaded function calls in C++.
Finally, if keywordspec is of the form ‘id:argnum...,"xcomment"’, xgettext, when extracting a message from the specified argument strings, adds an extracted comment xcomment to the message. Note that when used through a normal shell command line, the double-quotes around the xcomment need to be escaped.
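As an illustration of the ‘id:argnum1,argnum2’ form, here is a sketch of a wrapper whose second and third arguments are the singular and plural variants. The names report_count and print_removed are hypothetical; the matching option would be ‘--keyword=report_count:2,3’.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical wrapper, not part of gettext: arguments 2 and 3 are the
   singular and plural forms, so the matching keywordspec would be
   "report_count:2,3".  Returns what fprintf returns.  */
static int
report_count (FILE *out, const char *singular, const char *plural,
              unsigned long n)
{
  return fprintf (out, ngettext (singular, plural, n), n);
}

/* Example call site: xgettext extracts the two literals below as one
   message with msgid and msgid_plural.  */
static int
print_removed (FILE *out, unsigned long n)
{
  return report_count (out, "%lu file removed\n",
                       "%lu files removed\n", n);
}
```

Without the option, xgettext would not look at calls to report_count at all, since only the default keywords for the language are searched.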

This option has an effect with most languages, namely C, C++, ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Java, C#, awk, Tcl, Perl, PHP, GCC-source, Glade, Lua, JavaScript, Vala, GSettings, Desktop.

The default keyword specifications, which are always looked for if not explicitly disabled, are language dependent. They are:

  • For C, C++, and GCC-source: gettext, dgettext:2, dcgettext:2, ngettext:1,2, dngettext:2,3, dcngettext:2,3, gettext_noop, and pgettext:1c,2, dpgettext:2c,3, dcpgettext:2c,3, npgettext:1c,2,3, dnpgettext:2c,3,4, dcnpgettext:2c,3,4.
  • For Objective C: Like for C, and also NSLocalizedString, _, NSLocalizedStaticString, __.
  • For Shell scripts: gettext, ngettext:1,2, eval_gettext, eval_ngettext:1,2.
  • For Python: gettext, ugettext, dgettext:2, ngettext:1,2, ungettext:1,2, dngettext:2,3, _.
  • For Lisp: gettext, ngettext:1,2, gettext-noop.
  • For EmacsLisp: _.
  • For librep: _.
  • For Scheme: gettext, ngettext:1,2, gettext-noop.
  • For Java: GettextResource.gettext:2, GettextResource.ngettext:2,3, GettextResource.pgettext:2c,3, GettextResource.npgettext:2c,3,4, gettext, ngettext:1,2, pgettext:1c,2, npgettext:1c,2,3, getString.
  • For C#: GetString, GetPluralString:1,2, GetParticularString:1c,2, GetParticularPluralString:1c,2,3.
  • For awk: dcgettext, dcngettext:1,2.
  • For Tcl: ::msgcat::mc.
  • For Perl: gettext, %gettext, $gettext, dgettext:2, dcgettext:2, ngettext:1,2, dngettext:2,3, dcngettext:2,3, gettext_noop.
  • For PHP: _, gettext, dgettext:2, dcgettext:2, ngettext:1,2, dngettext:2,3, dcngettext:2,3.
  • For Glade 1: label, title, text, format, copyright, comments, preview_text, tooltip.
  • For Lua: _, gettext.gettext, gettext.dgettext:2, gettext.dcgettext:2, gettext.ngettext:1,2, gettext.dngettext:2,3, gettext.dcngettext:2,3.
  • For JavaScript: _, gettext, dgettext:2, dcgettext:2, ngettext:1,2, dngettext:2,3, pgettext:1c,2, dpgettext:2c,3.
  • For Vala: _, Q_, N_, NC_, dgettext:2, dcgettext:2, ngettext:1,2, dngettext:2,3, dpgettext:2c,3, dpgettext2:2c,3.
  • For Desktop: Name, GenericName, Comment, Icon, Keywords.

To disable the default keyword specifications, the option ‘-k’ or ‘--keyword’ or ‘--keyword=’, without a keywordspec, can be used.

--flag=word:arg:flag

Specifies additional flags for strings occurring as part of the arg-th argument of the function word. The possible flags are the possible format string indicators, such as ‘c-format’, and their negations, such as ‘no-c-format’, possibly prefixed with ‘pass-’.
The meaning of --flag=function:arg:lang-format is that in language lang, the specified function expects as arg-th argument a format string. (For those of you familiar with GCC function attributes, --flag=function:arg:c-format is roughly equivalent to the declaration ‘__attribute__ ((__format__ (__printf__, arg, ...)))’ attached to function in a C source file.) For example, if you use the ‘error’ function from GNU libc, you can specify its behaviour through --flag=error:3:c-format. The effect of this specification is that xgettext will mark as format strings all gettext invocations that occur as arg-th argument of function. This is useful when such strings contain no format string directives: together with the checks done by ‘msgfmt -c’ it will ensure that translators cannot accidentally use format string directives that would lead to a crash at runtime.
The meaning of --flag=function:arg:pass-lang-format is that in language lang, if the function call occurs in a position that must yield a format string, then its arg-th argument must yield a format string of the same type as well. (If you know GCC function attributes, the --flag=function:arg:pass-c-format option is roughly equivalent to the declaration ‘__attribute__ ((__format_arg__ (arg)))’ attached to function in a C source file.) For example, if you use the ‘_’ shortcut for the gettext function, you should use --flag=_:1:pass-c-format. The effect of this specification is that xgettext will propagate a format string requirement for a _("string") call to its first argument, the literal "string", and thus mark it as a format string. This is useful when such strings contain no format string directives: together with the checks done by ‘msgfmt -c’ it will ensure that translators cannot accidentally use format string directives that would lead to a crash at runtime.
This option has an effect with most languages, namely C, C++, ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Scheme, Java, C#, awk, YCP, Tcl, Perl, PHP, GCC-source, Lua, JavaScript, Vala.
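To make the correspondence with GCC attributes concrete, here is a sketch of a printf-like function of one's own. The name my_error and its behaviour are hypothetical; for this signature the matching xgettext option would be ‘--flag=my_error:2:c-format’, mirroring the attribute placed on the declaration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical printf-like function.  The attribute tells GCC that
   argument 2 is a printf format string whose arguments start at
   position 3; --flag=my_error:2:c-format would tell xgettext the
   same about argument 2.  */
static int my_error (int status, const char *format, ...)
#if defined __GNUC__
  __attribute__ ((__format__ (__printf__, 2, 3)))
#endif
  ;

static int
my_error (int status, const char *format, ...)
{
  va_list args;
  int written;

  (void) status;                /* Unused in this sketch.  */
  va_start (args, format);
  written = vfprintf (stderr, format, args);
  va_end (args);
  return written;
}
```

A call such as my_error (1, gettext ("cannot open %s\n"), name) would then have its message marked as c-format even if the string happened to contain no directive.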

-T, --trigraphs

Understand ANSI C trigraphs for input.
This option has an effect only with the languages C, C++, ObjectiveC.

--qt

Recognize Qt format strings.
This option has an effect only with the language C++.

--kde

Recognize KDE 4 format strings.
This option has an effect only with the language C++.

--boost

Recognize Boost format strings.
This option has an effect only with the language C++.

--debug

Use the flags c-format and possible-c-format to show who was responsible for marking a message as a format string. The latter form is used if the xgettext program decided; the former form is used if the programmer prescribed it.

By default only the c-format form is used. The translator should not have to care about these details.

This implementation of xgettext is able to process a few awkward cases, like strings in preprocessor macros, ANSI concatenation of adjacent strings, and escaped end of lines for continued strings.

5.1.7 Output details

--color, --color=when

Specify whether or when to use colors and other text attributes. See The --color option for details.

--style=style_file

Specify the CSS style rule file to use for --color. See The --style option for details.

--force-po

Always write an output file even if no message is defined.

-i, --indent

Write the .po file using indented style.

--no-location

Do not write ‘#: filename:line’ lines. Note that using this option makes it harder for technically skilled translators to understand each message’s context.

-n, --add-location=type

Generate ‘#: filename:line’ lines (default).

The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).
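
The three values of type can be summarised in a few lines of Python. This is a sketch of the resulting comment line only, assuming one source position per line; the function name location_comment is invented here and is not part of gettext.

```python
def location_comment(filename, line, location_type="full"):
    """Return the '#:' line for one source position, or None if suppressed."""
    if location_type == "full":
        return f"#: {filename}:{line}"
    if location_type == "file":
        # The line number part is omitted.
        return f"#: {filename}"
    if location_type == "never":
        # Same effect as --no-location.
        return None
    raise ValueError(f"unknown location type: {location_type!r}")


for kind in ("full", "file", "never"):
    print(kind, location_comment("src/main.c", 42, kind))
```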

--strict

Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.

--properties-output

Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.

--stringtable-output

Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.

--its=file

Use ITS rules defined in file. Note that this is only effective with XML files.

--itstool

Write out comments recognized by itstool (http://itstool.org). Note that this is only effective with XML files.

-w number, --width=number

Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.

--no-wrap

Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.

-s, --sort-output

Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.

-F, --sort-by-file

Sort output by file location.

--omit-header

Don’t write header with ‘msgid ""’ entry.

This is useful for testing purposes because it eliminates a source of variance for generated .gmo files. With --omit-header, two invocations of xgettext on the same files with the same options at different times are guaranteed to produce the same results.

Note that using this option will lead to an error if the resulting file would not entirely be in ASCII.

--copyright-holder=string

Set the copyright holder in the output. string should be the copyright holder of the surrounding package. (Note that the msgid strings, extracted from the package’s sources, belong to the copyright holder of the package.) Translators are expected to transfer or disclaim the copyright for their translations, so that package maintainers can distribute them without legal risk. If string is empty, the output files are marked as being in the public domain; in this case, the translators are expected to disclaim their copyright, again so that package maintainers can distribute them without legal risk.

The default value for string is the Free Software Foundation, Inc., simply because xgettext was first used in the GNU project.

--foreign-user

Omit FSF copyright in output. This option is equivalent to ‘--copyright-holder=''’. It can be useful for packages outside the GNU project that want their translations to be in the public domain.

--package-name=package

Set the package name in the header of the output.

--package-version=version

Set the package version in the header of the output. This option has an effect only if the ‘--package-name’ option is also used.

--msgid-bugs-address=email@address

Set the reporting address for msgid bugs. This is the email address or URL to which the translators shall report bugs in the untranslated strings:

  • Strings which are not entire sentences; see the maintainer guidelines in Preparing Strings.
  • Strings which use unclear terms or require additional context to be understood.
  • Strings which make invalid assumptions about notation of date, time or money.
  • Pluralisation problems.
  • Incorrect English spelling.
  • Incorrect formatting.

It can be your email address, or a mailing list address where translators can write to without being subscribed, or the URL of a web page through which the translators can contact you.

The default value is empty, which means that translators will be clueless! Don’t forget to specify this option.

-m[string], --msgstr-prefix[=string]

Use string (or "" if not specified) as prefix for msgstr values.

-M[string], --msgstr-suffix[=string]

Use string (or "" if not specified) as suffix for msgstr values.

5.1.8 Informative output

-h, --help

Display this help and exit.

-V, --version

Output version information and exit.



6 Creating a New PO File

When starting a new translation, the translator creates a file called LANG.po, as a copy of the package.pot template file with modifications in the initial comments (at the beginning of the file) and in the header entry (the first entry, near the beginning of the file).

The easiest way to do so is by use of the ‘msginit’ program. For example:

$ cd PACKAGE-VERSION
$ cd po
$ msginit

The alternative way is to do the copy and modifications by hand. To do so, the translator copies package.pot to LANG.po. Then she modifies the initial comments and the header entry of this file.



6.1 Invoking the msginit Program

msginit [option]

The msginit program creates a new PO file, initializing the meta information with values from the user’s environment.

Here are more details. The following header fields of a PO file are automatically filled, when possible.

Project-Id-Version

The value is guessed from the configure script or any other files in the current directory.

PO-Revision-Date

The value is taken from the POT-Creation-Date in the input POT file, or the current date is used.

Last-Translator

The value is taken from the user’s password file entry and the mailer configuration files.

Language-Team, Language

These values are set according to the current locale and the predefined list of translation teams.

MIME-Version, Content-Type, Content-Transfer-Encoding

These values are set according to the content of the POT file and the current locale. If the POT file contains charset=UTF-8, it means that the POT file contains non-ASCII characters, and we keep the UTF-8 encoding. Otherwise, when the POT file is plain ASCII, we use the locale’s encoding.

Plural-Forms

The value is first looked up from the embedded table.

As an experimental feature, you can instruct msginit to use the information from Unicode CLDR, by setting the GETTEXTCLDRDIR environment variable.
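
The Content-Type rule given above for msginit is a two-way choice, which the following Python sketch restates. The function name choose_po_charset and its arguments are hypothetical; the code mirrors the rule as described here and is not taken from the msginit sources.

```python
def choose_po_charset(pot_charset, locale_charset):
    """Pick the charset for a new PO file from the POT charset and the locale."""
    if pot_charset.upper() == "UTF-8":
        # The POT file contains non-ASCII characters: keep UTF-8.
        return "UTF-8"
    # The POT file is plain ASCII: use the locale's encoding.
    return locale_charset


print(choose_po_charset("UTF-8", "ISO-8859-1"))
print(choose_po_charset("CHARSET", "ISO-8859-1"))
```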

6.1.1 Input file location

-i inputfile, --input=inputfile

Input POT file.

If no inputfile is given, the current directory is searched for the POT file. If it is ‘-’, standard input is read.

6.1.2 Output file location

-o file, --output-file=file

Write output to specified PO file.

If no output file is given, it depends on the ‘--locale’ option or the user’s locale setting. If it is ‘-’, the results are written to standard output.

6.1.3 Input file syntax

-P, --properties-input

Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.

--stringtable-input

Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.

6.1.4 Output details

-l ll_CC, --locale=ll_CC

Set target locale. ll should be a language code, and CC should be a country code. The command ‘locale -a’ can be used to output a list of all installed locales. The default is the user’s locale setting.

--no-translator

Declares that the PO file will not have a human translator and is instead automatically generated.

--color, --color=when

Specify whether or when to use colors and other text attributes. See The --color option for details.

--style=style_file

Specify the CSS style rule file to use for --color. See The --style option for details.

-p, --properties-output

Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.

--stringtable-output

Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.

-w number, --width=number

Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.

--no-wrap

Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.

6.1.5 Informative output

-h, --help

Display this help and exit.

-V, --version

Output version information and exit.


6.2 Filling in the Header Entry

The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and "FIRST AUTHOR <EMAIL@ADDRESS>, YEAR" ought to be replaced by sensible information. This can be done in any text editor; if Emacs is used and it switched to PO mode automatically (because it has recognized the file’s suffix), you can disable it by typing M-x fundamental-mode.

Modifying the header entry can already be done using PO mode: in Emacs, type M-x po-mode RET and then RET again to start editing the entry. You should fill in the following fields.

Project-Id-Version

This is the name and version of the package. Fill it in if it has not already been filled in by xgettext.

Report-Msgid-Bugs-To

This has already been filled in by xgettext. It contains an email address or URL where you can report bugs in the untranslated strings:

  • Strings which are not entire sentences; see the maintainer guidelines in Preparing Strings.
  • Strings which use unclear terms or require additional context to be understood.
  • Strings which make invalid assumptions about notation of date, time or money.
  • Pluralisation problems.
  • Incorrect English spelling.
  • Incorrect formatting.

POT-Creation-Date

This has already been filled in by xgettext.

PO-Revision-Date

You don’t need to fill this in. It will be filled by the PO file editor when you save the file.

Last-Translator

Fill in your name and email address (without double quotes).

Language-Team

Fill in the English name of the language, and the email address or homepage URL of the language team you are part of.

Before starting a translation, it is a good idea to get in touch with your translation team, not only to make sure you don’t do duplicated work, but also to coordinate difficult linguistic issues.

In the Free Translation Project, each translation team has its own mailing list. The up-to-date list of teams can be found at the Free Translation Project’s homepage, http://translationproject.org/, in the "Teams" area.

Language

Fill in the language code of the language. This can be in one of three forms:

  • ‘ll’, an ISO 639 two-letter language code (lowercase). See Language Codes for the list of codes.
  • ‘ll_CC’, where ‘ll’ is an ISO 639 two-letter language code (lowercase) and ‘CC’ is an ISO 3166 two-letter country code (uppercase). The country code specification is not redundant: Some languages have dialects in different countries. For example, ‘de_AT’ is used for Austria, and ‘pt_BR’ for Brazil. The country code serves to distinguish the dialects. See Language Codes and Country Codes for the lists of codes.
  • ‘ll_CC@variant’, where ‘ll’ is an ISO 639 two-letter language code (lowercase), ‘CC’ is an ISO 3166 two-letter country code (uppercase), and ‘variant’ is a variant designator. The variant designator (lowercase) can be a script designator, such as ‘latin’ or ‘cyrillic’.

The naming convention ‘ll_CC’ is also the way locales are named on systems based on GNU libc. But there are three important differences:

  • In this PO file field, but not in locale names, ‘ll_CC’ combinations denoting a language’s main dialect are abbreviated as ‘ll’. For example, ‘de’ is equivalent to ‘de_DE’ (German as spoken in Germany), and ‘pt’ to ‘pt_PT’ (Portuguese as spoken in Portugal) in this context.
  • In this PO file field, suffixes like ‘.encoding’ are not used.
  • In this PO file field, variant designators that are not relevant to message translation, such as ‘@euro’, are not used.

So, if your locale name is ‘de_DE.UTF-8’, the language specification in PO files is just ‘de’.
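
The three differences can be applied mechanically, as this Python sketch shows. It is an illustration under stated assumptions: the table MAIN_DIALECTS holds only the two pairs named in the text (a real table would be longer), IRRELEVANT_VARIANTS holds only ‘euro’, and the function name language_field is invented for this example.

```python
# Assumption: only the pairs mentioned in the text; a real table is longer.
MAIN_DIALECTS = {"de_DE": "de", "pt_PT": "pt"}
# Assumption: variants that do not matter for message translation.
IRRELEVANT_VARIANTS = {"euro"}


def language_field(locale_name):
    """Derive a 'Language' header value from a GNU libc style locale name."""
    base, _, variant = locale_name.partition("@")
    # Suffixes like '.encoding' are not used in the PO file field.
    base = base.split(".", 1)[0]
    # Main dialect combinations are abbreviated to the language code.
    language = MAIN_DIALECTS.get(base, base)
    if variant and variant not in IRRELEVANT_VARIANTS:
        return f"{language}@{variant}"
    return language


for name in ("de_DE.UTF-8", "pt_BR.UTF-8", "de_DE@euro", "sr_RS.UTF-8@latin"):
    print(name, "->", language_field(name))
```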

Content-Type

Replace ‘CHARSET’ with the character encoding used for your language, in your locale, or UTF-8. This field is needed for correct operation of the msgmerge and msgfmt programs, as well as for users whose locale’s character encoding differs from yours (see Charset conversion).

You get the character encoding of your locale by running the shell command ‘locale charmap’. If the result is ‘C’ or ‘ANSI_X3.4-1968’, which is equivalent to ‘ASCII’ (= ‘US-ASCII’), it means that your locale is not correctly configured. In this case, ask your translation team which charset to use. ‘ASCII’ is not usable for any language except Latin.

Because the PO files must be portable to operating systems with less advanced internationalization facilities, the character encodings that can be used are limited to those supported by both GNU libc and GNU libiconv. These are: ASCII, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-13, ISO-8859-14, ISO-8859-15, KOI8-R, KOI8-U, KOI8-T, CP850, CP866, CP874, CP932, CP949, CP950, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, GB2312, EUC-JP, EUC-KR, EUC-TW, BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS, JOHAB, TIS-620, VISCII, GEORGIAN-PS, UTF-8.
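
A translator who wants to check a candidate encoding name against these two rules could use something like the following Python sketch. The set PORTABLE_CHARSETS simply copies the list above; ASCII_NAMES, classify_charset and the three result strings are names chosen for this illustration, and the output of ‘locale charmap’ is assumed to be passed in by the caller.

```python
PORTABLE_CHARSETS = {
    "ASCII", "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4",
    "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8", "ISO-8859-9",
    "ISO-8859-13", "ISO-8859-14", "ISO-8859-15", "KOI8-R", "KOI8-U",
    "KOI8-T", "CP850", "CP866", "CP874", "CP932", "CP949", "CP950",
    "CP1250", "CP1251", "CP1252", "CP1253", "CP1254", "CP1255", "CP1256",
    "CP1257", "GB2312", "EUC-JP", "EUC-KR", "EUC-TW", "BIG5", "BIG5-HKSCS",
    "GBK", "GB18030", "SHIFT_JIS", "JOHAB", "TIS-620", "VISCII",
    "GEORGIAN-PS", "UTF-8",
}

# Results of 'locale charmap' that point to a misconfigured locale.
ASCII_NAMES = {"C", "ANSI_X3.4-1968", "ASCII", "US-ASCII"}


def classify_charset(name):
    """Classify an encoding name as 'ascii', 'portable' or 'not portable'."""
    upper = name.upper()
    if upper in ASCII_NAMES:
        # Ask the translation team which charset to use instead.
        return "ascii"
    if upper in PORTABLE_CHARSETS:
        return "portable"
    return "not portable"


for charmap in ("ANSI_X3.4-1968", "utf-8", "ISO-8859-10"):
    print(charmap, "->", classify_charset(charmap))
```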

In the GNU system, the following encodings are frequently used for the corresponding languages.

  • ISO-8859-1 for Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, German, Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx, Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek, Walloon,
  • ISO-8859-2 for Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak, Slovenian,
  • ISO-8859-3 for Maltese,
  • ISO-8859-5 for Macedonian, Serbian,
  • ISO-8859-6 for Arabic,
  • ISO-8859-7 for Greek,
  • ISO-8859-8 for Hebrew,
  • ISO-8859-9 for Turkish,
  • ISO-8859-13 for Latvian, Lithuanian, Maori,
  • ISO-8859-14 for Welsh,
  • ISO-8859-15 for Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish, Italian, Portuguese, Spanish, Swedish, Walloon,
  • KOI8-R for Russian,
  • KOI8-U for Ukrainian,
  • KOI8-T for Tajik,
  • CP1251 for Bulgarian, Belarusian,
  • GB2312, GBK, GB18030 for simplified writing of Chinese,
  • BIG5, BIG5-HKSCS for traditional writing of Chinese,
  • EUC-JP for Japanese,
  • EUC-KR for Korean,
  • TIS-620 for Thai,
  • GEORGIAN-PS for Georgian,
  • UTF-8 for any language, including those listed above.

When single quote characters or double quote characters are used in translations for your language, and your locale’s encoding is one of the ISO-8859-* charsets, it is best if you create your PO files in UTF-8 encoding, instead of your locale’s encoding. This is because in UTF-8 the real quote characters can be represented (single quote characters: U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of the ISO-8859-* charsets has them all. Users in UTF-8 locales will see the real quote characters, whereas users in ISO-8859-* locales will see the vertical apostrophe and the vertical double quote instead (because that’s what the character set conversion will transliterate them to).
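
The difference between the two kinds of encoding can be observed directly. The Python sketch below checks whether the four quote characters can be encoded, and imitates the transliteration with a small table; the table ASCII_FALLBACK and the function names are assumptions made for this illustration, since the actual substitution is performed by the character set conversion at run time.

```python
REAL_QUOTES = "\u2018\u2019\u201c\u201d"

# Assumed fallback table: vertical apostrophe and vertical double quote.
ASCII_FALLBACK = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def encodable(text, charset):
    """True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def transliterate_quotes(text):
    """Replace the real quote characters with their vertical counterparts."""
    return text.translate(ASCII_FALLBACK)


print(encodable(REAL_QUOTES, "utf-8"))
print(encodable(REAL_QUOTES, "iso-8859-1"))
print(transliterate_quotes("\u201cfile\u201d not found"))
```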

To enter such quote characters under X11, you can change your keyboard mapping using the xmodmap program. The X11 names of the quote characters are "leftsinglequotemark", "rightsinglequotemark", "leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark", "doublelowquotemark".

Note that only recent versions of GNU Emacs support the UTF-8 encoding: Emacs 20 with Mule-UCS, and Emacs 21. As of January 2001, XEmacs doesn’t support the UTF-8 encoding.

The character encoding name can be written in either upper or lower case. Usually upper case is preferred.

Content-Transfer-Encoding

Set this to 8bit.

Plural-Forms

This field is optional. It is only needed if the PO file has plural forms. You can find them by searching for the ‘msgid_plural’ keyword. The format of the plural forms field is described in Plural forms and Translating plural forms.
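
Searching for the keyword is easy to automate. The following Python sketch scans the text of a PO file for entries that carry ‘msgid_plural’; the function name needs_plural_forms and the sample entries are invented for the illustration, and obsolete entries (which are commented out) are deliberately not counted.

```python
def needs_plural_forms(po_text):
    """True if any active entry in po_text has a msgid_plural line."""
    return any(
        line.lstrip().startswith("msgid_plural")
        for line in po_text.splitlines()
    )


SAMPLE = '''\
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] ""
msgstr[1] ""
'''

print(needs_plural_forms(SAMPLE))
print(needs_plural_forms('msgid "Quit"\nmsgstr ""\n'))
```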



7 Updating Existing PO Files



7.1 Invoking the msgmerge Program

msgmerge [option] def.po ref.pot

The msgmerge program merges two Uniforum style .po files together. The def.po file is an existing PO file with translations which will be taken over to the newly created file as long as they still match; comments will be preserved, but extracted comments and file positions will be discarded. The ref.pot file is the last created PO file with up-to-date source references but old translations, or a PO Template file (generally created by xgettext); any translations or comments in the file will be discarded, however dot comments and file positions will be preserved. Where an exact match cannot be found, fuzzy matching is used to produce better results.
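
The merging principle can be pictured with a toy model in Python. This is only a sketch of the idea: catalogs are reduced to plain dictionaries and lists, comments and file positions are left out, and the similarity measure comes from Python’s difflib module, which is an assumption of this example and not the fuzzy matching algorithm used by msgmerge. The name merge_catalog and the cutoff value are likewise invented.

```python
import difflib


def merge_catalog(old_translations, new_msgids, fuzzy=True, cutoff=0.6):
    """Merge old translations (msgid -> msgstr) into the new list of msgids.

    Returns a list of (msgid, msgstr, is_fuzzy) tuples in template order.
    """
    merged = []
    for msgid in new_msgids:
        if msgid in old_translations:
            # Exact match: the translation is taken over unchanged.
            merged.append((msgid, old_translations[msgid], False))
            continue
        close = []
        if fuzzy:
            close = difflib.get_close_matches(
                msgid, list(old_translations), n=1, cutoff=cutoff
            )
        if close:
            # Similar msgid: reuse its translation, flagged for review.
            merged.append((msgid, old_translations[close[0]], True))
        else:
            merged.append((msgid, "", False))
    return merged


old = {"Open file": "Datei öffnen", "Quit": "Beenden"}
for entry in merge_catalog(old, ["Open file", "Open files", "Save"]):
    print(entry)
```

Passing fuzzy=False in this sketch corresponds in spirit to the --no-fuzzy-matching option described below: messages without an exact match are simply left untranslated.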

7.1.1 Input file location
def.po

Translations referring to old sources.

ref.pot

References to the new sources.

-D directory’ ‘ --directory=directory

Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.

-C file, --compendium=file

Specify an additional library of message translations. See Compendium. This option may be specified more than once.

7.1.2 Operation mode

-U, --update

Update def.po. Do nothing if def.po is already up to date.

7.1.3 Output file location

-o file, --output-file=file

Write output to specified file.

The results are written to standard output if no output file is specified or if it is ‘-’.

7.1.4 Output file location in update mode

The result is written back to def.po.

--backup=control

Make a backup of def.po.

--suffix=suffix

Override the usual backup suffix.

The version control method may be selected via the --backup option or through the VERSION_CONTROL environment variable. Here are the values:

none, off

Never make backups (even if --backup is given).

numbered, t

Make numbered backups.

existing, nil

Make numbered backups if numbered backups for this file already exist, otherwise make simple backups.

simple, never

Always make simple backups.

The backup suffix is ‘~’, unless set with --suffix or the SIMPLE_BACKUP_SUFFIX environment variable.
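
How the control values select a backup file name can be sketched in Python. The form ‘file.~N~’ used for numbered backups follows the usual GNU convention and is an assumption here, since this manual does not spell it out; the function name backup_name and its existing_files argument (a list of names already present next to the file) are invented for the example.

```python
import re


def backup_name(path, control, suffix="~", existing_files=()):
    """Return the name of the backup file for path, or None for no backup."""
    if control in ("none", "off"):
        return None
    # Assumed GNU convention for numbered backups: path.~1~, path.~2~, ...
    numbered = re.compile(re.escape(path) + r"\.~(\d+)~")
    numbers = [
        int(match.group(1))
        for match in map(numbered.fullmatch, existing_files)
        if match
    ]
    if control in ("numbered", "t") or (
        control in ("existing", "nil") and numbers
    ):
        return f"{path}.~{max(numbers, default=0) + 1}~"
    if control in ("simple", "never", "existing", "nil"):
        return path + suffix
    raise ValueError(f"unknown backup control: {control!r}")


print(backup_name("de.po", "simple"))
print(backup_name("de.po", "existing", existing_files=["de.po.~1~"]))
```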

7.1.5 Operation modifiers

-m, --multi-domain

Apply ref.pot to each of the domains in def.po.

-N, --no-fuzzy-matching

Do not use fuzzy matching when an exact match is not found. This may speed up the operation considerably.

--previous

Keep the previous msgids of translated messages, marked with ‘#|’, when adding the fuzzy marker to such messages.

7.1.6 Input file syntax

-P, --properties-input

Assume the input files are Java ResourceBundles in Java .properties syntax, not in PO file syntax.

--stringtable-input

Assume the input files are NeXTstep/GNUstep localized resource files in .strings syntax, not in PO file syntax.

7.1.7 Output details

--lang=catalogname

Specify the ‘Language’ field to be used in the header entry. See Header Entry for the meaning of this field. Note: The ‘Language-Team’ and ‘Plural-Forms’ fields are left unchanged. If this option is not specified, the ‘Language’ field is inferred, as best as possible, from the ‘Language-Team’ field.

--color, --color=when

Specify whether or when to use colors and other text attributes. See The --color option for details.

--style=style_file

Specify the CSS style rule file to use for --color. See The --style option for details.

--force-po

Always write an output file even if it contains no message.

-i, --indent

Write the .po file using indented style.

--no-location

Do not write ‘#: filename:line’ lines.

-n, --add-location=type

Generate ‘#: filename:line’ lines (default).

The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).

--strict

Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.

-p, --properties-output

Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.

--stringtable-output

Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.

-w number, --width=number

Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.

--no-wrap

Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.

-s, --sort-output

Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.

-F, --sort-by-file

Sort output by file location.

7.1.8 Informative output

-h, --help

Display this help and exit.

-V, --version

Output version information and exit.

-v, --verbose

Increase verbosity level.

-q, --quiet, --silent

Suppress progress indicators.



8 Editing PO Files



8.1 KDE’s PO File Editor



8.2 GNOME’s PO File Editor



8.3 Emacs’s PO File Editor

For those of you who are lucky enough to use Emacs, PO mode has been specifically created to provide a cozy environment for editing or modifying PO files. While editing a PO file, PO mode allows for the easy browsing of auxiliary and compendium PO files, as well as for following references into the set of C program sources from which PO files have been derived. It has a few special features, among which are the interactive marking of program strings as translatable, and the validation of PO files with easy repositioning to PO file lines showing errors.

To begin with, besides the main PO mode commands (see Main PO Commands), you should know how to move between entries (see Entry Positioning), and how to handle untranslated entries (see Untranslated Entries).



8.3.1 Completing GNU gettext Installation

Once you have received, unpacked, configured and compiled the GNU gettext distribution, the ‘make install’ command puts in place the programs xgettext, msgfmt, gettext, and msgmerge, as well as their available message catalogs. To top off a comfortable installation, you might also want to make the PO mode available to your Emacs users.

During the installation of the PO mode, you might want to modify your file .emacs, once and for all, so it contains a few lines looking like:

(setq auto-mode-alist
      (cons '("\\.po\\'\\|\\.po\\." . po-mode) auto-mode-alist))
(autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t)

Later, whenever you edit some .po file, or any file having the string ‘.po.’ within its name, Emacs loads po-mode.elc (or po-mode.el) as needed, and automatically activates PO mode commands for the associated buffer. The string PO appears in the mode line for any buffer for which PO mode is active. Many PO files may be active at once in a single Emacs session.

If you are using Emacs version 20 or newer, and have already installed the appropriate international fonts on your system, you may also tell Emacs how to determine automatically the coding system of every PO file. This will often (but not always) cause the necessary fonts to be loaded and used for displaying the translations on your Emacs screen. For this to happen, add the lines:

(modify-coding-system-alist 'file "\\.po\\'\\|\\.po\\."
                            'po-find-file-coding-system)
(autoload 'po-find-file-coding-system "po-mode")

to your .emacs file. If, with this, you still see boxes instead of international characters, try a different font set (via Shift Mouse button 1).


8.3.2 Main PO mode Commands

After setting up Emacs with something similar to the lines in Installation, PO mode is activated for a window when Emacs finds a PO file in that window. This makes the window read-only and establishes a po-mode-map. PO mode is a genuine Emacs major mode, not derived from text mode in any way. Functions found on po-mode-hook, if any, will be executed.

When PO mode is active in a window, the letters ‘PO’ appear in the mode line for that window. The mode line also displays how many entries of each kind are held in the PO file. For example, the string ‘132t+3f+10u+2o’ would tell the translator that the PO file contains 132 translated entries (see Translated Entries), 3 fuzzy entries (see Fuzzy Entries), 10 untranslated entries (see Untranslated Entries) and 2 obsolete entries (see Obsolete Entries). Zero-coefficient items are not shown. So, in this example, if the fuzzy entries were unfuzzied, the untranslated entries were translated and the obsolete entries were deleted, the mode line would merely display ‘145t’ for the counters.
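
The arithmetic behind this example can be checked with a short Python sketch. It only reproduces the counter string as described above; the function name mode_line_counters is invented here and does not correspond to a function of PO mode.

```python
def mode_line_counters(translated, fuzzy, untranslated, obsolete):
    """Build the counter string, leaving out kinds whose count is zero."""
    counts = [
        (translated, "t"),
        (fuzzy, "f"),
        (untranslated, "u"),
        (obsolete, "o"),
    ]
    return "+".join(f"{count}{tag}" for count, tag in counts if count)


print(mode_line_counters(132, 3, 10, 2))
# Fuzzy and untranslated entries become translated; obsolete ones are deleted.
print(mode_line_counters(132 + 3 + 10, 0, 0, 0))
```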

The main PO commands are those which do not fit into the other categories of subsequent sections. These allow for quitting PO mode or for managing windows in special ways.

_

Undo last modification to the PO file (po-undo).

Q

Quit processing and save the PO file (po-quit).

q

Quit processing, possibly after confirmation (po-confirm-and-quit).

0

Temporarily leave the PO file window (po-other-window).

? h

Show help about PO mode (po-help).

=

Give some PO file statistics (po-statistics).

V

Batch validate the format of the whole PO file (po-validate).

The command _ (po-undo) interfaces to the Emacs undo facility. See Undoing Changes in The Emacs Editor. Each time _ is typed, modifications which the translator did to the PO file are undone a little more. For the purpose of undoing, each PO mode command is atomic. This is especially true for the RET command: the whole edit made by a single use of this command is undone at once, even if the edit itself involved several actions. However, while in the editing window, one can undo the editing work quite parsimoniously.

The commands Q (po-quit) and q (po-confirm-and-quit) are used when the translator is done with the PO file. The former is a bit less verbose than the latter. If the file has been modified, it is saved to disk first. In both cases, and prior to all this, the commands check if any untranslated messages remain in the PO file and, if so, the translator is asked if she really wants to leave off working with this PO file. This is the preferred way of getting rid of an Emacs PO file buffer. Merely killing it through the usual command C-x k (kill-buffer) is not the tidiest way to proceed.

The command 0 (po-other-window) is another, softer way, to leave PO mode, temporarily. It just moves the cursor to some other Emacs window, and pops one if necessary. For example, if the translator just got PO mode to show some source context in some other window, she might discover some apparent bug in the program source that needs correction. This command allows the translator to change sex, become a programmer, and have the cursor right in the window containing the program she (or rather he) wants to modify. By later getting the cursor back in the PO file window, or by asking Emacs to edit this file once again, PO mode is then recovered.

The command h (po-help) displays a summary of all available PO mode commands. The translator should then type any character to resume normal PO mode operations. The command ? has the same effect as h.

The command = (po-statistics) computes the total number of entries in the PO file, the ordinal of the current entry (counted from 1), the number of untranslated entries, the number of obsolete entries, and displays all these numbers.

The command V (po-validate) launches msgfmt in checking and verbose mode over the current PO file. This command first offers to save the current PO file on disk. The msgfmt tool, from GNU gettext, has the purpose of creating a MO file out of a PO file, and PO mode uses the features of this program for checking the overall format of a PO file, as well as all individual entries.

The program msgfmt runs asynchronously with Emacs, so the translator regains control immediately while her PO file is being studied. Error output is collected in the Emacs ‘*compilation*’ buffer, displayed in another window. The regular Emacs command C-x ` (next-error), as well as other usual compile commands, allow the translator to reposition quickly to the offending parts of the PO file. Once the cursor is on the line in error, the translator may decide on any PO mode action which would help correct the error.


8.3.3 Entry Positioning

The cursor in a PO file window is almost always part of an entry. The only exceptions are the special case when the cursor is after the last entry in the file, or when the PO file is empty. The entry where the cursor is found to be is said to be the current entry. Many PO mode commands operate on the current entry, so moving the cursor does more than allowing the translator to browse the PO file; it also selects the entry on which commands operate.

Some PO mode commands alter the position of the cursor in a specialized way. A few of those special-purpose positioning commands are described here; the others are described in following sections (for a complete list try C-h m):

.

Redisplay the current entry (po-current-entry).

n

Select the entry after the current one (po-next-entry).

p

Select the entry before the current one (po-previous-entry).

<

Select the first entry in the PO file (po-first-entry).

>

Select the last entry in the PO file (po-last-entry).

m

Record the location of the current entry for later use (po-push-location).

r

Return to a previously saved entry location (po-pop-location).

x

Exchange the current entry location with the previously saved one (po-exchange-location).

Any Emacs command able to reposition the cursor may be used to select the current entry in PO mode, including commands which move by characters, lines, paragraphs, screens or pages, and search commands. However, there is a kind of standard way to display the current entry in PO mode, which usual Emacs commands moving the cursor do not especially try to enforce. The command . (po-current-entry) has the sole purpose of redisplaying the current entry properly, after the current entry has been changed by means external to PO mode, or the Emacs screen otherwise altered.

It is yet to be decided if PO mode helps the translator, or otherwise irritates her, by forcing a rigid window disposition while she is doing her work. We originally had quite precise ideas about how windows should behave, but on the other hand, anyone used to Emacs is often happy to keep full control. Maybe a fixed window disposition might be offered as a PO mode option that the translator might activate or deactivate at will, so it could be offered on an experimental basis. If nobody feels a real need for using it, or a compulsion for writing it, we should drop this whole idea. The incentive for doing it should come from translators rather than programmers, as opinions from an experienced translator are surely worth more to me than opinions from programmers thinking about how others should do translation.

The commands n (po-next-entry) and p (po-previous-entry) move the cursor to the entry following, or preceding, the current one. If n is given while the cursor is on the last entry of the PO file, or if p is given while the cursor is on the first entry, no move is done.

The commands < (po-first-entry) and > (po-last-entry) move the cursor to the first entry, or last entry, of the PO file. When the cursor is located past the last entry in a PO file, most PO mode commands will return an error saying ‘After last entry’. Moreover, the commands < and > have the special property of being able to work even when the cursor is not within any PO file entry, and one may use them for nicely correcting this situation. But even these commands will fail on a truly empty PO file. There are development plans for the PO mode for it to interactively fill an empty PO file from sources. See Marking.

The translator may decide, before working at the translation of a particular entry, that she needs to browse the remainder of the PO file, maybe for finding the terminology or phraseology used in related entries. She can of course use the standard Emacs idioms for saving the current cursor location in some register, and use that register for getting back, or else, use the location ring.

PO mode offers another approach, by which cursor locations may be saved onto a special stack. The command m (po-push-location) merely adds the location of the current entry to the stack, pushing the already saved locations under the new one. The command r (po-pop-location) consumes the top stack element and repositions the cursor to the entry associated with that top element. This position is then lost, for the next r will move the cursor to the previously saved location, and so on until no locations remain on the stack.

If the translator wants the position to be kept on the location stack, maybe for taking a look at the entry associated with the top element, then go elsewhere with the intent of getting back later, she ought to use m immediately after r.

The command x (po-exchange-location) simultaneously repositions the cursor to the entry associated with the top element of the stack of saved locations, and replaces that top element with the location of the current entry before the move. Consequently, repeating the x command toggles alternately between two entries. For achieving this, the translator will position the cursor on the first entry, use m, then position to the second entry, and merely use x for making the switch.
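The behaviour of m, r and x can be modelled with an ordinary stack. The following Python sketch is only a model of the semantics described above, not PO mode's code; the class and method names are hypothetical, and locations are represented as plain integers.

```python
class LocationStack:
    """Model of the saved-location stack used by m, r and x."""

    def __init__(self):
        self._stack = []

    def push(self, current):
        """m: save the current entry location on top of the stack."""
        self._stack.append(current)

    def pop(self):
        """r: consume the top location and return it as the new position."""
        if not self._stack:
            raise IndexError("no saved location")
        return self._stack.pop()

    def exchange(self, current):
        """x: jump to the top location, saving the current one in its place."""
        if not self._stack:
            raise IndexError("no saved location")
        target = self._stack[-1]
        self._stack[-1] = current
        return target
```

Calling push(3), then exchange(7) returns 3 and leaves 7 on the stack; a second exchange(3) returns 7, which is the toggling between two entries described above.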


8.3.4 Normalizing Strings in Entries

There are many different ways for encoding a particular string into a PO file entry, because there are so many different ways to split and quote multi-line strings, and even, to represent special characters by backslash escape sequences. Some features of PO mode rely on the ability for PO mode to scan an already existing PO file for a particular string encoded into the msgid field of some entry. Even if PO mode has internally all the built-in machinery for implementing this recognition easily, doing it fast is technically difficult. To facilitate a solution to this efficiency problem, we decided on a canonical representation for strings.

A conventional representation of strings in a PO file is currently under discussion, and PO mode experiments with a canonical representation. Having both xgettext and PO mode converging towards a uniform way of representing equivalent strings would be useful, as the internal normalization needed by PO mode could be automatically satisfied when using xgettext from GNU gettext. An explicit PO mode normalization should then be only necessary for PO files imported from elsewhere, or for when the convention itself evolves.

So, for achieving normalization of at least the strings of a given PO file needing a canonical representation, the following PO mode command is available:

M-x po-normalize

Tidy the whole PO file by making entries more uniform.

The special command M-x po-normalize, which has no associated keys, revises all entries, ensuring that strings of both original and translated entries use uniform internal quoting in the PO file. It also removes any crumb after the last entry. This command may be useful for PO files freshly imported from elsewhere, or if we ever improve on the canonical quoting format we use. This canonical format is not only meant for getting cleaner PO files, but also for greatly speeding up msgid string lookup for some other PO mode commands.

M-x po-normalize presently makes three passes over the entries. The first implements heuristics for converting PO files for GNU gettext 0.6 and earlier, in which msgid and msgstr fields were using K&R style C string syntax for multi-line strings. These heuristics may fail for comments not related to obsolete entries and ending with a backslash; they also depend on subsequent passes for finalizing the proper commenting of continued lines for obsolete entries. This first pass might disappear once all oldish PO files have been adjusted. The second and third passes normalize all msgid and msgstr strings respectively. They also clean out those trailing backslashes used by XView’s msgfmt for continued lines.

Having such an explicit normalizing command allows for importing PO files from other sources, but also eases the evolution of the current convention, evolution driven mostly by aesthetic concerns, as of now. It is easy to make suggested adjustments at a later time, as the normalizing command and eventually, other GNU gettext tools should greatly automate conformance. A description of the canonical string format is given below, for the particular benefit of those not having Emacs handy, and who would nevertheless want to handcraft their PO files in nice ways.

Right now, in PO mode, strings are single line or multi-line. A string goes multi-line if and only if it has embedded newlines, that is, if it matches ‘[^\n]\n+[^\n]’. So, we would have:

msgstr "\n\nHello, world!\n\n\n"

but, replacing the space by a newline, this becomes:

msgstr ""
"\n"
"\n"
"Hello,\n"
"world!\n"
"\n"
"\n"

We are deliberately using a caricatural example, here, to make the point clearer. Usually, multi-lines are not that bad looking. It is probable that we will implement the following suggestion. We might lump together all initial newlines into the empty string, and also all newlines introducing empty lines (that is, for n > 1, the n-1’th last newlines would go together on a separate string), so making the previous example appear:

msgstr "\n\n"
"Hello,\n"
"world!\n"
"\n\n"

There are a few yet undecided little points about string normalization, to be documented in this manual, once these questions settle.
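For those handcrafting PO files without Emacs, the current rule (not the suggested refinement) can be sketched in Python as follows. This is an illustration under assumptions, not the po-normalize code: the function names are hypothetical, and only backslash, double quote and newline are escaped.

```python
import re

# A string is multi-line iff a non-newline character is followed by
# one or more newlines and then another non-newline character.
_EMBEDDED_NEWLINE = re.compile(r"[^\n]\n+[^\n]")


def _escape(text):
    """Escape the characters handled by this sketch."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def format_field(keyword, value):
    """Return the PO file lines for one field, e.g. keyword='msgstr'."""
    if not _EMBEDDED_NEWLINE.search(value):
        return ['%s "%s"' % (keyword, _escape(value))]
    # Multi-line: an empty first string, then one string per line,
    # each piece ending right after its newline.
    pieces = re.findall(r"[^\n]*\n|[^\n]+", value)
    return ['%s ""' % keyword] + ['"%s"' % _escape(p) for p in pieces]
```

Applied to the two strings of the caricatural example, the sketch yields the single-line form for the first and the seven-line form for the second.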


8.3.5 Translated Entries

Each PO file entry for which the msgstr field has been filled with a translation, and which is not marked as fuzzy (see Fuzzy Entries), is said to be a translated entry. Only translated entries will later be compiled by GNU msgfmt and become usable in programs. Other entry types will be excluded; translation will not occur for them.

Some commands are more specifically related to translated entry processing.

t

Find the next translated entry (po-next-translated-entry).

T

Find the previous translated entry (po-previous-translated-entry).

The commands t (po-next-translated-entry) and T (po-previous-translated-entry) move forwards or backwards, chasing for a translated entry. If none is found, the search is extended and wraps around in the PO file buffer.
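The wrap-around search used by these commands (and by their fuzzy, untranslated and obsolete counterparts) can be modelled as below. This is a Python sketch with hypothetical names, in which entries are items of a list and the kind of entry sought is expressed as a predicate; it assumes that, after wrapping, the search may come back to the current entry itself.

```python
def find_next(entries, current, predicate):
    """Index of the next matching entry after current, wrapping around.

    Returns None when no entry matches at all.
    """
    n = len(entries)
    for step in range(1, n + 1):
        i = (current + step) % n
        if predicate(entries[i]):
            return i
    return None


def find_previous(entries, current, predicate):
    """Same search, moving backwards."""
    n = len(entries)
    for step in range(1, n + 1):
        i = (current - step) % n
        if predicate(entries[i]):
            return i
    return None
```

For instance, with entries of kinds translated, untranslated, translated, untranslated and the cursor on the third one, searching forwards for a translated entry wraps around to the first.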

Translated entries usually result from the translator having edited in a translation for them (see Modifying Translations). However, if the variable po-auto-fuzzy-on-edit is not nil, the entry having received a new translation first becomes a fuzzy entry, which ought to be later unfuzzied before becoming an official, genuine translated entry. See Fuzzy Entries.


8.3.6 Fuzzy Entries

Each PO file entry may have a set of attributes, which are qualities given a name and explicitly associated with the translation, using a special system comment. One of these attributes has the name fuzzy, and entries having this attribute are said to have a fuzzy translation. They are called fuzzy entries, for short.

Fuzzy entries, even if they account for translated entries for most other purposes, usually call for revision by the translator. Those may be produced by applying the program msgmerge to update an older translated PO file according to a new PO template file, when this tool hypothesises that some new msgid has been modified only slightly out of an older one, and chooses to pair what it thinks to be the old translation for the new modified entry. The slight alteration in the original string (the msgid string) should often be reflected in the translated string, and this requires the intervention of the translator. For this reason, msgmerge might mark some entries as being fuzzy.

Also, the translator may decide herself to mark an entry as fuzzy for her own convenience, when she wants to remember that the entry has to be later revisited. So, some commands are more specifically related to fuzzy entry processing.

f

Find the next fuzzy entry (po-next-fuzzy-entry).

F

Find the previous fuzzy entry (po-previous-fuzzy-entry).

TAB

Remove the fuzzy attribute of the current entry (po-unfuzzy).

The commands f (po-next-fuzzy-entry) and F (po-previous-fuzzy-entry) move forwards or backwards, chasing for a fuzzy entry. If none is found, the search is extended and wraps around in the PO file buffer.

The command TAB (po-unfuzzy) removes the fuzzy attribute associated with an entry, usually leaving it translated. Further, if the variable po-auto-select-on-unfuzzy is not nil, the TAB command will automatically chase for another interesting entry to work on. The initial value of po-auto-select-on-unfuzzy is nil.
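In the PO file itself, attributes are stored in a comment line beginning with ‘#,’ (for example ‘#, fuzzy, c-format’). Removing the fuzzy attribute then amounts to editing that line, which the following Python sketch illustrates. The function name is hypothetical, and the choice of returning None when no attribute remains (meaning the whole line should be dropped) is a convention of this sketch.

```python
def remove_fuzzy(flag_line):
    """Remove the fuzzy attribute from a '#,' comment line.

    Lines that are not attribute comments are returned unchanged.
    Returns None when fuzzy was the only attribute.
    """
    if not flag_line.startswith("#,"):
        return flag_line
    flags = [f.strip() for f in flag_line[2:].split(",")]
    kept = [f for f in flags if f and f != "fuzzy"]
    if not kept:
        return None
    return "#, " + ", ".join(kept)
```

Other attributes on the same line, such as c-format, are preserved by this sketch.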

The initial value of po-auto-fuzzy-on-edit is nil. However, if the variable po-auto-fuzzy-on-edit is set to t, any entry edited through the RET command is marked fuzzy, as a way to ensure some kind of double check, later. In this case, the usual paradigm is that an entry becomes fuzzy (if not already) whenever the translator modifies it. If she is satisfied with the translation, she then uses TAB to pick another entry to work on, clearing the fuzzy attribute at the same stroke. If she is not satisfied yet, she merely uses SPC to chase another entry, leaving the entry fuzzy.

The translator may also use the DEL command (po-fade-out-entry) over any translated entry to mark it as being fuzzy, when she wants to leave a trace reminding her to return to this entry later.

Also, when time comes to quit working on a PO file buffer with the q command, the translator is asked for confirmation, if fuzzy strings still exist.


8.3.7 Untranslated Entries

When xgettext originally creates a PO file, unless told otherwise, it initializes the msgid field with the untranslated string, and leaves the msgstr string to be empty. Such entries, having an empty translation, are said to be untranslated entries. Later, when the programmer slightly modifies some string right in the program, this change is later reflected in the PO file by the appearance of a new untranslated entry for the modified string.

The usual commands moving from entry to entry consider untranslated entries on the same level as active entries. Untranslated entries are easily recognizable by the fact they end with ‘msgstr ""’.

The work of the translator might be (quite naively) seen as the process of seeking an untranslated entry, editing a translation for it, and repeating these actions until no untranslated entries remain. Some commands are more specifically related to untranslated entry processing.

u

Find the next untranslated entry (po-next-untranslated-entry).

U

Find the previous untranslated entry (po-previous-untranslated-entry).

k

Turn the current entry into an untranslated one (po-kill-msgstr).

The commands u (po-next-untranslated-entry) and U (po-previous-untranslated-entry) move forwards or backwards, chasing for an untranslated entry. If none is found, the search is extended and wraps around in the PO file buffer.

An entry can be turned back into an untranslated entry by merely emptying its translation, using the command k (po-kill-msgstr). See Modifying Translations.

Also, when time comes to quit working on a PO file buffer with the q command, the translator is asked for confirmation, if some untranslated string still exists.


8.3.8 Obsolete Entries

By obsolete PO file entries, we mean those entries which are commented out, usually by msgmerge when it found that the translation is not needed anymore by the package being localized.

The usual commands moving from entry to entry consider obsolete entries on the same level as active entries. Obsolete entries are easily recognizable by the fact that all their lines start with #, even those lines containing msgid or msgstr.
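As an illustration of what such a commented-out entry looks like, the sketch below turns the lines of an active entry into obsolete form. It relies on the ‘#~’ prefix that GNU gettext tools use for obsolete msgid and msgstr lines, a detail not spelled out in this section; the function name is hypothetical, and leaving existing comment lines untouched is an assumption of this sketch.

```python
def comment_out(entry_lines):
    """Return the lines of an entry in obsolete (commented-out) form.

    Lines that are already comments are kept as they are; every other
    line (msgid, msgstr and their continuation strings) gets '#~ '.
    """
    return [
        line if line.startswith("#") else "#~ " + line
        for line in entry_lines
    ]
```

After this transformation every line of the entry starts with #, as described above.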

Commands exist for emptying the translation or reinitializing it to the original untranslated string. Commands interfacing with the kill ring may force some previously saved text into the translation. The user may interactively edit the translation. All these commands may apply to obsolete entries, carefully leaving the entry obsolete after the fact.

Moreover, some commands are more specifically related to obsolete entry processing.

o

Find the next obsolete entry (po-next-obsolete-entry).

O

Find the previous obsolete entry (po-previous-obsolete-entry).

DEL

Make an active entry obsolete, or zap out an obsolete entry (po-fade-out-entry).

The commands o (po-next-obsolete-entry) and O (po-previous-obsolete-entry) move forwards or backwards, chasing for an obsolete entry. If none is found, the search is extended and wraps around in the PO file buffer.

PO mode does not provide ways for un-commenting an obsolete entry and making it active, because this would reintroduce an original untranslated string which does not correspond to any marked string in the program sources. This goes with the philosophy of never introducing useless msgid values.

However, it is possible to comment out an active entry, so making it obsolete. GNU gettext utilities will later react to the disappearance of a translation by using the untranslated string. The command DEL (po-fade-out-entry) pushes the current entry a little further towards annihilation. If the entry is active (it is a translated entry), then it is first made fuzzy. If it is already fuzzy, then the entry is merely commented out, with confirmation. If the entry is already obsolete, then it is completely deleted from the PO file. It is easy to recycle the translation so deleted into some other PO file entry, usually one which is untranslated. See Modifying Translations.
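The successive effects of DEL form a small state machine, sketched below in Python. The state names are hypothetical, the confirmation step is left out, and the behaviour on untranslated entries is not modelled because it is not described here.

```python
from enum import Enum


class State(Enum):
    TRANSLATED = "translated"
    FUZZY = "fuzzy"
    OBSOLETE = "obsolete"
    DELETED = "deleted"


_NEXT = {
    State.TRANSLATED: State.FUZZY,   # first made fuzzy
    State.FUZZY: State.OBSOLETE,     # then commented out
    State.OBSOLETE: State.DELETED,   # finally removed from the file
}


def fade_out(state):
    """Return the state an entry reaches after one DEL."""
    if state not in _NEXT:
        raise ValueError("entry no longer exists")
    return _NEXT[state]
```

Three applications of fade_out thus take a translated entry through fuzzy and obsolete to complete deletion.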

Here is a quite interesting problem to solve for later development of PO mode, for those nights you are not sleepy. The idea would be that PO mode might become bright enough, one of these days, to make good guesses at retrieving the most probable candidate, among all obsolete entries, for initializing the translation of a newly appeared string. I think it might be a quite hard problem to do this algorithmically, as we have to develop good and efficient measures of string similarity. Right now, PO mode leaves the decision entirely to the translator when the time comes to find the adequate obsolete translation; it merely tries to provide handy tools for helping her to do so.


8.3.9 Modifying Translations

PO mode prevents direct modification of the PO file, by the usual means Emacs gives for altering a buffer’s contents. By doing so, it aims to help the translator avoid little clerical errors about the overall file format, or the proper quoting of strings, as those errors would be easily made. Other kinds of errors are still possible, but some may be caught and diagnosed by the batch validation process, which the translator may always trigger by the V command. For all other errors, the translator has to rely on her own judgment, and also on the linguistic reports submitted to her by the users of the translated package, having the same mother tongue.

When the time comes to create a translation, or to correct an error diagnosed mechanically or reported by a user, the translators have to resort to using the following commands for modifying the translations.

RET

Interactively edit the translation (po-edit-msgstr).

LFD C-j

Reinitialize the translation with the original, untranslated string (po-msgid-to-msgstr).

k

Save the translation on the kill ring, and delete it (po-kill-msgstr).

w

Save the translation on the kill ring, without deleting it (po-kill-ring-save-msgstr).

y

Replace the translation, taking the new from the kill ring (po-yank-msgstr).

The command RET (po-edit-msgstr) opens a new Emacs window meant to edit in a new translation, or to modify an already existing translation. The new window contains a copy of the translation taken from the current PO file entry, all ready for edition, expunged of all quoting marks, fully modifiable and with the complete extent of Emacs modifying commands. When the translator is done with her modifications, she may use C-c C-c to close the subedit window with the automatically requoted results, or C-c C-k to abort her modifications. See Subedit, for more information.

The command LFD (po-msgid-to-msgstr) initializes, or reinitializes the translation with the original string. This command is normally used when the translator wants to redo a fresh translation of the original string, disregarding any previous work.

It is possible to arrange that, whenever an untranslated entry is edited, the LFD command is automatically executed. If you set po-auto-edit-with-msgid to t, the translation gets initialised with the original string, in case none exists already. The default value for po-auto-edit-with-msgid is nil.

In fact, whether it is best to start a translation with an empty string, or rather with a copy of the original string, is a matter of taste or habit. Sometimes, the source language and the target language are so different that it is simply best to start writing on an empty page. At other times, the source and target languages are so close that it would be a waste to retype a number of words already being written in the original string. A translator may also like having the original string right under her eyes, as she will progressively overwrite the original text with the translation, even if this requires some extra editing work to get rid of the original.

The command k (po-kill-msgstr) merely empties the translation string, so turning the entry into an untranslated one. But while doing so, its previous contents are set aside in a special place, known as the kill ring. The command w (po-kill-ring-save-msgstr) has also the effect of taking a copy of the translation onto the kill ring, but it otherwise leaves the entry alone, and does not remove the translation from the entry. Both commands use exactly the Emacs kill ring, which is shared between buffers, and which is well known already to Emacs lovers.

The translator may use k or w many times in the course of her work, as the kill ring may hold several saved translations. From the kill ring, strings may later be reinserted in various Emacs buffers. In particular, the kill ring may be used for moving translation strings between different entries of a single PO file buffer, or if the translator is handling many such buffers at once, even between PO files.

To facilitate exchanges with buffers which are not in PO mode, the translation string put on the kill ring by the k command is fully unquoted before being saved: external quotes are removed, multi-line strings are concatenated, and backslash escape sequences are turned into their corresponding characters. In the special case of obsolete entries, the translation is also uncommented prior to saving.
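The unquoting steps just listed can be sketched in Python as follows. This is an approximation with hypothetical names, not PO mode's code: it handles only the escapes \n, \t, \" and \\ (other escapes are kept verbatim), and it assumes obsolete lines carry the ‘#~’ prefix used by GNU gettext tools.

```python
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unquote_msgstr(lines):
    """Turn the PO lines of a msgstr field into the plain string."""
    out = []
    for line in lines:
        body = line.strip()
        if body.startswith("#~"):          # uncomment obsolete lines
            body = body[2:].strip()
        if body.startswith("msgstr"):      # drop the keyword
            body = body[len("msgstr"):].strip()
        if len(body) < 2 or body[0] != '"' or body[-1] != '"':
            raise ValueError("not a quoted PO string: %r" % line)
        body = body[1:-1]                  # remove external quotes
        i = 0
        while i < len(body):
            ch = body[i]
            if ch == "\\" and i + 1 < len(body):
                nxt = body[i + 1]
                out.append(_ESCAPES.get(nxt, "\\" + nxt))
                i += 2
            else:
                out.append(ch)
                i += 1
    return "".join(out)                    # concatenate all pieces
```

The result is the kind of plain text that can be pasted into any other buffer without PO quoting.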

The command y (po-yank-msgstr) completely replaces the translation of the current entry by a string taken from the kill ring. Following Emacs terminology, we then say that the replacement string is yanked into the PO file buffer. See Yanking in The Emacs Editor. The first time y is used, the translation receives the value of the most recent addition to the kill ring. If y is typed once again, immediately, without intervening keystrokes, the translation just inserted is taken away and replaced by the second most recent addition to the kill ring. By repeating y many times in a row, the translator may travel along the kill ring for saved strings, until she finds the string she really wanted.
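The effect of repeating y can be modelled with a cursor that walks back along the kill ring. The Python sketch below uses hypothetical names; the kill ring is a list with the most recent addition last, and wrapping back to the most recent string after the oldest one is an assumption borrowed from the way the Emacs kill ring rotates.

```python
class YankCursor:
    """Model of successive y commands over a kill ring."""

    def __init__(self, kill_ring):
        self._ring = kill_ring     # most recent addition is last
        self._offset = None        # None: the last command was not y

    def yank(self):
        """y: return the string that becomes the translation."""
        if not self._ring:
            raise IndexError("kill ring is empty")
        if self._offset is None:
            self._offset = 0
        else:
            self._offset = (self._offset + 1) % len(self._ring)
        return self._ring[-1 - self._offset]

    def other_command(self):
        """Any intervening keystroke restarts from the most recent string."""
        self._offset = None
```

With the ring holding "first", "second", "third", two y in a row give "third" then "second"; after any other command, y gives "third" again.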

When a string is yanked into a PO file entry, it is fully and automatically requoted for complying with the format PO files should have. Further, if the entry is obsolete, PO mode then appropriately pushes the inserted string inside comments. Once again, translators should not burden themselves with quoting considerations beyond, of course, what the translated string itself requires with respect to the program using it.

Note that k or w are not the only commands pushing strings on the kill ring, as almost any PO mode command replacing translation strings (or the translator comments) automatically saves the old string on the kill ring. The main exceptions to this general rule are the yanking commands themselves.

To better illustrate the operation of killing and yanking, let’s use an actual example, taken from a common situation. When the programmer slightly modifies some string right in the program, his change is later reflected in the PO file by the appearance of a new untranslated entry for the modified string, and the fact that the entry translating the original or unmodified string becomes obsolete. In many cases, the translator might spare herself some work by retrieving the unmodified translation from the obsolete entry, then initializing the untranslated entry msgstr field with this retrieved translation. Once this is done, the obsolete entry is not wanted anymore, and may be safely deleted.

When the translator finds an untranslated entry and suspects that a slight variant of the translation exists, she immediately uses m to mark the current entry location, then starts chasing obsolete entries with o, hoping to find some translation corresponding to the unmodified string. Once found, she uses the DEL command for deleting the obsolete entry, knowing that DEL also kills the translation, that is, pushes the translation on the kill ring. Then, r returns to the initial untranslated entry, and y then yanks the saved translation right into the msgstr field. The translator is then free to use RET for fine tuning the translation contents, and maybe to later use u, then m again, for going on with the next untranslated string.

When some sequence of keys has to be typed over and over again, the translator may find it useful to become better acquainted with the Emacs capability of learning these sequences and playing them back on request. See Keyboard Macros in The Emacs Editor.


8.3.10 Modifying Comments

Any translation work done seriously will raise many linguistic difficulties, for which decisions have to be made, and the choices further documented. These documents may be saved within the PO file in the form of translator comments, which the translator is free to create, delete, or modify at will. These comments may be useful to herself when she returns to this PO file after a while.

Comments not having whitespace after the initial ‘#’, for example, those beginning with ‘#.’ or ‘#:’, are not translator comments; they are exclusively created by other gettext tools. So, the commands below will never alter such system added comments, which are not meant for the translator to modify. See PO Files.
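The distinction can be expressed as a one-line test on each comment line, as in this Python sketch. The function name is hypothetical, and treating a bare ‘#’ line as a (blank) translator comment is an assumption of the sketch.

```python
def is_translator_comment(line):
    """True if the line is a comment the translator may edit.

    Translator comments have whitespace right after the initial '#';
    '#.', '#:', '#,' and similar lines belong to the gettext tools.
    """
    if not line.startswith("#"):
        return False
    return len(line) == 1 or line[1].isspace()
```

Lines of an entry for which this test is false are exactly the ones the commands below leave alone.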

The following commands are somewhat similar to those modifying translations, so the general indications given for those apply here. See Modifying Translations.

#

Interactively edit the translator comments (po-edit-comment).

K

Save the translator comments on the kill ring, and delete them (po-kill-comment).

W

Save the translator comments on the kill ring, without deleting them (po-kill-ring-save-comment).

Y

Replace the translator comments, taking the new from the kill ring (po-yank-comment).

These commands parallel PO mode commands for modifying the translation strings, and behave much the same way as they do, except that they handle this part of PO file comments meant for translator usage, rather than the translation strings. So, if the descriptions given below are slightly succinct, it is because the full details have already been given. See Modifying Translations.

The command # (po-edit-comment) opens a new Emacs window containing a copy of the translator comments on the current PO file entry. If there are no such comments, PO mode understands that the translator wants to add a comment to the entry, and she is presented with an empty screen. Comment marks (#) and the space following them are automatically removed before edition, and reinstated after. For translator comments pertaining to obsolete entries, the uncommenting and recommenting operations are done twice. Once in the editing window, the keys C-c C-c allow the translator to tell she is finished with editing the comment. See Subedit, for further details.

Functions found on po-subedit-mode-hook, if any, are executed after the string has been inserted in the edit buffer.

The command K (po-kill-comment) gets rid of all translator comments, while saving those comments on the kill ring. The command W (po-kill-ring-save-comment) takes a copy of the translator comments on the kill ring, but leaves them undisturbed in the current entry. The command Y (po-yank-comment) completely replaces the translator comments by a string taken at the front of the kill ring. When this command is immediately repeated, the comments just inserted are withdrawn, and replaced by other strings taken along the kill ring.

On the kill ring, all strings have the same nature. There is no distinction between translation strings and translator comments strings. So, for example, let’s presume the translator has just finished editing a translation, and wants to create a new translator comment to document why the previous translation was not good, just to remember what was the problem. Foreseeing that she will do that in her documentation, the translator may want to quote the previous translation in her translator comments. To do so, she may initialize the translator comments with the previous translation, still at the head of the kill ring. Because editing already pushed the previous translation on the kill ring, she merely has to type M-w prior to #, and the previous translation will be right there, all ready for being introduced by some explanatory text.

On the other hand, presume there are some translator comments already and that the translator wants to add to those comments, instead of wholly replacing them. Then, she should edit the comment right away with #. Once inside the editing window, she can use the regular Emacs commands C-y (yank) and M-y (yank-pop) to get the previous translation where she likes.


8.3.11 Details of Sub Edition

The PO subedit minor mode has a few peculiarities worth being described in fuller detail. It installs a few commands over the usual editing set of Emacs, which are described below.

C-c C-c

Complete edition (po-subedit-exit).

C-c C-k

Abort edition (po-subedit-abort).

C-c C-a

Consult auxiliary PO files (po-subedit-cycle-auxiliary).

The window’s contents represent a translation for a given message, or a translator comment. The translator may modify this window to her heart’s content. Once this is done, the command C-c C-c (po-subedit-exit) may be used to return the edited translation into the PO file, replacing the original translation, even if it moved out of sight or if buffers were switched.

If the translator becomes unsatisfied with her translation or comment, to the extent she prefers keeping what was existent prior to the RET or # command, she may use the command C-c C-k (po-subedit-abort) to merely get rid of edition, while preserving the original translation or comment. Another way would be for her to exit normally with C-c C-c, then type _ (po-undo) once for undoing the whole effect of last edition.

The command C-c C-a (po-subedit-cycle-auxiliary) allows for glancing through translations already achieved in other languages, directly while editing the current translation. This may be quite convenient when the translator is fluent in many languages, but of course, only makes sense when such completed auxiliary PO files are already available to her (see Auxiliary).

Functions found on po-subedit-mode-hook, if any, are executed after the string has been inserted in the edit buffer.

While editing her translation, the translator should pay attention to not inserting unwanted RET (newline) characters at the end of the translated string if those are not meant to be there, or to removing such characters when they are required. Since these characters are not visible in the editing buffer, they are easily introduced by mistake. To help her, RET automatically puts the character < at the end of the string being edited, but this < is not really part of the string. On exiting the editing window with C-c C-c, PO mode automatically removes such < and all whitespace added after it. If the translator adds characters after the terminating <, it loses its delimiting property and integrally becomes part of the string. If she removes the delimiting <, then the edited string is taken as is, with all trailing newlines, even if invisible. Also, if the translated string ought to end itself with a genuine <, then the delimiting < may not be removed; so the string should appear, in the editing window, as ending with two < in a row.
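The treatment of the delimiting < on exit can be summarised by the small Python sketch below. It is a model of the rules just stated, with a hypothetical function name, and takes the whole contents of the editing buffer as one string.

```python
def finish_subedit(buffer_text):
    """Return the string that C-c C-c would hand back to the PO file.

    If the text ends with '<' (ignoring whitespace added after it),
    that delimiter and the whitespace are removed. Otherwise the text
    is taken as is, trailing newlines included.
    """
    stripped = buffer_text.rstrip()
    if stripped.endswith("<"):
        return stripped[:-1]
    return buffer_text
```

So a buffer showing a string ending with two < in a row yields a translation ending with one genuine <, and characters typed after the < make it an ordinary part of the string.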

When a translation (or a comment) is being edited, the translator may move the cursor back into the PO file buffer and freely move to other entries, browsing at will. If, with an edition pending, the translator wanders in the PO file buffer, she may decide to start modifying another entry. Each entry being edited has its own subedit buffer. It is possible to simultaneously edit the translation and the comment of a single entry, or to edit entries in different PO files, all at once. Typing RET on a field already being edited merely resumes that particular edit. Yet, the translator had better be comfortable handling many Emacs windows!

Pending subedits may be completed or aborted in any order, regardless of how or when they were started. When many subedits are pending and the translator asks for quitting the PO file (with the q command), subedits are automatically resumed one at a time, so she may decide for each of them.



8.3.12 C Sources Context

PO mode is particularly powerful when used with PO files created through GNU gettext utilities, as those utilities insert special comments in the PO files they generate. Some of these special comments relate the PO file entry to exactly where the untranslated string appears in the program sources.

When the translator gets to an untranslated entry, she is fairly often faced with an original string which is not as informative as it normally should be, being succinct, cryptic, or otherwise ambiguous. Before choosing how to translate the string, she needs to understand better what the string really means and how tight the translation has to be. Most of the time, when problems arise, the only way left to make her judgment is looking at the true program sources from where this string originated, searching for surrounding comments the programmer might have put in there, and looking around for helping clues of any kind.

Surely, when looking at program sources, the translator will receive more help if she is a fluent programmer. However, even if she is not versed in programming and feels a little lost in C code, the translator should not be shy about taking a look, once in a while. It is most probable that she will still be able to find some of the hints she needs. She will quickly learn not to feel uncomfortable in program code, paying more attention to the programmer’s comments, variable and function names (if he dared to choose them well), and overall organization, than to the program code itself.

The following commands are meant to help the translator get program source context for a PO file entry.

s

Resume the display of a program source context, or cycle through them (po-cycle-source-reference).

M-s

Display a program source context selected by menu (po-select-source-reference).

S

Add a directory to the search path for source files (po-consider-source-path).

M-S

Delete a directory from the search path for source files (po-ignore-source-path).

The commands s (po-cycle-source-reference) and M-s (po-select-source-reference) both open another window displaying some source program file, already positioned in such a way that it shows an actual use of the string to be translated. By doing so, the command gives source program context for the string. But if the entry has no source context references, or if all references are unresolved along the search path for program sources, then the command diagnoses this as an error.

Even if s (or M-s) opens a new window, the cursor stays in the PO file window. If the translator really wants to get into the program source window, she ought to do it explicitly, maybe by using command O.

When s is typed for the first time, or for a PO file entry which is different from the last one used for getting source context, the command reacts by giving the first context available for this entry, if any. If some context has already been recently displayed for the current PO file entry, and the translator wandered off to do other things, typing s again will merely resume, in another window, the context last displayed. In particular, if the translator moved the cursor away from the context in the source file, the command will bring the cursor back to the context. When s is used many times in a row, with no other commands intervening, PO mode cycles to the next available contexts for this particular entry, getting back to the first context once the last has been shown.

The command M-s behaves differently. Instead of cycling through references, it lets the translator choose a particular reference among many, and displays that reference. It is best used with completion: if the translator types TAB immediately after M-s, in response to the question, she will be offered a menu of all possible references, as a reminder of which answers are acceptable. This command is useful only when there are really many contexts available for a single string to translate.

Program source files are usually found relative to where the PO file stands. As a special provision, when this fails, the file is also looked for relative to the directory immediately above it. Those two cases take proper care of most PO files. However, it might happen that a PO file has been moved, or is edited in a different place than its normal location. When this happens, the translator should tell PO mode in which directory the genuine PO file normally sits. Many such directories may be specified, and all together, they constitute what is called the search path for program sources. The command S (po-consider-source-path) is used to interactively enter a new directory at the front of the search path, and the command M-S (po-ignore-source-path) is used to select, with completion, one of the directories she does not want anymore on the search path.
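The lookup just described can be pictured with a small model. This is only an illustrative sketch, not PO mode's own Emacs Lisp: the function name resolve_source_reference, its parameters, and the order in which the extra search path directories are tried after the two default locations are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def resolve_source_reference(
    reference: str,
    po_file: str,
    search_path: Iterable[str] = (),
    exists: Callable[[Path], bool] = Path.exists,
) -> Optional[Path]:
    """Return the first existing candidate for a '#: file:line' reference."""
    file_name = reference.rsplit(":", 1)[0]  # drop the ':line' suffix
    po_dir = Path(po_file).parent
    # Assumed order: the PO file's directory, the directory above it,
    # then any directories the translator added to the search path.
    bases = [po_dir, po_dir.parent, *map(Path, search_path)]
    for base in bases:
        candidate = base / file_name
        if exists(candidate):
            return candidate
    return None  # unresolved: the point where PO mode reports an error
```

The exists parameter is there only so that the lookup can be tried without touching the file system; by default the real Path.exists check is used.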


8.3.13 Consulting Auxiliary PO Files

PO mode is able to help the knowledgeable translator, who is fluent in many languages, take advantage of translations already achieved in other languages she happens to know. It provides these other language translations as additional context for her own work. Moreover, it has features to ease the production of translations for many languages at once, for translators preferring to work in this way.

An auxiliary PO file is an existing PO file meant for the same package the translator is working on, but targeted to a different language. Commands exist for declaring and handling auxiliary PO files, and also for showing contexts for the entry under work.

Here are the auxiliary file commands available in PO mode.

a

Seek auxiliary files for another translation for the same entry (po-cycle-auxiliary).

C-c C-a

Switch to a particular auxiliary file (po-select-auxiliary).

A

Declare this PO file as an auxiliary file (po-consider-as-auxiliary).

M-A

Remove this PO file from the list of auxiliary files (po-ignore-as-auxiliary).

Command A (po-consider-as-auxiliary) adds the current PO file to the list of auxiliary files, while command M-A (po-ignore-as-auxiliary) just removes it.

The command a (po-cycle-auxiliary) seeks all auxiliary PO files, round-robin, searching for a translated entry in some other language having an msgid field identical to the one for the current entry. The found PO file, if any, takes the place of the current PO file in the display (its window gets on top). Before doing so, the current PO file is also made into an auxiliary file, if not already. So, a in this newly displayed PO file will seek another PO file, and so on, so repeating a will eventually yield back the original PO file.

The command C-c C-a (po-select-auxiliary) asks the translator for her choice of a particular auxiliary file, with completion, and then switches to that selected PO file. The command also checks if the selected file has an msgid field identical to the one for the current entry, and if yes, this entry becomes current. Otherwise, the cursor of the selected file is left undisturbed.

For all this to work fully, auxiliary PO files will have to be normalized, in the sense that msgid fields should be written in exactly the same way. It is possible to write msgid fields in various ways to represent the same string, and a different spelling would break the proper behaviour of the auxiliary file commands of PO mode. This is not expected to be much of a problem in practice, as most existing PO files have their msgid entries written by the same GNU gettext tools.
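To see why two msgid fields can be written differently and still stand for the same string, consider the following sketch. It is not what PO mode does internally; decode_po_string is a hypothetical helper, and it assumes that Python string-literal escapes are close enough to the C-style escapes of PO files for simple cases.

```python
import ast
from typing import Iterable


def decode_po_string(quoted_lines: Iterable[str]) -> str:
    """Concatenate the quoted segments of a msgid or msgstr field."""
    # Assumption: Python literal escapes approximate PO (C-style) escapes
    # for simple sequences such as \n, \t, \" and \\.
    return "".join(ast.literal_eval(line.strip()) for line in quoted_lines)


# Two spellings of the same msgid: on one line, and wrapped after an empty "".
one_line = ['"Report bugs to <%s>.\\n"']
wrapped = ['""', '"Report bugs "', '"to <%s>.\\n"']

same_string = decode_po_string(one_line) == decode_po_string(wrapped)
same_spelling = one_line == wrapped
```

Here same_string is true while same_spelling is false, which is the kind of discrepancy the normalization requirement is meant to avoid.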

However, PO files initially created by PO mode itself, while marking strings in source files, are normalized differently. So are PO files resulting from the ‘M-x normalize’ command. Until these discrepancies between PO mode and other GNU gettext tools get fully resolved, the translator should stay aware of normalization issues.



8.4 Using Translation Compendia

A compendium is a special PO file containing a set of translations recurring in many different packages. The translator can use gettext tools to build a new compendium, to add entries to her compendium, and to initialize untranslated entries, or to update already translated entries, from translations kept in the compendium.


8.4.1 Creating Compendia

Basically every PO file consisting of translated entries only can be declared as a valid compendium. Often the translator wants to have special compendia; let’s consider two cases: concatenating PO files and extracting a message subset from a PO file.

8.4.1.1 Concatenate PO Files

To concatenate several valid PO files into one compendium file you can use ‘msgcomm’ or ‘msgcat’ (the latter is preferred):

msgcat -o compendium.po file1.po file2.po

By default, msgcat will accumulate divergent translations for the same string. Those occurrences will be marked as fuzzy and conspicuously decorated; calling msgcat on file1.po:

#: src/hello.c:200
#, c-format
msgid "Report bugs to <%s>.\n"
msgstr "Comunicar `bugs' a <%s>.\n"

and file2.po:

#: src/bye.c:100
#, c-format
msgid "Report bugs to <%s>.\n"
msgstr "Comunicar \"bugs\" a <%s>.\n"

will result in:

#: src/hello.c:200 src/bye.c:100
#, fuzzy, c-format
msgid "Report bugs to <%s>.\n"
msgstr ""
"#-#-#-#-#  file1.po  #-#-#-#-#\n"
"Comunicar `bugs' a <%s>.\n"
"#-#-#-#-#  file2.po  #-#-#-#-#\n"
"Comunicar \"bugs\" a <%s>.\n"

The translator will have to resolve this “conflict” manually; she has to decide whether the first or the second version is appropriate (or provide a new translation), to delete the “marker lines”, and finally to remove the fuzzy mark.
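The shape of the marker lines can be made concrete with a short sketch. It reproduces only the layout visible in the example output above and is not msgcat's implementation; the function names combine_divergent and unresolved_markers are hypothetical. The second function may be handy as a quick check that no marker line was left behind after resolving a conflict.

```python
from typing import Iterable, List, Tuple

# Layout copied from the example output: two spaces around the file name.
MARKER_TEMPLATE = "#-#-#-#-#  {name}  #-#-#-#-#\n"


def combine_divergent(translations: Iterable[Tuple[str, str]]) -> str:
    """Join (file name, msgstr) pairs, each preceded by its marker line."""
    return "".join(
        MARKER_TEMPLATE.format(name=name) + msgstr for name, msgstr in translations
    )


def unresolved_markers(msgstr: str) -> List[str]:
    """Return the marker lines still present in a translation."""
    return [
        line
        for line in msgstr.splitlines()
        if line.startswith("#-#-#-#-#") and line.endswith("#-#-#-#-#")
    ]


combined = combine_divergent(
    [
        ("file1.po", "Comunicar `bugs' a <%s>.\n"),
        ("file2.po", 'Comunicar "bugs" a <%s>.\n'),
    ]
)
```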

If the translator knows in advance that the first translation found for a message is always the best one, she can make use of the ‘--use-first’ switch:

msgcat --use-first -o compendium.po file1.po file2.po

A good compendium file must not contain fuzzy or untranslated entries. If input files are “dirty” you must preprocess the input files or postprocess the result using ‘msgattrib --translated --no-fuzzy’.

8.4.1.2 Extract a Message Subset from a PO File

Nobody wants to translate the same messages again and again; thus you may wish to have a compendium file containing getopt.c messages.

To extract a message subset (e.g., all getopt.c messages) from an existing PO file into one compendium file you can use ‘msggrep’:

msggrep --location src/getopt.c -o compendium.po file.po

8.4.2 Using Compendia

You can use a compendium file to initialize a translation from scratch or to update an already existing translation.

8.4.2.1 Initialize a New Translation File

Since a PO file with translations does not exist yet, the translator can simply use /dev/null to fake the “old” translation file.

msgmerge --compendium compendium.po -o file.po /dev/null file.pot

8.4.2.2 Update an Existing Translation File

Concatenate the compendium file(s) and the existing PO, merge the result with the POT file and remove the obsolete entries (optional, here done using ‘msgattrib’):

msgcat --use-first -o update.po compendium1.po compendium2.po file.po
msgmerge update.po file.pot | msgattrib --no-obsolete > file.po


9 Manipulating PO Files

Sometimes it is necessary to manipulate PO files in a way that is better performed automatically than by hand. GNU gettext includes a complete set of tools for this purpose.

When merging two packages into a single package, the resulting POT file will be the concatenation of the two packages’ POT files. Thus the maintainer must concatenate the two existing package translations into a single translation catalog, for each language. This is best performed using ‘msgcat’. It is then the translators’ duty to deal with any possible conflicts that arose during the merge.

When a translator takes over the translation job from another translator, but she uses a different character encoding in her locale, she will convert the catalog to her character encoding. This is best done through the ‘msgconv’ program.

When a maintainer takes a source file with tagged messages from another package, he should also take the existing translations for this source file (and not let the translators do the same job twice). One way to do this is through ‘msggrep’, another is to create a POT file for that source file and use ‘msgmerge’.

When a translator wants to adjust some translation catalog for a special dialect or orthography (for example, German as written in Switzerland versus German as written in Germany), she needs to apply some text processing to every message in the catalog. The tool for doing this is ‘msgfilter’.

Another use of msgfilter is to produce approximately the POT file for which a given PO file was made. This can be done through a filter command like ‘msgfilter sed -e d | sed -e '/^# /d'’. Note that the original POT file may have had different comments and different plural message counts; that’s why it’s better to use the original POT file if available.

When a translator wants to check her translations, for example according to orthography rules or using a non-interactive spell checker, she can do so using the ‘msgexec’ program.

When third party tools create PO or POT files, sometimes duplicates cannot be avoided. But the GNU gettext tools give an error when they encounter duplicate msgids in the same file and in the same domain. To merge duplicates, the ‘msguniq’ program can be used.

‘msgcomm’ is a more general tool for keeping or throwing away duplicates, occurring in different files.

‘msgcmp’ can be used to check whether a translation catalog is completely translated.

‘msgattrib’ can be used to select and extract only the fuzzy or untranslated messages of a translation catalog.

‘msgen’ is useful as a first step for preparing English translation catalogs. It copies each message’s msgid to its msgstr.

Finally, for those applications where all these various programs are not sufficient, a library ‘libgettextpo’ is provided that can be used to write other specialized programs that process PO files.


9.1 Invoking the msgcat Program

msgcat [option] [inputfile]...

The msgcat program concatenates and merges the specified PO files. It finds messages which are common to two or more of the specified PO files. By using the --more-than option, greater commonality may be requested before messages are printed. Conversely, the --less-than option may be used to specify less commonality before messages are printed (i.e. ‘--less-than=2’ will only print the unique messages). Translations, comments, extracted comments, and file positions will be cumulated, except that if --use-first is specified, they will be taken from the first PO file to define them.

9.1.1 Input file location

inputfile

Input files.

-f file, --files-from=file

Read the names of the input files from file instead of getting them from the command line.

-D directory, --directory=directory

Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.

If inputfile is ‘-’, standard input is read.

9.1.2 Output file location

-o file, --output-file=file

Write output to specified file.

The results are written to standard output if no output file is specified or if it is ‘-’.

9.1.3 Message selection

-< number, --less-than=number

Print messages with fewer than number definitions; defaults to infinite if not set.

-> number, --more-than=number

Print messages with more than number definitions; defaults to 0 if not set.

-u, --unique

Shorthand for ‘--less-than=2’. Requests that only unique messages be printed.
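How the two thresholds interact can be illustrated with a simplified model. It is a sketch of the selection rule as described here, not msgcat's code: the function name select_by_definitions is hypothetical, a catalog is reduced to a plain list of msgids, and "number of definitions" is taken to mean the number of input files that define the message.

```python
from collections import Counter
from typing import Iterable, List, Optional


def select_by_definitions(
    catalogs: Iterable[Iterable[str]],
    more_than: int = 0,
    less_than: Optional[int] = None,
) -> List[str]:
    """Return the msgids defined in more than `more_than` and fewer than
    `less_than` catalogs; None stands for the "infinite" default."""
    counts: Counter = Counter()
    for catalog in catalogs:
        seen = set()
        for msgid in catalog:
            if msgid not in seen:  # count each catalog at most once
                seen.add(msgid)
                counts[msgid] += 1
    return [
        msgid
        for msgid, n in counts.items()
        if n > more_than and (less_than is None or n < less_than)
    ]


catalogs = [["Open", "Close"], ["Open", "Quit"]]
everything = select_by_definitions(catalogs)             # defaults
unique_only = select_by_definitions(catalogs, less_than=2)  # like --unique
common_only = select_by_definitions(catalogs, more_than=1)
```

With these two catalogs, the defaults keep all three messages, the less_than=2 call keeps "Close" and "Quit", and the more_than=1 call keeps only "Open".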

9.1.4 Input file syntax

-P, --properties-input

Assume the input files are Java ResourceBundles in Java .properties syntax, not in PO file syntax.

--stringtable-input

Assume the input files are NeXTstep/GNUstep localized resource files in .strings syntax, not in PO file syntax.

9.1.5 Output details

-t, --to-code=name

Specify encoding for output.

--use-first

Use first available translation for each message. Don’t merge several translations into one.

--lang=catalogname

Specify the ‘Language’ field to be used in the header entry. See Header Entry for the meaning of this field. Note: The ‘Language-Team’ and ‘Plural-Forms’ fields are left unchanged.

--color, --color=when

Specify whether or when to use colors and other text attributes. See The --color option for details.

--style=style_file

Specify the CSS style rule file to use for --color. See The --style option for details.

--force-po

Always write an output file even if it contains no message.

-i, --indent

Write the .po file using indented style.

--no-location

Do not write ‘#: filename:line’ lines.

-n, --add-location=type

Generate ‘#: filename:line’ lines (default).

The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).

--strict

Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.

-p, --properties-output

Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.

--stringtable-output

Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.

-w number, --width=number

Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.

--no-wrap

Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.

-s, --sort-output

Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.

-F, --sort-by-file

Sort output by file location.

9.1.6 Informative output

-h, --help

Display this help and exit.

-V, --version

Output version information and exit.


9.2 Invoking the msgconv Program

msgconv [option] [inputfile]

The msgconv program converts a translation catalog to a different character encoding.

9.2.1 Input file location

inputfile

Input PO file.

-D directory, --directory=directory

Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.

If no inputfile is given or if it is ‘-’, standard input is read.

9.2.2 Output file location

-o file, --output-file=file

Write output to specified file.

The results are written to standard output if no output file is specified or if it is ‘-’.

9.2.3 Conversion target

-t, --to-code=name

Specify encoding for output.

The default encoding is the current locale’s encoding.

9.2.4 Input file syntax

-P, --properties-input

Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.

--stringtable-input

Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.

9.2.5 Output details

--color, --color=when

Specify whether or when to use colors and other text attributes. See The --color option for details.

--style=style_file

Specify the CSS style rule file to use for --color. See The --style option for details.

--force-po

Always write an output file even if it contains no message.

-i, --indent

Write the .po file using indented style.

--no-location

Do not write ‘#: filename:line’ lines.

-n, --add-location=type

Generate ‘#: filename:line’ lines (default).

The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).

--strict

Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.

-p, --properties-output

Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.

--stringtable-output

Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.

-w number, --width=number

Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.

--no-wrap

Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.

-s, --sort-output

Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.

-F, --sort-by-file

Sort output by file location.

9.2.6 Informative output

-h, --help

Display this help and exit.

-V, --version

Output version information and exit.


9.3 Invoking the msggrep Program

msggrep [option] [inputfile]

The msggrep program extracts all messages of a translation catalog that match a given pattern or belong to some given source files.

9.3.1 Input file location

inputfile

Input PO file.

-D directory, --directory=directory

Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.

If no inputfile is given or if it is ‘-’, standard input is read.

9.3.2 Output file location

-o file, --output-file=file

Write output to specified file.

The results are written to standard output if no output file is specified or if it is ‘-’.

9.3.3 Message selection
  [-N sourcefile]... [-M domainname]...
  [-J msgctxt-pattern] [-K msgid-pattern] [-T msgstr-pattern]
  [-C comment-pattern]

A message is selected if

  • it comes from one of the specified source files,
  • or if it comes from one of the specified domains,
  • or if ‘-J’ is given and its context (msgctxt) matches msgctxt-pattern,
  • or if ‘-K’ is given and its key (msgid or msgid_plural) matches msgid-pattern,
  • or if ‘-T’ is given and its translation (msgstr) matches msgstr-pattern,
  • or if ‘-C’ is given and the translator’s comment matches comment-pattern.

When more than one selection criterion is specified, the set of selected messages is the union of the selected messages of each criterion.
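The union rule can be sketched as follows. This is a simplified model rather than msggrep itself: the Message class and select_messages function are hypothetical names, only three of the pattern kinds and the source-file criterion are modelled, source files are compared literally (no wildcard patterns), and Python's re syntax stands in for the basic or extended regular expressions that msggrep accepts.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Message:
    msgid: str
    msgstr: str = ""
    msgctxt: Optional[str] = None
    sources: List[str] = field(default_factory=list)


def select_messages(
    messages: Sequence[Message],
    sources: Sequence[str] = (),
    msgctxt_pattern: Optional[str] = None,
    msgid_pattern: Optional[str] = None,
    msgstr_pattern: Optional[str] = None,
) -> List[Message]:
    """Keep a message as soon as any one criterion selects it (union)."""

    def matches(pattern: Optional[str], text: Optional[str]) -> bool:
        return (
            pattern is not None
            and text is not None
            and re.search(pattern, text) is not None
        )

    return [
        m
        for m in messages
        if any(s in sources for s in m.sources)
        or matches(msgctxt_pattern, m.msgctxt)
        or matches(msgid_pattern, m.msgid)
        or matches(msgstr_pattern, m.msgstr)
    ]


catalog = [
    Message("Please specify a file", "Bitte Datei angeben", None, ["src/main.c"]),
    Message("Open", "Öffnen", "Menu>File", ["src/menu.c"]),
    Message("Quit", "Beenden", None, ["src/getopt.c"]),
]
picked = select_messages(
    catalog, sources=["src/getopt.c"], msgid_pattern="Please specify"
)
```

In this example picked holds the first and the last message: one is selected by the msgid pattern, the other by its source file, and neither needs to satisfy both criteria.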

msgctxt-pattern or msgid-pattern or msgstr-pattern syntax:

  [-E | -F] [-e pattern | -f file]...

patterns are basic regular expressions by default, or extended regular expressions if -E is given, or fixed strings if -F is given.

-N sourcefile, --location=sourcefile

Select messages extracted from sourcefile. sourcefile can be either a literal file name or a wildcard pattern.

-M domainname, --domain=domainname

Select messages belonging to domain domainname.

-J, --msgctxt

Start of patterns for the msgctxt.

-K, --msgid

Start of patterns for the msgid.

-T, --msgstr

Start of patterns for the msgstr.

-C, --comment

Start of patterns for the translator’s comment.

-X, --extracted-comment

Start of patterns for the extracted comments.

-E, --extended-regexp

Specify that pattern is an extended regular expression.

-F, --fixed-strings

Specify that pattern is a set of newline-separated strings.

-e pattern, --regexp=pattern

Use pattern as a regular expression.

-f file, --file=file

Obtain pattern from file.

-i, --ignore-case

Ignore case distinctions.

-v, --invert-match

Output only the messages that do not match any selection criterion, instead of the messages that match a selection criterion.

9.3.4 Input file syntax

-P, --properties-input

Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.

--stringtable-input

Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.

9.3.5 Output details

--color, --color=when

Specify whether or when to use colors and other text attributes. See The --color option for details.

--style=style_file

Specify the CSS style rule file to use for --color. See The --style option for details.

--force-po

Always write an output file even if it contains no message.

--indent

Write the .po file using indented style.

--no-location

Do not write ‘#: filename:line’ lines.

-n, --add-location=type

Generate ‘#: filename:line’ lines (default).

The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).

--strict

Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.

-p, --properties-output

Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.

--stringtable-output

Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.

-w number, --width=number

Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.

--no-wrap

Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.

--sort-output

Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.

--sort-by-file

Sort output by file location.

9.3.6 Informative output

-h, --help

Display this help and exit.

-V, --version

Output version information and exit.

9.3.7 Examples

To extract the messages that come from the source files gnulib-lib/error.c and gnulib-lib/getopt.c:

msggrep -N gnulib-lib/error.c -N gnulib-lib/getopt.c input.po

To extract the messages that contain the string “Please specify” in the original string:

msggrep --msgid -F -e 'Please specify' input.po

To extract the messages that have a context specifier of either “Menu>File” or “Menu>Edit” or a submenu of them:

msggrep --msgctxt -E -e '^Menu>(File|Edit)' input.po

To extract the messages whose translation contains one of the strings in the file wordlist.txt:

msggrep --msgstr -F -f wordlist.txt input.po

9.4 Invoking the msgfilter Program

msgfilter [option] filter [filter-option]

The msgfilter program applies a filter to all translations of a translation catalog.

During each filter invocation, the environment variable MSGFILTER_MSGID is bound to the message’s msgid, and the environment variable MSGFILTER_LOCATION is bound to the location in the PO file of the message. If the message has a context, the environment variable MSGFILTER_MSGCTXT is bound to the message’s msgctxt, otherwise it is unbound. If the message has a plural form, the environment variable MSGFILTER_MSGID_PLURAL is bound to the message’s msgid_plural and MSGFILTER_PLURAL_FORM is bound to the order number of the plural actually processed (starting with 0), otherwise both are unbound. If the message has a previous msgid (added by msgmerge), the environment variable MSGFILTER_PREV_MSGCTXT is bound to the message’s previous msgctxt, MSGFILTER_PREV_MSGID is bound to the previous msgid, and MSGFILTER_PREV_MSGID_PLURAL is bound to the previous msgid_plural.

9.4.1 Input file location

-i inputfile, --input=inputfile

Input PO file.

-D directory, --directory=directory

Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.

If no inputfile is given or if it is ‘-’, standard input is read.

9.4.2 Output file location

-o file, --output-file=file

Write output to specified file.

The results are written to standard output if no output file is specified or if it is ‘-’.

9.4.3 The filter

The filter can be any program that reads a translation from standard input and writes a modified translation to standard output. A frequently used filter is ‘sed’. A few particular built-in filters are also recognized.

--newline

Add newline at the end of each input line and also strip the ending newline from the output line.

Note: If the filter is not a built-in filter, you have to care about encodings: It is your responsibility to ensure that the filter can cope with input encoded in the translation catalog’s encoding. If the filter wants input in a particular encoding, you can in a first step convert the translation catalog to that encoding using the ‘msgconv’ program, before invoking ‘msgfilter’. If the filter wants input in the locale’s encoding, but you want to avoid the locale’s encoding, then you can first convert the translation catalog to UTF-8 using the ‘msgconv’ program and then make ‘msgfilter’ work in a UTF-8 locale, by using the LC_ALL environment variable.

Note: Most translations in a translation catalog don’t end with a newline character. For this reason, unless the --newline option is used, it is important that the filter recognizes its last input line even if it ends without a newline, and that it doesn’t add an undesired trailing newline at the end. The ‘sed’ program on some platforms is known to ignore the last line of input if it is not terminated with a newline. You can use GNU sed instead; it does not have this limitation.
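Since the filter can be any program, it may also be a small script. The following is a minimal sketch of such a filter in Python, under stated assumptions: the function name run_filter and the ß-to-ss rewrite are chosen only for illustration, and the streams are passed in as parameters so that the logic can be tried in memory. It reads the whole translation, writes the result without adding a trailing newline, and consults MSGFILTER_MSGID only to report which message was changed.

```python
import io
from typing import Mapping, TextIO


def run_filter(
    stdin: TextIO, stdout: TextIO, stderr: TextIO, environ: Mapping[str, str]
) -> None:
    """Read one translation, rewrite it, and write it back."""
    translation = stdin.read()  # whole msgstr, possibly without a final newline
    converted = translation.replace("ß", "ss")
    if converted != translation:
        # MSGFILTER_MSGID is bound by msgfilter for each filter invocation.
        stderr.write("changed: %r\n" % environ.get("MSGFILTER_MSGID", ""))
    stdout.write(converted)  # write() adds no trailing newline of its own


# Demonstration with in-memory streams instead of a real msgfilter run.
out, err = io.StringIO(), io.StringIO()
run_filter(io.StringIO("Straße"), out, err, {"MSGFILTER_MSGID": "street"})
```

In a real script, the last lines would instead call run_filter(sys.stdin, sys.stdout, sys.stderr, os.environ); saved under a hypothetical name such as swiss.py, it could then be run as ‘msgfilter -i de.po python3 swiss.py’. The encoding caveat above still applies: Python decodes standard input according to the locale, so the catalog and the locale should agree, for example by converting to UTF-8 first.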

9.4.4 Useful filter-options when the filter is ‘sed’

-e script, --expression=script

Add script to the commands to be executed.

-f scriptfile, --file=scriptfile

Add the contents of scriptfile to the commands to be executed.

-n, --quiet, --silent

Suppress automatic printing of pattern space.

9.4.5 Built-in filters

The filter ‘recode-sr-latin’ is recognized as a built-in filter. The command ‘recode-sr-latin’ converts Serbian text, written in the Cyrillic script, to the Latin script. The command ‘msgfilter recode-sr-latin’ applies this conversion to the translations of a PO file. Thus, it can be used to convert an sr.po file to an sr@latin.po file.

The filter ‘quot’ is recognized as a built-in filter. The command ‘msgfilter quot’ converts any quotations surrounded by a pair of ‘"’, ‘'’, and ‘`’ to Unicode quotation marks.

The filter ‘boldquot’ is recognized as a built-in filter. The command ‘msgfilter boldquot’ converts any quotations surrounded by a pair of ‘"’, ‘'’, and ‘`’, also adding the VT100 escape sequences to the text to decorate it as bold.

The use of built-in filters is not sensitive to the current locale’s encoding. Moreover, when used with a built-in filter, ‘msgfilter’ can automatically convert the message catalog to the UTF-8 encoding when needed.

9.4.6 Input file syntax

-P, --properties-input

Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.

--stringtable-input

Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.

9.4.7 Output details

--color, --color=when

Specify whether or when to use colors and other text attributes. See The --color option for details.

--style=style_file

Specify the CSS style rule file to use for --color. See The --style option for details.

--force-po

Always write an output file even if it contains no message.

--indent

Write the .po file using indented style.

--keep-header

Keep the header entry, i.e. the message with ‘msgid ""’, unmodified, instead of filtering it. By default, the header entry is subject to filtering like any other message.

--no-location

Do not write ‘#: filename:line’ lines.

-n, --add-location=type

Generate ‘#: filename:line’ lines (default).

The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).

--strict

Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.

-p, --properties-output

Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.

--stringtable-output

Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.

-w number, --width=number

Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.

--no-wrap

Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.

-s, --sort-output

Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.

-F, --sort-by-file

Sort output by file location.

9.4.8 Informative output
-h, --help

Display this help and exit.

-V, --version

Output version information and exit.

9.4.9 Examples

To convert German translations to Swiss orthography (in a UTF-8 locale):

msgconv -t UTF-8 de.po | msgfilter sed -e 's/ß/ss/g'

To convert Serbian translations in Cyrillic script to Latin script:

msgfilter recode-sr-latin < sr.po
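
As a further sketch, the following applies a sed filter to every translation while leaving the header entry untouched by means of --keep-header. The file names, the workdir variable and the sample messages are made up for illustration; only the options documented above are used.

```shell
# Hypothetical sample catalog; names and messages are illustrative only.
workdir=$(mktemp -d)
cat > "$workdir/de.po" <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Hello world"
msgstr "Hallo Welt"
EOF

# Replace a word in every translation, but keep the header entry as it is.
msgfilter --keep-header sed -e 's/Welt/Erde/' \
  < "$workdir/de.po" > "$workdir/de-filtered.po"
cat "$workdir/de-filtered.po"
```

Without --keep-header, the sed script would also be applied to the header entry, which is harmless here but can matter for scripts that match text occurring in header fields.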

9.5 Invoking the msguniq Program

msguniq [option] [inputfile]

The msguniq program unifies duplicate translations in a translation catalog. It finds duplicate translations of the same message ID. Such duplicates are invalid input for other programs like msgfmt, msgmerge or msgcat. By default, duplicates are merged together. When using the ‘--repeated’ option, only duplicates are output, and all other messages are discarded. Comments and extracted comments will be cumulated, except that if ‘--use-first’ is specified, they will be taken from the first translation. File positions will be cumulated. When using the ‘--unique’ option, duplicates are discarded.
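
The three modes described above can be tried on a small catalog that defines one message twice. This is only a sketch: the file names, the workdir variable and the messages are invented for the example.

```shell
# Hypothetical catalog in which "Hello" is defined twice.
workdir=$(mktemp -d)
cat > "$workdir/dup.po" <<'EOF'
msgid "Hello"
msgstr "Hallo"

msgid "Goodbye"
msgstr "Tschuess"

msgid "Hello"
msgstr "Guten Tag"
EOF

# Default: merge the duplicate definitions into a single entry.
msguniq -o "$workdir/merged.po" "$workdir/dup.po"

# Take the first translation of each message instead of merging.
msguniq --use-first -o "$workdir/first.po" "$workdir/dup.po"

# Output only the messages that were defined more than once.
msguniq --repeated -o "$workdir/repeated.po" "$workdir/dup.po"
```

Comparing merged.po with first.po shows how the default merge differs from simply taking the first translation.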

9.5.1 Input file location
inputfile

Input PO file.

-D directory, --directory=directory

Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.

If no inputfile is given or if it is ‘-’, standard input is read.

9.5.2 Output file location
-o file, --output-file=file

Write output to specified file.

The results are written to standard output if no output file is specified or if it is ‘-’.

9.5.3 Message selection
-d, --repeated

Print only duplicates.

-u, --unique

Print only unique messages, discard duplicates.

9.5.4 Input file syntax
-P, --properties-input

Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.

--stringtable-input

Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.

9.5.5 Output details
-t, --to-code=name

Specify encoding for output.

--use-first

Use first available translation for each message. Don’t merge several translations into one.

--color, --color=when

Specify whether or when to use colors and other text attributes. See The --color option for details.

--style=style_file

Specify the CSS style rule file to use for --color. See The --style option for details.

--force-po

Always write an output file even if it contains no message.

-i, --indent

Write the .po file using indented style.

--no-location

Do not write ‘#: filename:line’ lines.

-n, --add-location=type

Generate ‘#: filename:line’ lines (default).

The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).

--strict

Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.

-p, --properties-output

Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.

--stringtable-output

Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.

-w number, --width=number

Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.

--no-wrap

Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.

-s, --sort-output

Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.

-F, --sort-by-file

Sort output by file location.

9.5.6 Informative output
-h, --help

Display this help and exit.

-V, --version

Output version information and exit.


9.6 Invoking the msgcomm Program

msgcomm [option] [inputfile]...

The msgcomm program finds messages which are common to two or more of the specified PO files. By using the --more-than option, greater commonality may be requested before messages are printed. Conversely, the --less-than option may be used to specify less commonality before messages are printed (i.e. ‘--less-than=2’ will only print the unique messages). Translations, comments and extracted comments will be preserved, but only from the first PO file to define them. File positions from all PO files will be cumulated.
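
A minimal sketch of both selection directions, using two invented template files that share one message. The file names, the workdir variable and the messages are assumptions made for the example.

```shell
# Two hypothetical templates that have the message "Open" in common.
workdir=$(mktemp -d)
cat > "$workdir/a.pot" <<'EOF'
msgid "Open"
msgstr ""

msgid "Save"
msgstr ""
EOF
cat > "$workdir/b.pot" <<'EOF'
msgid "Open"
msgstr ""

msgid "Print"
msgstr ""
EOF

# Messages defined in more than one file (the default threshold).
msgcomm -o "$workdir/common.pot" "$workdir/a.pot" "$workdir/b.pot"

# Messages defined in exactly one file (--unique is --less-than=2).
msgcomm --unique -o "$workdir/unique.pot" "$workdir/a.pot" "$workdir/b.pot"
```

With these inputs, common.pot should hold only "Open", while unique.pot should hold "Save" and "Print".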

9.6.1 Input file location
inputfile

Input files.

-f file, --files-from=file

Read the names of the input files from file instead of getting them from the command line.

-D directory, --directory=directory

Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.

If inputfile is ‘-’, standard input is read.

9.6.2 Output file location
-o file, --output-file=file

Write output to specified file.

The results are written to standard output if no output file is specified or if it is ‘-’.

9.6.3 Message selection
-< number, --less-than=number

Print messages with fewer than number definitions. Defaults to infinite if not set.

-> number, --more-than=number

Print messages with more than number definitions. Defaults to 1 if not set.

-u, --unique

Shorthand for ‘--less-than=2’. Requests that only unique messages be printed.

9.6.4 Input file syntax
-P, --properties-input

Assume the input files are Java ResourceBundles in Java .properties syntax, not in PO file syntax.

--stringtable-input

Assume the input files are NeXTstep/GNUstep localized resource files in .strings syntax, not in PO file syntax.

9.6.5 Output details
--color, --color=when

Specify whether or when to use colors and other text attributes. See The --color option for details.

--style=style_file

Specify the CSS style rule file to use for --color. See The --style option for details.

--force-po

Always write an output file even if it contains no message.

-i, --indent

Write the .po file using indented style.

--no-location

Do not write ‘#: filename:line’ lines.

-n, --add-location=type

Generate ‘#: filename:line’ lines (default).

The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).

--strict

Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.

-p, --properties-output

Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.

--stringtable-output

Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.

-w number, --width=number

Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.

--no-wrap

Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.

-s, --sort-output

Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.

-F, --sort-by-file

Sort output by file location.

--omit-header

Don’t write header with ‘msgid ""’ entry.

9.6.6 Informative output
-h, --help

Display this help and exit.

-V, --version

Output version information and exit.


9.7 Invoking the msgcmp Program

msgcmp [option] def.po ref.pot

The msgcmp program compares two Uniforum style .po files to check that both contain the same set of msgid strings. The def.po file is an existing PO file with the translations. The ref.pot file is the last created PO file, or a PO Template file (generally created by xgettext). This is useful for checking that you have translated each and every message in your program. Where an exact match cannot be found, fuzzy matching is used to produce better diagnostics.
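
As an illustration, the following sketch checks an incomplete translation against its template and branches on the exit status. The sample files and the workdir and result variables are invented; the sketch relies on msgcmp reporting a message that is missing from def.po as an error and exiting with a non-zero status.

```shell
# Hypothetical template with two messages and a catalog that covers one.
workdir=$(mktemp -d)
cat > "$workdir/ref.pot" <<'EOF'
msgid "Open"
msgstr ""

msgid "Save"
msgstr ""
EOF
cat > "$workdir/def.po" <<'EOF'
msgid "Open"
msgstr "Oeffnen"
EOF

# "Save" is not defined in def.po, so the comparison is expected to fail;
# the diagnostics go to standard error.
if msgcmp "$workdir/def.po" "$workdir/ref.pot"; then
  result="complete"
else
  result="incomplete"
fi
echo "translation is $result"
```

A check of this kind can be placed in a build script so that an incomplete catalog is noticed before a release.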

9.7.1 Input file location
def.po

Translations.

ref.pot

References to the sources.

-D directory, --directory=directory

Add directory to the list of directories. Source files are searched relative to this list of directories.

9.7.2 Operation modifiers
-m, --multi-domain

Apply ref.pot to each of the domains in def.po.

-N, --no-fuzzy-matching

Do not use fuzzy matching when an exact match is not found. This may speed up the operation considerably.

--use-fuzzy

Consider fuzzy messages in the def.po file like translated messages. Note that using this option is usually wrong, because fuzzy messages are exactly those which have not been validated by a human translator.

--use-untranslated

Consider untranslated messages in the def.po file like translated messages. Note that using this option is usually wrong.

9.7.3 Input file syntax
-P, --properties-input

Assume the input files are Java ResourceBundles in Java .properties syntax, not in PO file syntax.

--stringtable-input

Assume the input files are NeXTstep/GNUstep localized resource files in .strings syntax, not in PO file syntax.

9.7.4 Informative output
-h, --help

Display this help and exit.

-V, --version

Output version information and exit.


9.8 Invoking the msgattrib Program

msgattrib [option] [inputfile]

The msgattrib program filters the messages of a translation catalog according to their attributes, and manipulates the attributes.

9.8.1 Input file location
inputfile

Input PO file.

-D directory, --directory=directory

Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.

If no inputfile is given or if it is ‘-’, standard input is read.

9.8.2 Output file location
-o file, --output-file=file

Write output to specified file.

The results are written to standard output if no output file is specified or if it is ‘-’.

9.8.3 Message selection
--translated

Keep translated messages, remove untranslated messages.

--untranslated

Keep untranslated messages, remove translated messages.

--no-fuzzy

Remove ‘fuzzy’ marked messages.

--only-fuzzy

Keep ‘fuzzy’ marked messages, remove all other messages.

--no-obsolete

Remove obsolete #~ messages.

--only-obsolete

Keep obsolete #~ messages, remove all other messages.

9.8.4 Attribute manipulation

Attributes are modified after the message selection/removal has been performed. If the ‘--only-file’ or ‘--ignore-file’ option is specified, the attribute modification is applied only to those messages that are listed in the only-file and not listed in the ignore-file.

--set-fuzzy

Set all messages ‘fuzzy’.

--clear-fuzzy

Set all messages non-‘fuzzy’.

--set-obsolete

Set all messages obsolete.

--clear-obsolete

Set all messages non-obsolete.

--previous

When setting ‘fuzzy’ mark, keep “previous msgid” of translated messages.

--clear-previous

Remove the “previous msgid” (‘#|’) comments from all messages.

--empty

When removing ‘fuzzy’ mark, also set msgstr empty.

--only-file=file

Limit the attribute changes to entries that are listed in file. file should be a PO or POT file.

--ignore-file=file

Limit the attribute changes to entries that are not listed in file. file should be a PO or POT file.

--fuzzy

Synonym for ‘--only-fuzzy --clear-fuzzy’: It keeps only the fuzzy messages and removes their ‘fuzzy’ mark.

--obsolete

Synonym for ‘--only-obsolete --clear-obsolete’: It keeps only the obsolete messages and makes them non-obsolete.
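
The selection and manipulation options can be combined in one invocation. The sketch below builds a small catalog with one translated, one fuzzy and one untranslated message and derives three files from it; the file names, the workdir variable and the messages are invented for the example.

```shell
# Hypothetical catalog with a translated, a fuzzy and an untranslated message.
workdir=$(mktemp -d)
cat > "$workdir/de.po" <<'EOF'
msgid "Open"
msgstr "Oeffnen"

#, fuzzy
msgid "Save file"
msgstr "Speichern"

msgid "Print"
msgstr ""
EOF

# Selection: keep only the fuzzy messages.
msgattrib --only-fuzzy -o "$workdir/fuzzy.po" "$workdir/de.po"

# Selection: keep only the messages without a translation.
msgattrib --untranslated -o "$workdir/todo.po" "$workdir/de.po"

# Manipulation: drop the fuzzy mark and empty the unreviewed translation.
msgattrib --clear-fuzzy --empty -o "$workdir/reset.po" "$workdir/de.po"
```

Since attributes are modified after the selection has been performed, a command such as ‘msgattrib --only-fuzzy --clear-fuzzy’ first narrows the catalog to the fuzzy messages and then removes their mark, which is what the ‘--fuzzy’ synonym does.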

9.8.5 Input file syntax
-P, --properties-input

Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.

--stringtable-input

Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.

9.8.6 Output details
--color, --color=when

Specify whether or when to use colors and other text attributes. See The --color option for details.

--style=style_file

Specify the CSS style rule file to use for --color. See The --style option for details.

--force-po

Always write an output file even if it contains no message.

-i, --indent

Write the .po file using indented style.

--no-location

Do not write ‘#: filename:line’ lines.

-n, --add-location=type

Generate ‘#: filename:line’ lines (default).

The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).

--strict

Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.

-p, --properties-output

Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.

--stringtable-output

Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.

-w number, --width=number

Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.

--no-wrap

Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.

-s, --sort-output

Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.

-F, --sort-by-file

Sort output by file location.

9.8.7 Informative output
-h, --help

Display this help and exit.

-V, --version

Output version information and exit.


9.9 Invoking the msgen Program

msgen [option] inputfile

The msgen program creates an English translation catalog. The input file is the last created English PO file, or a PO Template file (generally created by xgettext). Untranslated entries are assigned a translation that is identical to the msgid.

Note: ‘msginit --no-translator --locale=en’ performs a very similar task. The main difference is that msginit cares specially about the header entry, whereas msgen doesn’t.
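
A minimal sketch of the effect, using an invented one-message template; the file names and the workdir variable are assumptions made for the example.

```shell
# Hypothetical template with a single untranslated message.
workdir=$(mktemp -d)
cat > "$workdir/hello.pot" <<'EOF'
msgid "Hello, world!"
msgstr ""
EOF

# Create the English catalog: each msgstr becomes a copy of its msgid.
msgen -o "$workdir/en.po" "$workdir/hello.pot"
cat "$workdir/en.po"
```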

9.9.1 Input file location
inputfile

Input PO or POT file.

-D directory, --directory=directory

Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.

If inputfile is ‘-’, standard input is read.

9.9.2 Output file location
-o file, --output-file=file

Write output to specified file.

The results are written to standard output if no output file is specified or if it is ‘-’.

9.9.3 Input file syntax
-P, --properties-input

Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.

--stringtable-input

Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.

9.9.4 Output details
--lang=catalogname

Specify the ‘Language’ field to be used in the header entry. See Header Entry for the meaning of this field. Note: The ‘Language-Team’ and ‘Plural-Forms’ fields are not set by this option.

--color, --color=when

Specify whether or when to use colors and other text attributes. See The --color option for details.

--style=style_file

Specify the CSS style rule file to use for --color. See The --style option for details.

--force-po

Always write an output file even if it contains no message.

-i, --indent

Write the .po file using indented style.

--no-location

Do not write ‘#: filename:line’ lines.

-n, --add-location=type

Generate ‘#: filename:line’ lines (default).

The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).

--strict

Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.

-p, --properties-output

Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.

--stringtable-output

Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.

-w number, --width=number

Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.

--no-wrap

Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.

-s, --sort-output

Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.

-F, --sort-by-file

Sort output by file location.

9.9.5 Informative output
-h, --help

Display this help and exit.

-V, --version

Output version information and exit.


9.10 Invoking the msgexec Program

msgexec [option] command [command-option]

The msgexec program applies a command to all translations of a translation catalog. The command can be any program that reads a translation from standard input. It is invoked once for each translation. Its output becomes msgexec’s output. msgexec’s return code is the maximum return code across all invocations.

A special builtin command called ‘0’ outputs the translation, followed by a null byte. The output of ‘msgexec 0’ is suitable as input for ‘xargs -0’.

--newline

Add newline at the end of each input line.

During each command invocation, the environment variable MSGEXEC_MSGID is bound to the message’s msgid, and the environment variable MSGEXEC_LOCATION is bound to the location in the PO file of the message. If the message has a context, the environment variable MSGEXEC_MSGCTXT is bound to the message’s msgctxt, otherwise it is unbound. If the message has a plural form, environment variable MSGEXEC_MSGID_PLURAL is bound to the message’s msgid_plural and MSGEXEC_PLURAL_FORM is bound to the order number of the plural actually processed (starting with 0), otherwise both are unbound. If the message has a previous msgid (added by msgmerge), environment variable MSGEXEC_PREV_MSGCTXT is bound to the message’s previous msgctxt, MSGEXEC_PREV_MSGID is bound to the previous msgid, and MSGEXEC_PREV_MSGID_PLURAL is bound to the previous msgid_plural.

Note: It is your responsibility to ensure that the command can cope with input encoded in the translation catalog’s encoding. If the command wants input in a particular encoding, you can in a first step convert the translation catalog to that encoding using the ‘msgconv’ program, before invoking ‘msgexec’. If the command wants input in the locale’s encoding, but you want to avoid the locale’s encoding, then you can first convert the translation catalog to UTF-8 using the ‘msgconv’ program and then make ‘msgexec’ work in a UTF-8 locale, by using the LC_ALL environment variable.
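
To illustrate how the command sees each message, the sketch below prints every msgid next to its translation. The msgid comes from the MSGEXEC_MSGID environment variable and the translation from standard input. The catalog, the file names and the workdir variable are invented for the example, and the messages are plain ASCII so that the encoding caveat above does not come into play.

```shell
# Hypothetical catalog with two translated messages.
workdir=$(mktemp -d)
cat > "$workdir/de.po" <<'EOF'
msgid "Open"
msgstr "Oeffnen"

msgid "Save"
msgstr "Speichern"
EOF

# The command runs once per translation: printf writes the msgid taken
# from the environment, cat copies the translation from standard input,
# and echo terminates the line.
msgexec -i "$workdir/de.po" \
  sh -c 'printf "%s => " "$MSGEXEC_MSGID"; cat; echo' > "$workdir/pairs.txt"
cat "$workdir/pairs.txt"
```

The same pattern works with any program that reads standard input, for example a spell checker invoked once per translation.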

9.10.1 Input file location
-i inputfile, --input=inputfile

Input PO file.

-D directory, --directory=directory

Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.

If no inputfile is given or if it is ‘-’, standard input is read.

9.10.2 Input file syntax
-P, --properties-input

Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.

--stringtable-input

Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.

9.10.3 Informative output
-h, --help

Display this help and exit.

-V, --version

Output version information and exit.


9.11 Highlighting parts of PO files

Translators are usually only interested in seeing the untranslated and fuzzy messages of a PO file. Also, when a message is set fuzzy because the msgid changed, they want to see the differences between the previous msgid and the current one (especially if the msgid is long and only a few words in it have changed). Finally, it’s always welcome to highlight the different sections of a message in a PO file (comments, msgid, msgstr, etc.).

Such highlighting is possible through the msgcat options ‘--color’ and ‘--style’.


9.11.1 The --color option

The ‘--color=when’ option specifies under which conditions colorized output should be generated. The when part can be one of the following:

always yes

The output will be colorized.

never no

The output will not be colorized.

auto tty

The output will be colorized if the output device is a tty, i.e. when the output goes directly to a text screen or terminal emulator window.

html

The output will be colorized and be in HTML format.

‘--color’ is equivalent to ‘--color=yes’. The default is ‘--color=auto’.

Thus, a command like ‘msgcat vi.po’ will produce colorized output when called by itself in a command window. In a pipe, however, such as ‘msgcat vi.po | less -R’, it will not produce colorized output. To get colorized output in this situation nevertheless, use the command ‘msgcat --color vi.po | less -R’.

The ‘--color=html’ option will produce output that can be viewed in a browser. This can be useful, for example, for Indic languages, because the rendering of Indic scripts in browsers is usually better than in terminal emulators.

Note that the output produced with the --color option is not a valid PO file in itself. It contains additional terminal-specific escape sequences or HTML tags. A PO file reader will give a syntax error when confronted with such content. Except for the ‘--color=html’ case, you therefore normally don’t need to save output produced with the --color option in a file.
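
For the HTML case, a sketch of saving the output for a browser might look as follows. The catalog contents, the file names and the workdir variable are invented; the resulting file is meant for viewing only and is not a PO file.

```shell
# Hypothetical catalog; the header declares the encoding of the file.
workdir=$(mktemp -d)
cat > "$workdir/vi.po" <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Open"
msgstr "Mở"
EOF

# Write colorized HTML to a file that can be opened in a browser.
msgcat --color=html "$workdir/vi.po" > "$workdir/vi.html"
```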


9.11.2 The environment variable TERM

The environment variable TERM contains an identifier for the text window’s capabilities. You can get a detailed list of these capabilities by using the ‘infocmp’ command, using ‘man 5 terminfo’ as a reference.

When producing text with embedded color directives, msgcat looks at the TERM variable. Text windows today typically support at least 8 colors. Often, however, the text window supports 16 or more colors, even though the TERM variable is set to an identifier denoting only 8 supported colors. It can be worth setting the TERM variable to a different value in these cases:

xterm

xterm is in most cases built with support for 16 colors. It can also be built with support for 88 or 256 colors (but not both). You can try to set TERM to either xterm-16color, xterm-88color, or xterm-256color.

rxvt

rxvt is often built with support for 16 colors. You can try to set TERM to rxvt-16color.

konsole

konsole too is often built with support for 16 colors. You can try to set TERM to konsole-16color or xterm-16color.

After setting TERM, you can verify it by invoking ‘msgcat --color=test’ and seeing whether the output looks like a reasonable color map.


9.11.3 The --style option

The ‘--style=style_file’ option specifies the style file to use when colorizing. It has an effect only when the --color option is effective.

If the --style option is not specified, the environment variable PO_STYLE is considered. It is meant to point to the user’s preferred style for PO files.

The default style file is $prefix/share/gettext/styles/po-default.css, where $prefix is the installation location.

A few style files are predefined:

po-vim.css

This style imitates the look used by vim 7.

po-emacs-x.css

This style imitates the look used by GNU Emacs 21 and 22 in an X11 window.

po-emacs-xterm.css po-emacs-xterm16.css po-emacs-xterm256.css

This style imitates the look used by GNU Emacs 22 in a terminal of type ‘xterm’ (8 colors) or ‘xterm-16color’ (16 colors) or ‘xterm-256color’ (256 colors), respectively.

You can use these styles without specifying a directory. They are actually located in $prefix/share/gettext/styles/, where $prefix is the installation location.

You can also design your own styles. This is described in the next section.


9.11.4 Style rules for PO files

The same style file can be used for styling of a PO file, for terminal output and for HTML output. It is written in CSS (Cascading Style Sheet) syntax. See http://www.w3.org/TR/css2/cover.html for a formal definition of CSS. Many HTML authoring tutorials also contain explanations of CSS.

In the case of HTML output, the style file is embedded in the HTML output. In the case of text output, the style file is interpreted by the msgcat program. This means, in particular, that when @import is used with relative file names, the file names are

  • relative to the resulting HTML file, in the case of HTML output,
  • relative to the style sheet containing the @import, in the case of text output. (Actually, @imports are not yet supported in this case, due to a limitation in libcroco.)

CSS rules are built up from selectors and declarations. The declarations specify graphical properties; the selectors specify when they apply.

In PO files, the following simple selectors (based on "CSS classes", see the CSS2 spec, section 5.8.3) are supported.

  • Selectors that apply to entire messages:
    .header

    This matches the header entry of a PO file.

    .translated

    This matches a translated message.

    .untranslated

    This matches an untranslated message (i.e. a message with empty translation).

    .fuzzy

    This matches a fuzzy message (i.e. a message which has a translation that needs review by the translator).

    .obsolete

    This matches an obsolete message (i.e. a message that was translated but is not needed by the current POT file any more).

  • Selectors that apply to parts of a message in PO syntax. Recall the general structure of a message in PO syntax:
    white-space
    #  translator-comments
    #. extracted-comments
    #: reference…
    #, flag…
    #| msgid previous-untranslated-string
    msgid untranslated-string
    msgstr translated-string
    
    .comment

    This matches all comments (translator comments, extracted comments, source file reference comments, flag comments, previous message comments, as well as the entire obsolete messages).

    .translator-comment

    This matches the translator comments.

    .extracted-comment

    This matches the extracted comments, i.e. the comments placed by the programmer for the attention of the translator.

    .reference-comment

    This matches the source file reference comments (entire lines).

    .reference

    This matches the individual source file references inside the source file reference comment lines.

    .flag-comment

    This matches the flag comment lines (entire lines).

    .flag

    This matches the individual flags inside flag comment lines.

    .fuzzy-flag

    This matches the ‘fuzzy’ flag inside flag comment lines.

    .previous-comment

    This matches the comments containing the previous untranslated string (entire lines).

    .previous

    This matches the previous untranslated string including the string delimiters, the associated keywords (msgid etc.) and the spaces between them.

    .msgid

    This matches the untranslated string including the string delimiters, the associated keywords (msgid etc.) and the spaces between them.

    .msgstr

    This matches the translated string including the string delimiters, the associated keywords (msgstr etc.) and the spaces between them.

    .keyword

    This matches the keywords (msgid, msgstr, etc.).

    .string

    This matches strings, including the string delimiters (double quotes).

  • Selectors that apply to parts of strings:
    .text

    This matches the entire contents of a string (excluding the string delimiters, i.e. the double quotes).

    .escape-sequence

    This matches an escape sequence (starting with a backslash).

    .format-directive

    This matches a format string directive (starting with a ‘%’ sign in the case of most programming languages, with a ‘{’ in the case of java-format and csharp-format, with a ‘~’ in the case of lisp-format and scheme-format, or with ‘$’ in the case of sh-format).

    .invalid-format-directive

    This matches an invalid format string directive.

    .added

    In an untranslated string, this matches a part of the string that was not present in the previous untranslated string. (Not yet implemented in this release.)

    .changed

    In an untranslated string or in a previous untranslated string, this matches a part of the string that is changed or replaced. (Not yet implemented in this release.)

    .removed

    In a previous untranslated string, this matches a part of the string that is not present in the current untranslated string. (Not yet implemented in this release.)

These selectors can be combined to hierarchical selectors. For example,

.msgstr .invalid-format-directive { color: red; }

will highlight the invalid format directives in the translated strings.

In text mode, pseudo-classes (CSS2 spec, section 5.11) and pseudo-elements(CSS2 spec, section 5.12) are not supported.

The declarations in HTML mode are not limited; any graphical attributesupported by the browsers can be used.

The declarations in text mode are limited to the following properties. Otherproperties will be silently ignored.

color (CSS2 spec, section 14.1) background-color (CSS2 spec, section 14.2.1)

These properties is supported. Colors will be adjusted to match the terminal’scapabilities. Note that many terminals support only 8 colors.

font-weight (CSS2 spec, section 15.2.3)

This property is supported, but most terminals can only render two different weights: normal and bold. Values >= 600 are rendered as bold.

font-style (CSS2 spec, section 15.2.3)

This property is supported. The values italic and oblique are rendered the same way.

text-decoration (CSS2 spec, section 16.3.1)

This property is supported, limited to the values none and underline.



9.11.5 Customizing less for viewing PO files

The ‘less’ program is a popular text file browser for use in a text screen or terminal emulator. It also supports text with embedded escape sequences for colors and text decorations.

You can use less to view a PO file like this (assuming a UTF-8 environment):

msgcat --to-code=UTF-8 --color xyz.po | less -R

You can shorten this to the simple command:

less xyz.po

after these three preparations:

  1. Add the options ‘-R’ and ‘-f’ to the LESS environment variable. In sh shells:
    $ LESS="$LESS -R -f"
    $ export LESS
    
  2. If your system does not already have the lessopen.sh and lessclose.sh scripts, create them and set the LESSOPEN and LESSCLOSE environment variables, as indicated in the manual page (‘man less’).
  3. Add to lessopen.sh a piece of script that recognizes PO files through their file extension and invokes msgcat on them, producing a temporary file. Like this:
    case "$1" in
      *.po)
        tmpfile=`mktemp "${TMPDIR-/tmp}/less.XXXXXX"`
        msgcat --to-code=UTF-8 --color "$1" > "$tmpfile"
        echo "$tmpfile"
        exit 0
        ;;
    esac
    

9.12 Writing your own programs that process PO files

For the tasks for which a combination of ‘msgattrib’, ‘msgcat’ etc. is not sufficient, a set of C functions is provided in a library, to make it possible to process PO files in your own programs. When you use this library, you don’t need to write routines to parse the PO file; instead, you retrieve a pointer in memory to each of the messages contained in the PO file. Functions for writing PO files are not provided at this time.

The functions are declared in the header file ‘<gettext-po.h>’, and are defined in a library called ‘libgettextpo’.

Data Type: po_file_t

This is a pointer type that refers to the contents of a PO file, after it has been read into memory.

Data Type: po_message_iterator_t

This is a pointer type that refers to an iterator that produces a sequence of messages.

Data Type: po_message_t

This is a pointer type that refers to a message of a PO file, including its translation.

Function: po_file_t po_file_read (const char *filename)

The po_file_read function reads a PO file into memory. The file name is given as argument. The return value is a handle to the PO file’s contents, valid until po_file_free is called on it. In case of error, the return value is NULL, and errno is set.

Function: void po_file_free (po_file_t file)

The po_file_free function frees a PO file’s contents from memory, including all messages that are only implicitly accessible through iterators.

Function: const char * const * po_file_domains (po_file_t file)

The po_file_domains function returns the domains for which the given PO file has messages. The return value is a NULL terminated array which is valid as long as the file handle is valid. For PO files which contain no ‘domain’ directive, the return value contains only one domain, namely the default domain "messages".

Function: po_message_iterator_t po_message_iterator (po_file_t file, const char *domain)

The po_message_iterator function returns an iterator that will produce the messages of file that belong to the given domain. If domain is NULL, the default domain is used instead. To list the messages, use the function po_next_message repeatedly.

Function: void po_message_iterator_free (po_message_iterator_t iterator)

The po_message_iterator_free function frees an iterator previously allocated through the po_message_iterator function.

Function: po_message_t po_next_message (po_message_iterator_t iterator)

The po_next_message function returns the next message from iterator and advances the iterator. It returns NULL when the iterator has reached the end of its message list.

The following functions return details of a po_message_t. Recall that the results are valid as long as the file handle is valid.

Function: const char * po_message_msgid (po_message_t message)

The po_message_msgid function returns the msgid (untranslated English string) of a message. This is guaranteed to be non-NULL.

Function: const char * po_message_msgid_plural (po_message_t message)

The po_message_msgid_plural function returns the msgid_plural (untranslated English plural string) of a message with plurals, or NULL for a message without plural.

Function: const char * po_message_msgstr (po_message_t message)

The po_message_msgstr function returns the msgstr (translation) of a message. For an untranslated message, the return value is an empty string.

Function: const char * po_message_msgstr_plural (po_message_t message, int index)

The po_message_msgstr_plural function returns the msgstr[index] of a message with plurals, or NULL when the index is out of range or for a message without plural.

Here is example code showing how these functions can be used.

const char *filename = …;
po_file_t file = po_file_read (filename);

if (file == NULL)
  error (EXIT_FAILURE, errno, "couldn't open the PO file %s", filename);
{
  const char * const *domains = po_file_domains (file);
  const char * const *domainp;

  for (domainp = domains; *domainp; domainp++)
    {
      const char *domain = *domainp;
      po_message_iterator_t iterator = po_message_iterator (file, domain);

      for (;;)
        {
          po_message_t message = po_next_message (iterator);

          if (message == NULL)
            break;
          {
            const char *msgid = po_message_msgid (message);
            const char *msgstr = po_message_msgstr (message);

            …
          }
        }
      po_message_iterator_free (iterator);
    }
}
po_file_free (file);


10 Producing Binary MO Files


10.1 Invoking the msgfmt Program

msgfmt [option] filename.po …

The msgfmt program generates a binary message catalog from a textual translation description.

10.1.1 Input file location
filename.po …
-D directory, --directory=directory

Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting binary file will be written relative to the current directory, though.

If an input file is ‘-’, standard input is read.

10.1.2 Operation mode
-j, --java

Java mode: generate a Java ResourceBundle class.

--java2

Like --java, and assume Java2 (JDK 1.2 or higher).

--csharp

C# mode: generate a .NET .dll file containing a subclass of GettextResourceSet.

--csharp-resources

C# resources mode: generate a .NET .resources file.

--tcl

Tcl mode: generate a tcl/msgcat .msg file.

--qt

Qt mode: generate a Qt .qm file.

--desktop

Desktop Entry mode: generate a .desktop file.

--xml

XML mode: generate an XML file.

10.1.3 Output file location
-o file, --output-file=file

Write output to specified file.

--strict

Direct the program to work strictly following the Uniforum/Sun implementation. Currently this only affects the naming of the output file. If this option is not given the name of the output file is the same as the domain name. If the strict Uniforum mode is enabled the suffix .mo is added to the file name if it is not already present.

We find this behaviour of Sun’s implementation rather silly and so by default this mode is not selected.

If the output file is ‘-’, output is written to standard output.

10.1.4 Output file location in Java mode
-r resource, --resource=resource

Specify the resource name.

-l locale, --locale=locale

Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.

-d directory

Specify the base directory of the classes directory hierarchy.

--source

Produce a .java source file, instead of a compiled .class file.

The class name is determined by appending the locale name to the resource name, separated with an underscore. The ‘-d’ option is mandatory. The class is written under the specified directory.

10.1.5 Output file location in C# mode
-r resource, --resource=resource

Specify the resource name.

-l locale, --locale=locale

Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.

-d directory

Specify the base directory for locale dependent .dll files.

The ‘-l’ and ‘-d’ options are mandatory. The .dll file is written in a subdirectory of the specified directory whose name depends on the locale.

10.1.6 Output file location in Tcl mode
-l locale, --locale=locale

Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.

-d directory

Specify the base directory of .msg message catalogs.

The ‘-l’ and ‘-d’ options are mandatory. The .msg file is written in the specified directory.

10.1.7 Desktop Entry mode operations
--template=template

Specify a .desktop file used as a template.

-k[keywordspec], --keyword[=keywordspec]

Specify keywordspec as an additional keyword to be looked for. Without a keywordspec, the option means to not use default keywords.

-l locale, --locale=locale

Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.

-d directory

Specify the directory where PO files are read. The directory must contain the ‘LINGUAS’ file.

To generate a ‘.desktop’ file for a single locale, you can use it as follows.

msgfmt --desktop --template=template --locale=locale \
  -o file filename.po …

msgfmt provides a special "bulk" operation mode to process multiple .po files at a time.

msgfmt --desktop --template=template -d directory -o file

msgfmt first reads the ‘LINGUAS’ file under directory, and then processes all ‘.po’ files listed there. You can also limit the locales to a subset, through the ‘LINGUAS’ environment variable.

For either operation mode, the ‘-o’ and ‘--template’ options are mandatory.

10.1.8 XML mode operations
--template=template

Specify an XML file used as a template.

-L name, --language=name

Specifies the language of the input files.

-l locale, --locale=locale

Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.

-d directory

Specify the base directory of .po message catalogs.

To generate an XML file for a single locale, you can use it as follows.

msgfmt --xml --template=template --locale=locale \
  -o file filename.po …

msgfmt provides a special "bulk" operation mode to process multiple .po files at a time.

msgfmt --xml --template=template -d directory -o file

msgfmt first reads the ‘LINGUAS’ file under directory, and then processes all ‘.po’ files listed there. You can also limit the locales to a subset, through the ‘LINGUAS’ environment variable.

For either operation mode, the ‘-o’ and ‘--template’ options are mandatory.

10.1.9 Input file syntax
-P, --properties-input

Assume the input files are Java ResourceBundles in Java .properties syntax, not in PO file syntax.

--stringtable-input

Assume the input files are NeXTstep/GNUstep localized resource files in .strings syntax, not in PO file syntax.

10.1.10 Input file interpretation
-c, --check

Perform all the checks implied by --check-format, --check-header, --check-domain.

--check-format

Check language dependent format strings.

If the string represents a format string used in a printf-like function, both strings should have the same number of ‘%’ format specifiers, with matching types. If the flag c-format or possible-c-format appears in the special comment ‘#,’ for this entry, a check is performed. For example, the check will diagnose using ‘%.*s’ against ‘%s’, or ‘%d’ against ‘%s’, or ‘%d’ against ‘%x’. It can even handle positional parameters.

Normally the xgettext program automatically decides whether a string is a format string or not. This algorithm is not perfect, though. It might regard a string as a format string though it is not used in a printf-like function, and so msgfmt might report errors where there are none.

To solve this problem the programmer can dictate the decision to the xgettext program (see c-format). The translator should not consider removing the flag from the ‘#,’ line. This "fix" would be reversed again as soon as msgmerge is called the next time.

--check-header

Verify presence and contents of the header entry. See Header Entry, for a description of the various fields in the header entry.

--check-domain

Check for conflicts between domain directives and the --output-file option.

-C, --check-compatibility

Check that GNU msgfmt behaves like X/Open msgfmt. This will give an error when attempting to use the GNU extensions.

--check-accelerators[=char]

Check presence of keyboard accelerators for menu items. This is based on the convention used in some GUIs that a keyboard accelerator in a menu item string is designated by an immediately preceding ‘&’ character. Sometimes a keyboard accelerator is also called "keyboard mnemonic". This check verifies that if the untranslated string has exactly one ‘&’ character, the translated string has exactly one ‘&’ as well. If this option is given with a char argument, this char should be a non-alphanumeric character and is used as keyboard accelerator mark instead of ‘&’.
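
The rule stated above can be sketched in a few lines of C. This is only an illustration of the documented rule, not the code msgfmt uses; the function names are hypothetical, and refinements a real checker may apply (such as skipping untranslated messages) are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Count the occurrences of the accelerator mark MARK in S.  */
static size_t
count_marks (const char *s, char mark)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (*s == mark)
      n++;
  return n;
}

/* Hypothetical helper: return 1 if the pair passes the documented rule,
   0 otherwise.  When MSGID contains exactly one mark, MSGSTR must
   contain exactly one as well; other msgids are not constrained.  */
int
accelerators_consistent (const char *msgid, const char *msgstr, char mark)
{
  assert (msgid != NULL && msgstr != NULL);
  if (count_marks (msgid, mark) != 1)
    return 1;
  return count_marks (msgstr, mark) == 1;
}
```

Passing ‘&’ as mark corresponds to the default; passing another non-alphanumeric character corresponds to giving the option a char argument.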

-f, --use-fuzzy

Use fuzzy entries in output. Note that using this option is usually wrong, because fuzzy messages are exactly those which have not been validated by a human translator.

10.1.11 Output details
-a number, --alignment=number

Align strings to number bytes (default: 1).

--endianness=byteorder

Write out 32-bit numbers in the given byte order. The possible values are big and little. The default is little.

MO files of any endianness can be used on any platform. When a MO file has an endianness other than the platform’s one, the 32-bit numbers from the MO file are swapped at runtime. The performance impact is negligible.

This option can be useful to produce MO files that are optimized for one platform.

--no-hash

Don’t include a hash table in the binary file. Lookup will be more expensive at run time (binary search instead of hash table lookup).

10.1.12 Informative output
-h, --help

Display this help and exit.

-V, --version

Output version information and exit.

--statistics

Print statistics about translations. When the option --verbose is used in combination with --statistics, the input file name is printed in front of the statistics line.

-v, --verbose

Increase verbosity level.


10.2 Invoking the msgunfmt Program

msgunfmt [option] [file]...

The msgunfmt program converts a binary message catalog to a Uniforum style .po file.

10.2.1 Operation mode
-j, --java

Java mode: input is a Java ResourceBundle class.

--csharp

C# mode: input is a .NET .dll file containing a subclass of GettextResourceSet.

--csharp-resources

C# resources mode: input is a .NET .resources file.

--tcl

Tcl mode: input is a tcl/msgcat .msg file.

10.2.2 Input file location
file

Input .mo files.

If no input file is given or if it is ‘-’, standard input is read.

10.2.3 Input file location in Java mode
-r resource, --resource=resource

Specify the resource name.

-l locale, --locale=locale

Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.

The class name is determined by appending the locale name to the resource name, separated with an underscore. The class is located using the CLASSPATH.

10.2.4 Input file location in C# mode
-r resource, --resource=resource

Specify the resource name.

-l locale, --locale=locale

Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.

-d directory

Specify the base directory for locale dependent .dll files.

The ‘-l’ and ‘-d’ options are mandatory. The .dll file is located in a subdirectory of the specified directory whose name depends on the locale.

10.2.5 Input file location in Tcl mode
-l locale, --locale=locale

Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.

-d directory

Specify the base directory of .msg message catalogs.

The ‘-l’ and ‘-d’ options are mandatory. The .msg file is located in the specified directory.

10.2.6 Output file location
-o file, --output-file=file

Write output to specified file.

The results are written to standard output if no output file is specified or if it is ‘-’.

10.2.7 Output details
--color, --color=when

Specify whether or when to use colors and other text attributes. See The --color option for details.

--style=style_file

Specify the CSS style rule file to use for --color. See The --style option for details.

--force-po

Always write an output file even if it contains no message.

-i, --indent

Write the .po file using indented style.

--strict

Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.

-p, --properties-output

Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.

--stringtable-output

Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.

-w number, --width=number

Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.

--no-wrap

Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.

-s, --sort-output

Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.

10.2.8 Informative output
-h, --help

Display this help and exit.

-V, --version

Output version information and exit.

-v, --verbose

Increase verbosity level.


10.3 The Format of GNU MO Files

The format of the generated MO files is best described by a picture, which appears below.

The first two words serve to identify the file. The magic number will always signal GNU MO files. The number is stored in the byte order used when the MO file was generated, so the magic number really is two numbers: 0x950412de and 0xde120495.
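
As an illustration of the two possible magic numbers, a reader can inspect the first four bytes one at a time and deduce the byte order of the file from them. The following is a minimal sketch, not code from GNU gettext; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte order of an MO file, as inferred from its first four bytes.  */
enum mo_byte_order { MO_NOT_MO = 0, MO_LITTLE_ENDIAN, MO_BIG_ENDIAN };

/* Report how the magic number 0x950412de was stored in BUF.  Reading
   byte by byte keeps the check independent of the byte order of the
   machine that runs it.  */
enum mo_byte_order
mo_detect_byte_order (const unsigned char *buf, size_t len)
{
  if (buf == NULL || len < 4)
    return MO_NOT_MO;
  if (buf[0] == 0xde && buf[1] == 0x12 && buf[2] == 0x04 && buf[3] == 0x95)
    return MO_LITTLE_ENDIAN;    /* least significant byte first */
  if (buf[0] == 0x95 && buf[1] == 0x04 && buf[2] == 0x12 && buf[3] == 0xde)
    return MO_BIG_ENDIAN;       /* most significant byte first */
  return MO_NOT_MO;
}

/* Decode the 32-bit number stored at P in the given byte order.  */
uint32_t
mo_read_u32 (const unsigned char *p, enum mo_byte_order order)
{
  assert (p != NULL);
  if (order == MO_BIG_ENDIAN)
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
           | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
  return ((uint32_t) p[3] << 24) | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[1] << 8) | (uint32_t) p[0];
}
```

Once the byte order is known, every other 32-bit number in the file can be decoded with the same helper, whatever the platform.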

The second word describes the current revision of the file format, composed of a major and a minor revision number. The revision numbers ensure that the readers of MO files can distinguish new formats from old ones and handle their contents, as far as possible. For now the major revision is 0 or 1, and the minor revision is also 0 or 1. More revisions might be added in the future. A program seeing an unexpected major revision number should stop reading the MO file entirely; whereas an unexpected minor revision number means that the file can be read but will not reveal its full contents, when parsed by a program that supports only smaller minor revision numbers.

The version is kept separate from the magic number, instead of using different magic numbers for different formats, mainly because /etc/magic is not updated often.

A number of pointers to later tables in the file follow, allowing for the extension of the prefix part of MO files without having to recompile programs reading them. This might become useful for later inserting a few flag bits, an indication about the charset used, new tables, or other things.

Then, at offset O and offset T in the picture, two tables of string descriptors can be found. In both tables, each string descriptor uses two 32-bit integers, one for the string length, another for the offset of the string in the MO file, counting in bytes from the start of the file. The first table contains descriptors for the original strings, and is sorted so the original strings are in increasing lexicographical order. The second table contains descriptors for the translated strings, and is parallel to the first table: to find the corresponding translation one has to access the array slot in the second array with the same index.

Having the original strings sorted enables the use of simple binary search, for when the MO file does not contain a hashing table, or for when it is not practical to use the hashing table provided in the MO file. This also has another advantage: in GNU gettext, the empty string in a PO file is usually translated into some system information attached to that particular MO file, and the empty string necessarily becomes the first in both the original and translated tables, making the system information very easy to find.

The size S of the hash table can be zero. In this case, the hash table itself is not contained in the MO file. Some people might prefer this because a precomputed hashing table takes disk space, and does not win that much speed. The hash table contains indices to the sorted array of strings in the MO file. Conflict resolution is done by double hashing. The precise hashing algorithm used is fairly dependent on GNU gettext code, and is not documented here.

As for the strings themselves, they follow the hash table, and each is terminated with a NUL, and this NUL is not counted in the length which appears in the string descriptor. The msgfmt program has an option selecting the alignment for MO file strings. With this option, each string is separately aligned so it starts at an offset which is a multiple of the alignment value. On some RISC machines, a correct alignment will speed things up.

Contexts are stored by storing the concatenation of the context, an EOT byte, and the original string, instead of the original string.
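
To make the concatenation concrete, here is a small sketch that builds such a key. The helper name is hypothetical and the code is not taken from GNU gettext; it only assumes that EOT is the ASCII control character with value 4.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the lookup key for a message with a
   context, i.e. the context, an EOT byte (ASCII 4), then the original
   string.  Returns a newly allocated NUL terminated string which the
   caller must free, or NULL if memory is exhausted.  */
char *
mo_context_key (const char *context, const char *msgid)
{
  size_t clen, mlen;
  char *key;

  assert (context != NULL && msgid != NULL);
  clen = strlen (context);
  mlen = strlen (msgid);
  key = malloc (clen + 1 + mlen + 1);
  if (key == NULL)
    return NULL;
  memcpy (key, context, clen);
  key[clen] = '\004';                       /* EOT separator */
  memcpy (key + clen + 1, msgid, mlen + 1); /* copies the trailing NUL too */
  return key;
}
```

The resulting key is then looked up exactly like an original string without context.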

Plural forms are stored by letting the plural of the original string follow the singular of the original string, separated through a NUL byte. The length which appears in the string descriptor includes both. However, only the singular of the original string takes part in the hash table lookup. The plural variants of the translation are all stored consecutively, separated through a NUL byte. Here also, the length in the string descriptor includes all of them.
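
A reader that has located such a string and knows its descriptor length can step through the NUL separated variants as sketched below. The function name is hypothetical; the sketch assumes, as described earlier, that the descriptor length does not count the final terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the INDEX-th NUL separated variant stored
   in DATA, or NULL if there are fewer variants.  LEN is the length from
   the string descriptor: it covers all variants and the NULs between
   them, but not the final terminating NUL.  */
const char *
mo_plural_variant (const char *data, size_t len, unsigned int index)
{
  const char *p = data;
  const char *end;

  assert (data != NULL);
  end = data + len;             /* points at the final NUL */
  while (index > 0)
    {
      const char *nul = memchr (p, '\0', (size_t) (end - p));

      if (nul == NULL)
        return NULL;            /* ran out of variants */
      p = nul + 1;
      index--;
    }
  return p;
}
```

The same walk applies to an original string with a plural: variant 0 is the singular and variant 1 is the plural.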

Nothing prevents a MO file from having embedded NULs in strings. However, the program interface currently used already presumes that strings are NUL terminated, so embedded NULs are somewhat useless. But the MO file format is general enough so other interfaces would be later possible, if for example, we ever want to implement wide characters right in MO files, where NUL bytes may accidentally appear. (No, we don’t want to have wide characters in MO files. They would make the file unnecessarily large, and the ‘wchar_t’ type being platform dependent, MO files would be platform dependent as well.)

This particular issue has been strongly debated in the GNU gettext development forum, and it is to be expected that the MO file format will evolve or change over time. It is even possible that many formats may later be supported concurrently. But surely, we have to start somewhere, and the MO file format described here is a good start. Nothing is cast in concrete, and the format may later evolve fairly easily, so we should feel comfortable with the current approach.

        byte
             +------------------------------------------+
          0  | magic number = 0x950412de                |
             |                                          |
          4  | file format revision = 0                 |
             |                                          |
          8  | number of strings                        |  == N
             |                                          |
         12  | offset of table with original strings    |  == O
             |                                          |
         16  | offset of table with translation strings |  == T
             |                                          |
         20  | size of hashing table                    |  == S
             |                                          |
         24  | offset of hashing table                  |  == H
             |                                          |
             .                                          .
             .    (possibly more entries later)         .
             .                                          .
             |                                          |
          O  | length & offset 0th string  ----------------.
      O + 8  | length & offset 1st string  ------------------.
              ...                                    ...   | |
O + ((N-1)*8)| length & offset (N-1)th string           |  | |
             |                                          |  | |
          T  | length & offset 0th translation  ---------------.
      T + 8  | length & offset 1st translation  -----------------.
              ...                                    ...   | | | |
T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
             |                                          |  | | | |
          H  | start hash table                         |  | | | |
              ...                                    ...   | | | |
  H + S * 4  | end hash table                           |  | | | |
             |                                          |  | | | |
             | NUL terminated 0th string  <----------------' | | |
             |                                          |    | | |
             | NUL terminated 1st string  <------------------' | |
             |                                          |      | |
              ...                                    ...       | |
             |                                          |      | |
             | NUL terminated 0th translation  <---------------' |
             |                                          |        |
             | NUL terminated 1st translation  <-----------------'
             |                                          |
              ...                                    ...
             |                                          |
             +------------------------------------------+
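
The picture can be turned into a small reader. The sketch below looks up a string in an MO image held in memory, using binary search over the sorted table of original strings and ignoring the hash table. It is an illustration under these assumptions, not the lookup code of GNU gettext, and the names rd32 and mo_lookup are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the 32-bit number at byte offset OFF in the file's byte order.  */
static uint32_t
rd32 (const unsigned char *mo, size_t off, int big_endian)
{
  const unsigned char *p = mo + off;

  if (big_endian)
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
           | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
  return ((uint32_t) p[3] << 24) | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[1] << 8) | (uint32_t) p[0];
}

/* Look up MSGID in an MO image of SIZE bytes.  Returns a pointer to the
   translation inside the image, or NULL if the image does not look like
   an MO file or the string is not found.  */
const char *
mo_lookup (const unsigned char *mo, size_t size, const char *msgid)
{
  int big_endian;
  uint32_t n, o, t, lo, hi;

  assert (mo != NULL && msgid != NULL);
  if (size < 28)
    return NULL;                        /* header: seven 32-bit words */
  if (rd32 (mo, 0, 0) == 0x950412deu)
    big_endian = 0;
  else if (rd32 (mo, 0, 1) == 0x950412deu)
    big_endian = 1;
  else
    return NULL;                        /* wrong magic number */

  n = rd32 (mo, 8, big_endian);         /* N: number of strings */
  o = rd32 (mo, 12, big_endian);        /* O: table of original strings */
  t = rd32 (mo, 16, big_endian);        /* T: table of translations */
  if (o > size || t > size || n > (size - o) / 8 || n > (size - t) / 8)
    return NULL;                        /* tables do not fit in the image */

  lo = 0;
  hi = n;                               /* search the range [lo, hi) */
  while (lo < hi)
    {
      uint32_t mid = lo + (hi - lo) / 2;
      uint32_t len = rd32 (mo, o + 8 * (size_t) mid, big_endian);
      uint32_t off = rd32 (mo, o + 8 * (size_t) mid + 4, big_endian);
      int cmp;

      if (off > size || len >= size - off || mo[off + len] != '\0')
        return NULL;                    /* malformed descriptor */
      cmp = strcmp (msgid, (const char *) (mo + off));
      if (cmp == 0)
        {
          /* Same index in the parallel translation table.  */
          uint32_t tlen = rd32 (mo, t + 8 * (size_t) mid, big_endian);
          uint32_t toff = rd32 (mo, t + 8 * (size_t) mid + 4, big_endian);

          if (toff > size || tlen >= size - toff || mo[toff + tlen] != '\0')
            return NULL;
          return (const char *) (mo + toff);
        }
      if (cmp < 0)
        hi = mid;
      else
        lo = mid + 1;
    }
  return NULL;
}
```

Because strcmp stops at the first NUL, a message with a plural is matched on its singular only.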


11 The Programmer’s View

One aim of the current message catalog implementation provided by GNU gettext was to use the system’s message catalog handling, if the installer wishes to do so. So we perhaps should first take a look at the solutions we know about. The people in the POSIX committee did not manage to agree on one of the semi-official standards which we’ll describe below. In fact they couldn’t agree on anything, so they decided only to include an example of an interface. The major Unix vendors are split in the usage of the two most important specifications: X/Open’s catgets vs. Uniforum’s gettext interface. We’ll describe them both and later explain our solution to this dilemma.



11.1 About catgets

The catgets implementation is defined in the X/Open Portability Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the process of creating this standard seemed to be too slow for some of the Unix vendors so they created their implementations on preliminary versions of the standard. Of course this leads again to problems while writing platform independent programs: even the usage of catgets does not guarantee a unique interface.

Another, personal comment on this: only a bunch of committee members could have made this interface. They never really tried to program using this interface. It is a fast, memory-saving implementation, a user can happily live with it. But programmers hate it (at least I and some others do…)

But we must not forget one point: after all the trouble with transferring the rights on Unix(tm) they at last came to X/Open, the very same who published this specification. This leads me to making the prediction that this interface will be in future Unix standards (e.g. Spec1170) and therefore part of all Unix implementations (implementations which are allowed to bear this name).


11.1.1 The Interface

The interface to the catgets implementation consists of three functions which correspond to those used in file access: catopen to open the catalog for using, catgets for accessing the message tables, and catclose for closing after work is done. Prototypes for the functions and the needed definitions are in the <nl_types.h> header file.

catopen is used like this:

nl_catd catd = catopen ("catalog_name", 0);

The function takes as the argument the name of the catalog. This usually refers to the name of the program or the package. The second parameter is not further specified in the standard. I don’t even know whether it is implemented consistently among various systems. So the common advice is to use 0 as the value. The return value is a handle to the message catalog, equivalent to handles to file returned by open.

This handle is of course used in the catgets function which can be used like this:

char *translation = catgets (catd, set_no, msg_id, "original string");

The first parameter is this catalog descriptor. The second parameter specifies the set of messages in this catalog, in which the message described by msg_id is obtained. catgets therefore uses a three-stage addressing:

catalog name ⇒ set number ⇒ message ID ⇒ translation

The fourth argument is not used to address the translation. It is given as a default value in case one of the addressing stages fails. One important thing to remember is that although the return type of catgets is char * the resulting string must not be changed. It would better be const char *, but the standard was published in 1988, one year before ANSI C.

The last of these functions is used and behaves as expected:

catclose (catd);

After this no catgets call using the descriptor is legal anymore.
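
Putting the three calls together, a lookup might be wrapped as sketched below. The wrapper name and its parameters are hypothetical; only catopen, catgets and catclose come from <nl_types.h>. The message is copied out before the catalog is closed, on the assumption that the returned string may belong to the catalog.

```c
#include <assert.h>
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical wrapper: copy the message (SET_NO, MSG_ID) of CATALOG
   into OUT, using FALLBACK when the catalog or the message is missing.
   Returns 1 if the catalog could be opened, 0 otherwise.  */
int
catalog_message (const char *catalog, int set_no, int msg_id,
                 const char *fallback, char *out, size_t out_size)
{
  nl_catd catd;
  int opened;

  assert (catalog != NULL && fallback != NULL);
  assert (out != NULL && out_size > 0);

  catd = catopen (catalog, 0);
  opened = (catd != (nl_catd) -1);

  /* catgets hands back its fourth argument when the lookup fails.  */
  snprintf (out, out_size, "%s",
            opened ? catgets (catd, set_no, msg_id, fallback) : fallback);

  if (opened)
    catclose (catd);    /* the descriptor must not be used after this */
  return opened;
}
```

A caller would invoke it once per message, for example with set 1 and message 1 of the package’s own catalog.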


11.1.2 Problems with the catgets Interface?!

This description made everything seem really easy, so where are the problems we speak of? In fact the interface could be used in a reasonable way, but constructing the message catalogs is a pain. The reason for this lies in the third argument of catgets: the unique message ID. This has to be a numeric value for all messages in a single set. Perhaps you could imagine the problems keeping such a list while changing the source code. Add a new message here, remove one there. Of course a lot of tools have been developed to help organize this chaos, but each of them fails in one aspect or another. We don’t want to say that the other approach has no problems but they are far easier to manage.



11.2 About gettext

The definition of the gettext interface comes from a Uniforum proposal. It was submitted there by Sun, who had implemented the gettext function in SunOS 4, around 1990. Nowadays, the gettext interface is specified by the OpenI18N standard.

The main point about this solution is that it does not follow the method of normal file handling (open-use-close) and that it does not burden the programmer with so many tasks, especially the unique key handling. Of course here also a unique key is needed, but this key is the message itself (however long or short it is). See Comparison for a more detailed comparison of the two methods.

The following section contains a rather detailed description of the interface. We make it that detailed because this is the interface we chose for the GNU gettext Library. Programmers interested in using this library will be interested in this description.



11.2.1 The Interface

The minimal functionality an interface must have is a) to select a domain the strings are coming from (a single domain for all programs is not reasonable because its construction and maintenance is difficult, perhaps impossible) and b) to access a string in a selected domain.

This is essentially the description of the gettext interface. It has a global domain which unqualified usages reference. Of course this domain is selectable by the user.

char *textdomain (const char *domain_name);

This provides the possibility to change or query the current status of the current global domain of the LC_MESSAGES category. The argument is a null-terminated string, whose characters must be legal for use in filenames. If the domain_name argument is NULL, the function returns the current value. If no value has been set before, the name of the default domain is returned: messages. Please note that although the return value of textdomain is of type char * no changing is allowed. It is also important to know that no checks of the availability are made. If the name is not available you will see this by the fact that no translations are provided.

To use a domain set by textdomain the function

char *gettext (const char *msgid);

is to be used. This is the simplest reasonable form one can imagine. The translation of the string msgid is returned if it is available in the current domain. If it is not available, the argument itself is returned. If the argument is NULL the result is undefined.

One thing which should come to mind is that no explicit dependency on the used domain is given. The current value of the domain is used. If this changes between two executions of the same gettext call in the program, the two calls reference different message catalogs.

For the easiest case, which is normally used in internationalized packages, once at the beginning of execution a call to textdomain is issued, setting the domain to a unique name, normally the package name. In the following code all strings which have to be translated are filtered through the gettext function. That’s all, the package speaks your language.
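
A minimal sketch of this easiest case might look as follows. The domain name hello-demo and the function names are hypothetical; the setlocale call is an assumption about typical usage, since the locale has to be selected before any catalog can be found.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical domain name; normally this is the package name.  */
#define PACKAGE "hello-demo"

/* One-time setup, intended to run once at the beginning of execution.  */
void init_nls(void)
{
  setlocale(LC_ALL, "");   /* adopt the locale from the environment */
  textdomain(PACKAGE);     /* select the global domain */
}

/* Every user-visible string is then filtered through gettext.  When no
   translation is available, the argument itself comes back.  */
const char *greeting(void)
{
  return gettext("Hello, world");
}
```

A program would call init_nls once early in main and use greeting (or direct gettext calls) wherever text is shown to the user.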


11.2.2 Solving Ambiguities

While this single name domain works well for most applications there might be the need to get translations from more than one domain. Of course one could switch between different domains with calls to textdomain, but this is neither convenient nor fast. A possible situation could be one case subject to discussion during this writing: all error messages of functions in the set of commonly used functions should go into a separate domain error. By this means we would only need to translate them once. Another case is messages from a library, as these have to be independent of the current domain set by the application.

For these reasons there are two more functions to retrieve strings:

char *dgettext (const char *domain_name, const char *msgid);
char *dcgettext (const char *domain_name, const char *msgid,
                 int category);

Both take an additional argument at the first place, which corresponds to the argument of textdomain. The third argument of dcgettext allows using a locale category other than LC_MESSAGES. But I really don’t know where this can be useful. If the domain_name is NULL or category has a value besides the known ones, the result is undefined. It should also be noted that this function is not part of the second known implementation of this function family, the one found in Solaris.

A second ambiguity can arise from the fact that perhaps more than one domain has the same name. This can be solved by specifying where the needed message catalog files can be found.

char *bindtextdomain (const char *domain_name,
                      const char *dir_name);

Calling this function binds the given domain to a file in the specified directory (how this file is determined follows below). In particular, a file in the system’s default place is no longer favored over the specified file (as it would be by solely using textdomain). A NULL pointer for the dir_name parameter returns the binding associated with domain_name. If domain_name itself is NULL nothing happens and a NULL pointer is returned. Here again, as for all the other functions, the return value must not be changed!

It is important to remember that relative path names for the dir_name parameter can be trouble. Since the path is always computed relative to the current directory, different results will be achieved when the program executes a chdir command. Relative paths should always be avoided to avoid dependencies and unreliabilities.
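
The library case mentioned above can be sketched like this. The domain name libdemo, the directory, and both function names are hypothetical; the point is only that the library binds its own domain to an absolute path and never touches the application's global domain.

```c
#include <libintl.h>

/* Hypothetical domain and absolute catalog directory for a library.  */
#define LIB_DOMAIN    "libdemo"
#define LIB_LOCALEDIR "/opt/libdemo/share/locale"

/* Binds the library's own domain.  The returned string is owned by the
   implementation and must not be modified.  */
const char *libdemo_init(void)
{
  return bindtextdomain(LIB_DOMAIN, LIB_LOCALEDIR);
}

/* Looks messages up in the library's domain, independently of whatever
   domain the application selected with textdomain.  */
const char *libdemo_strerror(int code)
{
  switch (code)
    {
    case 0:
      return dgettext(LIB_DOMAIN, "no error");
    case 1:
      return dgettext(LIB_DOMAIN, "file not found");
    default:
      return dgettext(LIB_DOMAIN, "unknown error");
    }
}
```

Because every lookup names the domain explicitly, the application remains free to call textdomain with its own package name.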


11.2.3 Locating Message Catalog Files

Because many different languages for many different packages have to be stored we need some way to add this information to the message catalog files. The way usually used in Unix environments is to have this encoding in the file name. This is also done here. The directory name given in bindtextdomain’s second argument (or the default directory), followed by the name of the locale, the locale category, and the domain name are concatenated:

dir_name/locale/LC_category/domain_name.mo

The default value for dir_name is system specific. For the GNU library, and for packages adhering to its conventions, it’s:

/usr/local/share/locale

locale is the name of the locale category which is designated by LC_category. For gettext and dgettext this LC_category is always LC_MESSAGES. The name of the locale category is determined through setlocale (LC_category, NULL). When using the function dcgettext, you can specify the locale category through the third argument.
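
The concatenation rule can be illustrated with a small helper. The name catalog_path is hypothetical, and the sketch only reproduces the pattern shown above; it ignores any locale-name normalization or fallback search a real implementation performs.

```c
#include <stdio.h>

/* Builds dir_name/locale/LC_category/domain_name.mo in buf and returns
   the length snprintf reports (the full length, even when truncated).  */
int catalog_path(char *buf, size_t size,
                 const char *dir_name, const char *locale,
                 const char *category, const char *domain_name)
{
  return snprintf(buf, size, "%s/%s/%s/%s.mo",
                  dir_name, locale, category, domain_name);
}
```

For example, the arguments "/usr/local/share/locale", "de", "LC_MESSAGES" and "hello" yield /usr/local/share/locale/de/LC_MESSAGES/hello.mo.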


11.2.4 How to specify the output character set gettext uses

gettext not only looks up a translation in a message catalog. It also converts the translation on the fly to the desired output character set. This is useful if the user is working in a different character set than the translator who created the message catalog, because it avoids distributing variants of message catalogs which differ only in the character set.

The output character set is, by default, the value of nl_langinfo(CODESET), which depends on the LC_CTYPE part of the current locale. But programs which store strings in a locale independent way (e.g. UTF-8) can request that gettext and related functions return the translations in that encoding, by use of the bind_textdomain_codeset function.

Note that the msgid argument to gettext is not subject to character set conversion. Also, when gettext does not find a translation for msgid, it returns msgid unchanged, independently of the current output character set. It is therefore recommended that all msgids be US-ASCII strings.

Function: char * bind_textdomain_codeset (const char *domainname, const char *codeset)

The bind_textdomain_codeset function can be used to specify the output character set for message catalogs for domain domainname. The codeset argument must be a valid codeset name which can be used for the iconv_open function, or a null pointer.

If the codeset parameter is the null pointer, bind_textdomain_codeset returns the currently selected codeset for the domain with the name domainname. It returns NULL if no codeset has yet been selected.

The bind_textdomain_codeset function can be used several times. If used multiple times with the same domainname argument, the later call overrides the settings made by the earlier one.

The bind_textdomain_codeset function returns a pointer to a string containing the name of the selected codeset. The string is allocated internally in the function and must not be changed by the user. If the system ran out of memory during the execution of bind_textdomain_codeset, the return value is NULL and the global variable errno is set accordingly.
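
A program that keeps all its strings in UTF-8 could request that encoding as sketched below. The function name use_utf8_output is hypothetical, and "UTF-8" is assumed to be a codeset name the local iconv_open accepts.

```c
#include <libintl.h>

/* Asks for translations of the given domain to be delivered in UTF-8,
   regardless of the character set of the current locale.  Returns the
   selected codeset name, or NULL if memory ran out.  */
const char *use_utf8_output(const char *domain)
{
  return bind_textdomain_codeset(domain, "UTF-8");
}
```

The returned string is owned by the implementation, so callers should treat it as read-only.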


11.2.5 Using contexts for solving ambiguities

One place where the gettext functions, if used normally, have big problems is within programs with graphical user interfaces (GUIs). The problem is that many of the strings which have to be translated are very short. They have to appear in pull-down menus which restricts the length. But strings which do not contain entire sentences or at least large fragments of a sentence may appear in more than one situation in the program but might have different translations. This is especially true for the one-word strings which are frequently used in GUI programs.

As a consequence many people say that the gettext approach is wrong and instead catgets should be used which indeed does not have this problem. But there is a very simple and powerful method to handle this kind of problem with the gettext functions.

Contexts can be added to strings to be translated. A context dependent translation lookup is a search for the translation of a given string that is limited to a given context. The translation for the same string in a different context can be different. The different translations of the same string in different contexts can be stored in the same MO file, and can be edited by the translator in the same PO file.

The gettext.h include file contains the lookup macros for strings with contexts. They are implemented as thin macros and inline functions over the functions from <libintl.h>.

const char *pgettext (const char *msgctxt, const char *msgid);

In a call of this macro, msgctxt and msgid must be string literals. The macro returns the translation of msgid, restricted to the context given by msgctxt.

The msgctxt string is visible in the PO file to the translator. You should try to make it somehow canonical and never changing, because every time you change a msgctxt, the translator will have to review the translation of msgid.

Finding a canonical msgctxt string that doesn’t change over time can be hard. But you shouldn’t use the file name or class name containing the pgettext call, because it is a common development task to rename a file or a class, and it shouldn’t cause translator work. Also you shouldn’t use a comment in the form of a complete English sentence as msgctxt, because orthography or grammar changes are often applied to such sentences, and again, it shouldn’t force the translator to do a review.

The ‘p’ in ‘pgettext’ stands for “particular”: pgettext fetches a particular translation of the msgid.

const char *dpgettext (const char *domain_name,
                       const char *msgctxt, const char *msgid);
const char *dcpgettext (const char *domain_name,
                        const char *msgctxt, const char *msgid,
                        int category);

These are generalizations of pgettext. They behave similarly to dgettext and dcgettext, respectively. The domain_name argument defines the translation domain. The category argument allows using a locale category other than LC_MESSAGES.

As an example consider the following fictional situation. A GUI program has a menu bar with the following entries:

+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
| Open     | | Select   |
| New      | | Open     |
+----------+ | Connect  |
             +----------+

To have the strings File, Printer, Open, New, Select, and Connect translated there has to be at some point in the code a call to a function of the gettext family. But in two places the string passed into the function would be Open. The translations might not be the same and therefore we are in the dilemma described above.

What distinguishes the two places is the menu path from the menu root to the particular menu entries:

Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect

The context is thus the menu path without its last part. So, the calls look like this:

pgettext ("Menu|", "File")
pgettext ("Menu|", "Printer")
pgettext ("Menu|File|", "Open")
pgettext ("Menu|File|", "New")
pgettext ("Menu|Printer|", "Select")
pgettext ("Menu|Printer|", "Open")
pgettext ("Menu|Printer|", "Connect")

Whether or not to use the ‘|’ character at the end of the context is a matter of style.

For more complex cases, where the msgctxt or msgid are not string literals, more general macros are available:

const char *pgettext_expr (const char *msgctxt, const char *msgid);
const char *dpgettext_expr (const char *domain_name,
                            const char *msgctxt, const char *msgid);
const char *dcpgettext_expr (const char *domain_name,
                             const char *msgctxt, const char *msgid,
                             int category);

Here msgctxt and msgid can be arbitrary string-valued expressions. These macros are more general. But in the case that both argument expressions are string literals, the macros without the ‘_expr’ suffix are more efficient.
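
To make the mechanism less opaque, here is a simplified sketch of how a context lookup can be layered over dcgettext. The names demo_pgettext, demo_pgettext_aux and DEMO_CONTEXT_GLUE are hypothetical stand-ins, not the definitions from gettext.h; the sketch assumes that the lookup key is the context and the msgid joined by the character "\004".

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical stand-ins for the macros that gettext.h provides.  */
#define DEMO_CONTEXT_GLUE "\004"
#define demo_pgettext(Msgctxt, Msgid) \
  demo_pgettext_aux (Msgctxt DEMO_CONTEXT_GLUE Msgid, Msgid)

/* Looks up the combined key in the current domain.  If no translation
   exists, dcgettext returns the key pointer itself; in that case the
   bare msgid is returned so the context never reaches the user.  */
static const char *demo_pgettext_aux(const char *msg_ctxt_id,
                                     const char *msgid)
{
  const char *translation =
    dcgettext(textdomain(NULL), msg_ctxt_id, LC_MESSAGES);
  return translation == msg_ctxt_id ? msgid : translation;
}

/* The two "Open" entries of the menu example, kept apart by context.  */
const char *file_open_label(void)
{
  return demo_pgettext("Menu|File|", "Open");
}

const char *printer_open_label(void)
{
  return demo_pgettext("Menu|Printer|", "Open");
}
```

Because the key is built by string literal concatenation, this form only works with literals, which matches the restriction stated for pgettext.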


11.2.6 Additional functions for plural forms

The functions of the gettext family described so far (and all the catgets functions as well) have one problem in the real world which has been neglected completely in all existing approaches. What is meant here is the handling of plural forms.

Looking through Unix source code before the time anybody thought about internationalization (and, sadly, even afterwards) one can often find code similar to the following:

   printf ("%d file%s deleted", n, n == 1 ? "" : "s");

After the first complaints from people internationalizing the code people either completely avoided formulations like this or used strings like "file(s)". Both look unnatural and should be avoided. First tries to solve the problem correctly looked like this:

   if (n == 1)
     printf ("%d file deleted", n);
   else
     printf ("%d files deleted", n);

But this does not solve the problem. It helps languages where the plural form of a noun is not simply constructed by adding an ‘s’ but that is all. Once again people fell into the trap of believing the rules their language is using are universal. But the handling of plural forms differs widely between the language families. For example, Rafal Maszkowski <[email protected]> reports:

In Polish we use e.g. plik (file) this way:

1 plik
2,3,4 pliki
5-21 pliko'w
22-24 pliki
25-31 pliko'w

and so on (o’ means 8859-2 oacute which should be rather okreska, similar to aogonek).

There are two things which can differ between languages (and even inside language families):

  • The way plural forms are built differs. This is a problem with languages which have many irregularities. German, for instance, is a drastic case. Though English and German are part of the same language family (Germanic), the almost regular forming of plural noun forms (appending an ‘s’) is hardly found in German.
  • The number of plural forms differs. This is somewhat surprising for those who only have experiences with Romanic and Germanic languages since here the number is the same (there are two).

    But other language families have only one form or many forms. More information on this in an extra section.

The consequence of this is that application writers should not try to solve the problem in their code. This would be localization since it is only usable for certain, hardcoded language environments. Instead the extended gettext interface should be used.

These extra functions take two strings and a numerical argument instead of the one key string. The idea behind this is that using the numerical argument and the first string as a key, the implementation can select the right plural form using rules specified by the translator. The two string arguments then will be used to provide a return value in case no message catalog is found (similar to the normal gettext behavior). In this case the rules for Germanic languages are used and it is assumed that the first string argument is the singular form, the second the plural form.

This has the consequence that programs without language catalogs can display the correct strings only if the program itself is written using a Germanic language. This is a limitation but since the GNU C library (as well as the GNU gettext package) are written as part of the GNU package and the coding standards for the GNU project require programs to be written in English, this solution nevertheless fulfills its purpose.

Function: char * ngettext (const char *msgid1, const char *msgid2, unsigned long int n)

The ngettext function is similar to the gettext function as it finds the message catalogs in the same way. But it takes two extra arguments. The msgid1 parameter must contain the singular form of the string to be converted. It is also used as the key for the search in the catalog. The msgid2 parameter is the plural form. The parameter n is used to determine the plural form. If no message catalog is found msgid1 is returned if n == 1, otherwise msgid2.

An example for the use of this function is:

printf (ngettext ("%d file removed", "%d files removed", n), n);

Please note that the numeric value n has to be passed to the printf function as well. It is not sufficient to pass it only to ngettext.

In the English singular case, the number (always 1) can be replaced with "one":

printf (ngettext ("One file removed", "%d files removed", n), n);

This works because the ‘printf’ function discards excess arguments that are not consumed by the format string.

If this function is meant to yield a format string that takes two or more arguments, you can not use it like this:

printf (ngettext ("%d file removed from directory %s",
                  "%d files removed from directory %s",
                  n),
        n, dir);

because in many languages the translators want to replace the ‘%d’ with an explicit word in the singular case, just like “one” in English, and C format strings cannot consume the second argument but skip the first argument. Instead, you have to reorder the arguments so that ‘n’ comes last:

printf (ngettext ("%2$d file removed from directory %1$s",
                  "%2$d files removed from directory %1$s",
                  n),
        dir, n);

See c-format for details about this argument reordering syntax.

When you know that the value of n is within a given range, you can specify it as a comment directed to the xgettext tool. This information may help translators to use more adequate translations. Like this:

if (days > 7 && days < 14)
  /* xgettext: range: 1..6 */
  printf (ngettext ("one week and one day", "one week and %d days",
                    days - 7),
          days - 7);

It is also possible to use this function when the strings don’t contain a cardinal number:

puts (ngettext ("Delete the selected file?",
                "Delete the selected files?",
                n));

In this case the number n is only used to choose the plural form.

Function: char * dngettext (const char *domain, const char *msgid1, const char *msgid2, unsigned long int n)

The dngettext function is similar to the dgettext function in the way the message catalog is selected. The difference is that it takes two extra parameters to provide the correct plural form. These two parameters are handled in the same way ngettext handles them.

Function: char * dcngettext (const char *domain, const char *msgid1, const char *msgid2, unsigned long int n, int category)

The dcngettext function is similar to the dcgettext function in the way the message catalog is selected. The difference is that it takes two extra parameters to provide the correct plural form. These two parameters are handled in the same way ngettext handles them.

Now, how do these functions solve the problem of the plural forms? Without the input of linguists (which was not available) it was not possible to determine whether there are only a few different forms in which plural forms are formed or whether the number can increase with every new supported language.

Therefore the solution implemented is to allow the translator to specify the rules of how to select the plural form. Since the formula varies with every language this is the only viable solution except for hardcoding the information in the code (which still would require the possibility of extensions to not prevent the use of new languages).

The information about the plural form selection has to be stored in the header entry of the PO file (the one with the empty msgid string). The plural form information looks like this:

Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;

The nplurals value must be a decimal number which specifies how many different plural forms exist for this language. The string following plural is an expression which is using the C language syntax. Exceptions are that no negative numbers are allowed, numbers must be decimal, and the only variable allowed is n. Spaces are allowed in the expression, but backslash-newlines are not; in the examples below the backslash-newlines are present for formatting purposes only. This expression will be evaluated whenever one of the functions ngettext, dngettext, or dcngettext is called. The numeric value passed to these functions is then substituted for all uses of the variable n in the expression. The resulting value then must be greater than or equal to zero and smaller than the value given as the value of nplurals.

The following rules are known at this point. The languages are listed with their families. But this does not necessarily mean the information can be generalized for the whole family (as can be easily seen in the table below).

Only one form:

Some languages only require one single form. There is no distinction between the singular and plural form. An appropriate header entry would look like this:

Plural-Forms: nplurals=1; plural=0;

Languages with this property include:

Asian family

Japanese, Vietnamese, Korean

Tai-Kadai family

Thai

Two forms, singular used for one only

This is the form used in most existing programs since it is what English is using. A header entry would look like this:

Plural-Forms: nplurals=2; plural=n != 1;

(Note: this uses the feature of C expressions that boolean expressions have the value zero or one.)

Languages with this property include:

Germanic family

English, German, Dutch, Swedish, Danish, Norwegian, Faroese

Romanic family

Spanish, Portuguese, Italian

Slavic family

Bulgarian

Latin/Greek family

Greek

Finno-Ugric family

Finnish, Estonian

Semitic family

Hebrew

Austronesian family

Bahasa Indonesian

Artificial

Esperanto

Other languages using the same header entry are:

Finno-Ugric family

Hungarian

Turkic/Altaic family

Turkish

Hungarian does not appear to have a plural if you look at sentences involving cardinal numbers. For example, “1 apple” is “1 alma”, and “123 apples” is “123 alma”. But when the number is not explicit, the distinction between singular and plural exists: “the apple” is “az alma”, and “the apples” is “az almák”. Since ngettext has to support both types of sentences, it is classified here, under “two forms”.

The same holds for Turkish: “1 apple” is “1 elma”, and “123 apples” is “123 elma”. But when the number is omitted, the distinction between singular and plural exists: “the apple” is “elma”, and “the apples” is “elmalar”.

Two forms, singular used for zero and one

Exceptional case in the language family. The header entry would be:

Plural-Forms: nplurals=2; plural=n>1;

Languages with this property include:

Romanic family

Brazilian Portuguese, French

Three forms, special case for zero

The header entry would be:

Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;

Languages with this property include:

Baltic family

Latvian

Three forms, special cases for one and two

The header entry would be:

Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;

Languages with this property include:

Celtic

Gaeilge (Irish)

Three forms, special case for numbers ending in 00 or [2-9][0-9]

The header entry would be:

Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;

Languages with this property include:

Romanic family

Romanian

Three forms, special case for numbers ending in 1[2-9]

The header entry would look like this:

Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;

Languages with this property include:

Baltic family

Lithuanian

Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]

The header entry would look like this:

Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;

Languages with this property include:

Slavic family

Russian, Ukrainian, Belarusian, Serbian, Croatian

Three forms, special cases for 1 and 2, 3, 4

The header entry would look like this:

Plural-Forms: nplurals=3; \
    plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;

Languages with this property include:

Slavic family

Czech, Slovak

Three forms, special case for one and some numbers ending in 2, 3, or 4

The header entry would look like this:

Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;

Languages with this property include:

Slavic family

Polish

Four forms, special case for one and all numbers ending in 02, 03, or 04

The header entry would look like this:

Plural-Forms: nplurals=4; \
    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;

Languages with this property include:

Slavic family

Slovenian

Six forms, special cases for one, two, all numbers ending in 02, 03, … 10, all numbers ending in 11 … 99, and others

The header entry would look like this:

Plural-Forms: nplurals=6; \
    plural=n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 \
    : n%100>=11 ? 4 : 5;

Languages with this property include:

Afroasiatic family

Arabic
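
As a quick check of one of the rules listed above, the Polish expression can be transcribed into C and compared with the table reported earlier for plik. The function name polish_plural_index is hypothetical; the body is the plural expression from the Polish header entry, unchanged apart from parentheses.

```c
/* Returns the plural form index (0, 1 or 2) that the Polish
   Plural-Forms expression selects for n.  */
unsigned long polish_plural_index(unsigned long n)
{
  return n == 1 ? 0
         : (n % 10 >= 2 && n % 10 <= 4
            && (n % 100 < 10 || n % 100 >= 20)) ? 1
         : 2;
}
```

With this rule, 1 selects form 0 (plik), 2 to 4 and 22 to 24 select form 1 (pliki), and 5 to 21 as well as 25 to 31 select form 2, which agrees with the table.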

You might now ask, ngettext handles only numbers n of type ‘unsigned long’. What about larger integer types? What about negative numbers? What about floating-point numbers?

About larger integer types, such as ‘uintmax_t’ or ‘unsigned long long’: they can be handled by reducing the value to a range that fits in an ‘unsigned long’. Simply casting the value to ‘unsigned long’ would not do the right thing, since it would treat ULONG_MAX + 1 like zero, ULONG_MAX + 2 like singular, and the like. Here you can exploit the fact that all mentioned plural form formulas eventually become periodic, with a period that is a divisor of 100 (or 1000 or 1000000). So, when you reduce a large value to another one in the range [1000000, 1999999] that ends in the same 6 decimal digits, you can assume that it will lead to the same plural form selection. This code does this:

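#include <inttypes.h>
#include <limits.h>   /* ULONG_MAX */
#include <libintl.h>  /* ngettext */
#include <stdio.h>    /* printf */
uintmax_t nbytes = ...;
printf (ngettext ("The file has %"PRIuMAX" byte.",
                  "The file has %"PRIuMAX" bytes.",
                  (nbytes > ULONG_MAX
                   ? (nbytes % 1000000) + 1000000
                   : nbytes)),
        nbytes);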

Negative and floating-point values usually represent physical entities for which singular and plural don’t clearly apply. In such cases, there is no need to use ngettext; a simple gettext call with a form suitable for all values will do. For example:

printf (gettext ("Time elapsed: %.3f seconds"),
        num_milliseconds * 0.001);

Even if num_milliseconds happens to be a multiple of 1000, the output

Time elapsed: 1.000 seconds

is acceptable in English, and similarly for other languages.

The translators’ perspective regarding plural forms is explained in Translating plural forms.



11.2.7 Optimization of the *gettext functions

At this point of the discussion we should talk about an advantage of the GNU gettext implementation. Some readers might have pointed out that an internationalized program might have a poor performance if some string has to be translated in an inner loop. While this is unavoidable when the string varies from one run of the loop to the other it is simply a waste of time when the string is always the same. Take the following example:

{
  while (…)
    {
      puts (gettext ("Hello world"));
    }
}

When the locale selection does not change between two runs the resulting string is always the same. One way to use this is:

{
  str = gettext ("Hello world");
  while (…)
    {
      puts (str);
    }
}

But this solution is not usable in all situations (e.g. when the locale selection changes) nor does it lead to legible code.

For this reason, GNU gettext caches previous translation results. When the same translation is requested twice, with no new message catalogs being loaded in between, gettext will, the second time, find the result through a single cache lookup.


11.3 Comparing the Two Interfaces

The following discussion is perhaps a little bit colored. As said above we implemented GNU gettext following the Uniforum proposal and this surely has its reasons. But it should show how we came to this decision.

First we take a look at the developing process. When we write an application using NLS provided by gettext we proceed as always. Only when we come to a string which might be seen by the users and thus has to be translated we use gettext("…") instead of "…". At the beginning of each source file (or in a central header file) we define

#define gettext(String) (String)

Even this definition can be avoided when the system supports the gettext function in its C library. When we compile this code the result is the same as if no NLS code is used. When you take a look at the GNU gettext code you will see that we use _("…") instead of gettext("…"). This reduces the number of additional characters per translatable string to 3 (in words: three).

When now a production version of the program is needed we simply replace the definition

#define _(String) (String)

by

#include <libintl.h>
#define _(String) gettext (String)

Additionally we run the program xgettext on all source code files which contain translatable strings and that’s it: we have a running program which does not depend on translations to be available, but which can use any that becomes available.

The same procedure can be done for the gettext_noop invocations (see Special cases). One usually defines gettext_noop as a no-op macro. So you should consider the following code for your project:

#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

N_ is a short form similar to _. The Makefile in the po/ directory of GNU gettext knows by default both of the mentioned short forms so you are invited to follow this proposal for your own ease.
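
One common way the two short forms work together is sketched below: N_ marks strings in a static initializer, where no function call is possible, and the actual lookup happens later through gettext. The table menu_labels and the function menu_label are hypothetical names chosen for this illustration.

```c
#include <libintl.h>
#include <stddef.h>

#define _(String) gettext (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

/* Static initializers cannot call functions, so the strings are only
   marked for extraction here and stay untranslated in the table.  */
static const char *const menu_labels[] =
{
  N_("Open"),
  N_("New"),
  N_("Select")
};

/* The translation is looked up at the point of use.  */
const char *menu_label(size_t index)
{
  if (index >= sizeof menu_labels / sizeof menu_labels[0])
    return _("(unknown)");
  return gettext(menu_labels[index]);
}
```

The lookup inside menu_label passes a variable to gettext, which is fine because the strings were already marked with N_ where they are written as literals.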

Now to catgets. The main problem is the work for the programmer. Every time he comes to a translatable string he has to define a number (or a symbolic constant) which also has to be defined in the message catalog file. He also has to take care of duplicate entries, duplicate message IDs etc. If he wants to have the same quality in the message catalog as the GNU gettext program provides he also has to put the descriptive comments for the strings and the location in all source code files in the message catalog. This is nearly a Mission: Impossible.

But there are also some points people might call advantages speaking for catgets. If you have a single word in a string and this string is used in different contexts it is likely that in one or the other language the word has different translations. Example:

printf ("%s: %d", gettext ("number"), number_of_errors);

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"));

Here we have to translate two times the string "number". Even if you do not speak a language beside English it might be possible to recognize that the two words have a different meaning. In German the first appearance has to be translated to "Anzahl" and the second to "Zahl".

Now you can say that this example is really esoteric. And you are right! This is exactly how we felt about this problem and decided that it does not weigh that much. The solution for the above problem could be very easy:

printf ("%s %d", gettext ("number:"), number_of_errors);

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count);

We believe that we can solve all conflicts with this method. If it is difficult one can also consider changing one of the conflicting strings a little bit. But it is not impossible to overcome.

catgets allows the same original entry to have different translations, but gettext has another, scalable approach for solving ambiguities of this kind: See Ambiguities.


11.4 Using libintl.a in own programs

Starting with version 0.9.4 the library libintl.a should be self-contained. I.e., you can use it in your own programs without providing additional functions. The Makefile will put the header and the library in directories selected using the $(prefix).


11.5 Being a gettext grok

NOTE: This documentation section is outdated and needs to be revised.

To fully exploit the functionality of the GNU gettext library it is surely helpful to read the source code. But for those who don’t want to spend that much time in reading the (sometimes complicated) code here is a list of comments:

  • Changing the language at runtime

    For interactive programs it might be useful to offer a selection of the used language at runtime. To understand how to do this one needs to know how the used language is determined while executing the gettext function. The method which is presented here only works correctly with the GNU implementation of the gettext functions.

    In the function dcgettext at every call the current setting of the highest priority environment variable is determined and used. Highest priority means here the following list with decreasing priority:

    1. LANGUAGE
    2. LC_ALL
    3. LC_xxx, according to selected locale category
    4. LANG

    Afterwards the path is constructed using the found value and the translation file is loaded if available.

    What happens now when the value for, say, LANGUAGE changes? According to the process explained above the new value of this variable is found as soon as the dcgettext function is called. But this also means the (perhaps) different message catalog file is loaded. In other words: the used language is changed.

    But there is one little catch. The code for gcc-2.7.0 and up provides some optimization. This optimization normally prevents the calling of the dcgettext function as long as no new catalog is loaded. But if dcgettext is not called the program also cannot notice that the LANGUAGE variable has changed (see Optimized gettext). A solution for this is very easy. Include the following code in the language switching function.

      /* Change language.  */
      setenv ("LANGUAGE", "fr", 1);
    
      /* Make change known.  */
      {
        extern int  _nl_msg_cat_cntr;
        ++_nl_msg_cat_cntr;
      }
    

    The variable _nl_msg_cat_cntr is defined in loadmsgcat.c. You don’t need to know what this is for. But it can be used to detect whether a gettext implementation is GNU gettext and not a non-GNU system’s native gettext implementation.
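The priority list given above can be sketched as a small lookup function. This is a simplified illustration of the ordering only, not the code in libintl: first_language_setting is a hypothetical name, and real implementations apply further rules (for example, LANGUAGE may hold a colon-separated list of languages) that are left out here.

```c
#include <assert.h>
#include <stdlib.h>

/* Return the value of the first variable in the priority list that is set
   and non-empty, or NULL when none is.  category_var names the LC_xxx
   variable of the selected category, e.g. "LC_MESSAGES".  */
static const char *
first_language_setting (const char *category_var)
{
  assert (category_var != NULL);

  const char *const names[] = { "LANGUAGE", "LC_ALL", category_var, "LANG" };

  for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
    {
      const char *value = getenv (names[i]);

      if (value != NULL && value[0] != '\0')
        return value;
    }
  return NULL;
}
```

Because the environment is consulted on every call, a setenv ("LANGUAGE", "fr", 1) made earlier in the program is picked up by the next lookup, which mirrors the behavior described for dcgettext.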


11.6 Temporary Notes for the Programmers Chapter

NOTE: This documentation section is outdated and needs to be revised.


11.6.1 Temporary - Two Possible Implementations

There are two competing methods for language independent messages: the X/Open catgets method, and the Uniforum gettext method. The catgets method indexes messages by integers; the gettext method indexes them by their English translations. The catgets method has been around longer and is supported by more vendors. The gettext method is supported by Sun, and it has been heard that the COSE multi-vendor initiative is supporting it. Neither method is a POSIX standard; the POSIX.1 committee had a lot of disagreement in this area.

Neither one is in the POSIX standard. There was much disagreement in the POSIX.1 committee about using the gettext routines vs. catgets (XPG). In the end the committee couldn’t agree on anything, so no messaging system was included as part of the standard. I believe the informative annex of the standard includes the XPG3 messaging interfaces, “…as an example of a messaging system that has been implemented…”

They were very careful not to say anywhere that you should use one set of interfaces over the other. For more on this topic please see the Programming for Internationalization FAQ.


11.6.2 Temporary - About catgets

There have been a few discussions of late on the use of catgets as a base. I think it important to present both sides of the argument and hence am opting to play devil’s advocate for a little bit.

I’ll not deny the fact that catgets could have been designed a lot better. It currently has quite a number of limitations and these have already been pointed out.

However there is a great deal to be said for consistency and standardization. A common recurring problem when writing Unix software is the myriad portability problems across Unix platforms. It seems as if every Unix vendor had a look at the operating system and found parts they could improve upon. Undoubtedly, these modifications are probably innovative and solve real problems. However, software developers have a hard time keeping up with all these changes across so many platforms.

And this has prompted the Unix vendors to begin to standardize their systems. Hence the impetus for Spec1170. Every major Unix vendor has committed to supporting this standard and every Unix software developer waits with glee the day they can write software to this standard and simply recompile (without having to use autoconf) across different platforms.

As I understand it, Spec1170 is roughly based upon version 4 of the X/Open Portability Guidelines (XPG4). Because catgets and friends are defined in XPG4, I’m led to believe that catgets is a part of Spec1170 and hence will become a standardized component of all Unix systems.


11.6.3 Temporary - Why a single implementation

Now it seems kind of wasteful to me to have two different systems installed for accessing message catalogs. If we do want to remedy catgets deficiencies why don’t we try to expand catgets (in a compatible manner) rather than implement an entirely new system? Otherwise, we’ll end up with two message catalog access systems installed with an operating system - one set of routines for packages using GNU gettext for their internationalization, and another set of routines (catgets) for all other software. Bloated?

Supposing another catalog access system is implemented. Which do we recommend? At least for Linux, we need to attract as many software developers as possible. Hence we need to make it as easy for them to port their software as possible. Which means supporting catgets. We will be implementing the libintl code within our libc, but does this mean we also have to incorporate another message catalog access scheme within our libc as well? And what about people who are going to be using the libintl + non-catgets routines? When they port their software to other platforms, they’re now going to have to include the front-end (libintl) code plus the back-end code (the non-catgets access routines) with their software instead of just including the libintl code with their software.

Message catalog support is however only the tip of the iceberg. What about the data for the other locale categories? They also have a number of deficiencies. Are we going to abandon them as well and develop another duplicate set of routines (should libintl expand beyond message catalog support)?

Like many parts of Unix that can be improved upon, we’re stuck with balancing compatibility with the past with useful improvements and innovations for the future.


11.6.4 Temporary - Notes

X/Open agreed very late on the standard form so that many implementations differ from the final form. Both of my systems (old Linux catgets and Ultrix-4) have a strange variation.

OK. After incorporating the last changes I have to spend some time on making the GNU/Linux libc gettext functions. So in future Solaris is not the only system having gettext.


12 The Translator’s View


12.1 Introduction 0

NOTE: This documentation section is outdated and needs to be revised.

Free software is going international! The Translation Project is a way to get maintainers, translators and users all together, so free software will gradually become able to speak many native languages.

The GNU gettext tool set contains everything maintainers need for internationalizing their packages for messages. It also contains quite useful tools for helping translators at localizing messages to their native language, once a package has already been internationalized.

To achieve the Translation Project, we need many interested people who like their own language and write it well, and who are also able to synergize with other translators speaking the same language. If you’d like to volunteer to work at translating messages, please send mail to your translating team.

Each team has its own mailing list, courtesy of Linux International. You may reach your translating team at the address ll@li.org, replacing ll by the two-letter ISO 639 code for your language. Language codes are not the same as country codes given in ISO 3166. The following translating teams exist:

Chinese zh, Czech cs, Danish da, Dutch nl, Esperanto eo, Finnish fi, French fr, Irish ga, German de, Greek el, Italian it, Japanese ja, Indonesian in, Norwegian no, Polish pl, Portuguese pt, Russian ru, Spanish es, Swedish sv and Turkish tr.

For example, you may reach the Chinese translating team by writing to [email protected]. When you become a member of the translating team for your own language, you may subscribe to its list. For example, Swedish people can send a message to sv-request@li.org, having this message body:

subscribe

Keep in mind that team members should be interested in working at translations, or at solving translational difficulties, rather than merely lurking around. If your team does not exist yet and you want to start one, please write to [email protected]; you will then reach the coordinator for all translator teams.

A handful of GNU packages have already been adapted and provided with message translations for several languages. Translation teams have begun to organize, using these packages as a starting point. But there are many more packages and many languages for which we have no volunteer translators. If you would like to volunteer to work at translating messages, please send mail to [email protected] indicating what language(s) you can work on.


12.2 Introduction 1

NOTE: This documentation section is outdated and needs to be revised.

This is now official, GNU is going international! Here is the announcement submitted for the January 1995 GNU Bulletin:

A handful of GNU packages have already been adapted and provided with message translations for several languages. Translation teams have begun to organize, using these packages as a starting point. But there are many more packages and many languages for which we have no volunteer translators. If you’d like to volunteer to work at translating messages, please send mail to ‘[email protected]’ indicating what language(s) you can work on.

This document should answer many questions for those who are curious about the process or would like to contribute. Please at least skim over it, hoping to cut down a little of the high volume of e-mail generated by this collective effort towards internationalization of free software.

Most free programming which is widely shared is done in English, and currently, English is used as the main communicating language between national communities collaborating on free software. This very document is written in English. This will not change in the foreseeable future.

However, there is a strong appetite from national communities for having more software able to write using national language and habits, and there is an on-going effort to modify free software in such a way that it becomes able to do so. The experiments driven so far raised an enthusiastic response from pretesters, so we believe that internationalization of free software is destined to succeed.

To suggest clarifications, additions or corrections to this document, please e-mail [email protected].


12.3 Discussions

NOTE: This documentation section is outdated and needs to be revised.

Facing this internationalization effort, a few users expressed their concerns. Some of these doubts are presented and discussed here.

  • Smaller groups

    Some languages are not spoken by a very large number of people, so people speaking them sometimes consider that there may not be all that much demand for such versions of free software packages. Moreover, many people being into computers, in some countries, generally seem to prefer English versions of their software.

    On the other hand, people might enjoy their own language a lot, and be very motivated at providing to themselves the pleasure of having their beloved free software speaking their mother tongue. They do themselves a personal favor, and do not pay that much attention to the number of people benefiting from their work.

  • Misinterpretation

    Other users are shy to push forward their own language, seeing in this some kind of misplaced propaganda. Someone thought there must be some users of the language over the networks pestering other people with it.

    But any spoken language is worth localization, because there are people behind the language for whom the language is important and dear to their hearts.

  • Odd translations

    The biggest problem is to find the right translations so that everybody can understand the messages. Translations are usually a little odd. Some people get used to English, to the extent they may find translations into their own language “rather pushy, obnoxious and sometimes even hilarious.” As a French speaking man, I have the experience of those instruction manuals for goods, so poorly translated in French in Korea or Taiwan…

    The fact is that we sometimes have to create a kind of national computer culture, and this is not easy without the collaboration of many people liking their mother tongue. This is why translations are better achieved by people knowing and loving their own language, and ready to work together at improving the results they obtain.

  • Dependencies over the GPL or LGPL

    Some people wonder if using GNU gettext necessarily brings their package under the protective wing of the GNU General Public License or the GNU Lesser General Public License, when they do not want to make their program free, or want other kinds of freedom. The simplest answer is “normally not”.

    The gettext-runtime part of GNU gettext, i.e. the contents of libintl, is covered by the GNU Lesser General Public License. The gettext-tools part of GNU gettext, i.e. the rest of the GNU gettext package, is covered by the GNU General Public License.

    The mere marking of localizable strings in a package, or conditional inclusion of a few lines for initialization, is not really including GPL’ed or LGPL’ed code. However, since the localization routines in libintl are under the LGPL, the LGPL needs to be considered. It gives the right to distribute the complete unmodified source of libintl even with non-free programs. It also gives the right to use libintl as a shared library, even for non-free programs. But it gives the right to use libintl as a static library or to incorporate libintl into another library only to free software.


12.4 Organization

NOTE: This documentation section is outdated and needs to be revised.

On a larger scale, the true solution would be to organize some kind of fairly precise set up in which volunteers could participate. I gave some thought to this idea lately, and realize there will be some touchy points. I thought of writing to Richard Stallman to launch such a project, but feel it might be good to shake out the ideas between ourselves first. Most probably Linux International has some experience in the field already, or would like to orchestrate the volunteer work, maybe. Food for thought, in any case!

I guess we have to set up something early, somehow, that will help many possible contributors of the same language to interlock and avoid work duplication, and further be put in contact for solving together problems particular to their tongue (in most languages, there are many difficulties peculiar to translating technical English). My Swedish contributor acknowledged these difficulties, and I’m well aware of them for French.

This is surely not a technical issue, but we should manage so the effort of locale contributors be maximally useful, despite the national team layer interface between contributors and maintainers.

The Translation Project needs some setup for coordinating language coordinators. Localizing evolving programs will surely become a permanent and continuous activity in the free software community, once well started. The setup should be minimally completed and tested before GNU gettext becomes an official reality. The e-mail address [email protected] has been set up for receiving offers from volunteers and general e-mail on these topics. This address reaches the Translation Project coordinator.


12.4.1 Central Coordination

I also think GNU will need sooner than it thinks, that someone set up a way to organize and coordinate these groups. Some kind of group of groups. My opinion is that it would be good that GNU delegates this task to a small group of collaborating volunteers, shortly. Perhaps in gnu.announce a list of these national committees can be published.

My role as coordinator would simply be to refer to Ulrich any German speaking volunteer interested in localization of free software packages, and maybe helping national groups to initially organize, while maintaining national registries until national groups are ready to take over. In fact, the coordinator should ease volunteers to get in contact with one another for creating national teams, which should then select one coordinator per language, or country (regionalized language). If well done, the coordination should be useful without being an overwhelming task, the time to put delegations in place.


12.4.2 National Teams

I suggest we look for volunteer coordinators/editors for individual languages. These people will scan contributions of translation files for various programs, for their own languages, and will ensure high and uniform standards of diction.

From my current experience with other people in these days, those who provide localizations are very enthusiastic about the process, and are more interested in the localization process than in the program they localize, and want to do many programs, not just one. This seems to confirm that having a coordinator/editor for each language is a good idea.

We need to choose someone who is good at writing clear and concise prose in the language in question. That is hard: we can’t check it ourselves. So we need to ask a few people to judge each other’s writing and select the one who is best.

I announced my prerelease to a few dozen people, and you would not believe all the discussions it generated already. I shudder to think what will happen when this will be launched, for true, officially, world wide. Who am I to arbitrate between two Czechoslovak users contradicting each other, for example?

I assume that your German is not much better than my French so that I would not be able to judge about these formulations. What I would suggest is that for each language there is a group for people who maintain the PO files and judge about changes. I suspect there will be cultural differences between how such groups of people will behave. Some will have relaxed ways, reach consensus easily, and have anyone of the group relate to the maintainers, while others will fight to death, organize heavy administrations up to national standards, and use strict channels.

The German team is putting out a good example. Right now, they are maybe half a dozen people revising translations of each other and discussing the linguistic issues. I do not even have all the names. Ulrich Drepper is taking care of coordinating the German team. He subscribed to all my pretest lists, so I do not even have to warn him specifically of incoming releases.

I’m sure that it is a good idea to get teams for each language working on translations. That will make the translations better and more consistent.


12.4.2.1 Sub-Cultures

Taking French for example, there are a few sub-cultures around computers which developed diverging vocabularies. Picking volunteers here and there without addressing this problem in an organized way, soon in the project, might produce a distasteful mix of internationalized programs, and possibly trigger endless quarrels among those who really care.

Keeping some kind of unity in the way French localization of internationalized programs is achieved is a difficult (and delicate) job. Knowing the Latin character of French people (:-), if we take this the wrong way, we could end up nowhere, or spoil a lot of energies. Maybe we should begin to address this problem seriously before GNU gettext becomes officially published. And I suspect that this means soon!


12.4.2.2 Organizational Ideas

I expect the next big changes after the official release. Please note that I use the German translation of the short GPL message. We need to set a few good examples before the localization goes out for true in the free software community. Here are a few points to discuss:

  • Each group should have one FTP server (at least one master).
  • The files on the server should reflect the latest version (of course!) and it should also contain a RCS directory with the corresponding archives (I don’t have this now).
  • There should also be a ChangeLog file (this is more useful than the RCS archive but can be generated automatically from the latter by Emacs).
  • A core group should judge about questionable changes (for now this group consists solely of me but I ask some others occasionally; this also seems to work).

12.4.3 Mailing Lists

If we get any inquiries about GNU gettext, send them on to:

The *-pretest lists are quite useful to me, maybe the idea could be generalized to many GNU, and non-GNU packages. But to each maintainer his or her own way!

François, we have a mechanism in place here at gnu.ai.mit.edu to track teams, support mailing lists for them and log members. We have a slight preference that you use it. If this is OK with you, I can get you clued in.

Things are changing! A few years ago, when Daniel Fekete and I asked for a mailing list for GNU localization, nested at the FSF, we were politely invited to organize it anywhere else, and so we did. For communicating with my pretesters, I later made a handful of mailing lists located at iro.umontreal.ca and administered by majordomo. These lists have been very dependable so far…

I suspect that the German team will organize itself a mailing list located in Germany, and so forth for other countries. But before they organize for true, it could surely be useful to offer mailing lists located at the FSF to each national team. So yes, please explain to me how I should proceed to create and handle them.

We should create temporary mailing lists, one per country, to help people organize. Temporary, because once regrouped and structured, it would be fair that the volunteers from each country bring their list back there and manage it as they want. My feeling is that, in the long run, each team should run its own list, from within their country. There also should be some central list to which all teams could subscribe as they see fit, as long as each team is represented in it.


12.5 Information Flow

NOTE: This documentation section is outdated and needs to be revised.

There will surely be some discussion about these messages after the packages are finally released. If people now send you some proposals for better messages, how do you proceed? Jim, please note that right now, as I put forward nearly a dozen localizable programs, I receive both the translations and the coordination concerns about them.

If I put one of my things to pretest, Ulrich receives the announcement and passes it on to the German team, who make last minute revisions. Then he submits the translation files to me as the maintainer. For free packages I do not maintain, I would not even hear about it. This scheme could be made to work for the whole Translation Project, I think. For security reasons, maybe Ulrich (national coordinators, in fact) should update the central registry kept at the Translation Project (Jim, me, or Len’s recruits) once in a while.

In December/January, I was aggressively ready to internationalize all of GNU, giving myself the duty of one small GNU package per week or so, taking many weeks or months for bigger packages. But it does not work this way. I first did all the things I’m responsible for. I’ve nothing against some missionary work on other maintainers, but I’m also losing a lot of energy over it: same debates over again.

And when the first localized packages are released we’ll get a lot of responses about ugly translations :-). Surely, and we need to have beforehand a fairly good idea about how to handle the information flow between the national teams and the package maintainers.

Please start saving somewhere a quick history of each PO file. I know for sure that the file format will change, allowing for comments. It would be nice that each file has a kind of log, and references for those who want to submit comments or gripes, or otherwise contribute. I sent a proposal for a fast and flexible format, but it is not receiving acceptance yet by the GNU deciders. I’ll tell you when I have more information about this.


12.6 Translating plural forms

Suppose you are translating a PO file, and it contains an entry like this:

#, c-format
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] ""
msgstr[1] ""

What does this mean? How do you fill it in?

Such an entry denotes a message with plural forms, that is, a message where the text depends on a cardinal number. The general form of the message, in English, is the msgid_plural line. The msgid line is the English singular form, that is, the form for when the number is equal to 1. More details about plural forms are explained in Plural forms.

The first thing you need to look at is the Plural-Forms line in the header entry of the PO file. It contains the number of plural forms and a formula. If the PO file does not yet have such a line, you have to add it. It only depends on the language into which you are translating. You can get this info by using the msginit command (see Creating), which contains a database of known plural formulas, or by asking other members of your translation team.

Suppose the line looks as follows:

"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n"
"%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

It’s logically one line; recall that the PO file formatting is allowed to break long lines so that each physical line fits in 80 monospaced columns.

The value of nplurals here tells you that there are three plural forms. The first thing you need to do is to ensure that the entry contains an msgstr line for each of the forms:

#, c-format
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] ""
msgstr[1] ""
msgstr[2] ""

Then translate the msgid_plural line and fill it into each msgstr line:

#, c-format
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] "%d slika uklonjenih"
msgstr[1] "%d slika uklonjenih"
msgstr[2] "%d slika uklonjenih"

Now you can refine the translation so that it matches the plural form. According to the formula above, msgstr[0] is used when the number ends in 1 but does not end in 11; msgstr[1] is used when the number ends in 2, 3, 4, but not in 12, 13, 14; and msgstr[2] is used in all other cases. With this knowledge, you can refine the translations:

#, c-format
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] "%d slika je uklonjena"
msgstr[1] "%d datoteke uklonjenih"
msgstr[2] "%d slika uklonjenih"

You noticed that in the English singular form (msgid) the number placeholder could be omitted and replaced by the numeral word “one”. Can you do this in your translation as well?

msgstr[0] "jednom datotekom je uklonjen"

Well, it depends on whether msgstr[0] applies only to the number 1, or to other numbers as well. If, according to the plural formula, msgstr[0] applies only to n == 1, then you can use the specialized translation without the number placeholder. In our case, however, msgstr[0] also applies to the numbers 21, 31, 41, etc., and therefore you cannot omit the placeholder.
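To check which numbers select which msgstr line, the example formula can be transcribed directly into C, since Plural-Forms expressions use C syntax. This is only a sketch for experimenting with the formula shown above; plural_index is a hypothetical name and plays no part in the gettext tools.

```c
#include <assert.h>

/* Index of the msgstr[] line chosen for n by the example formula:
   plural=n%10==1 && n%100!=11 ? 0
        : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2  */
static unsigned int
plural_index (unsigned long n)
{
  unsigned int index =
    (n % 10 == 1 && n % 100 != 11) ? 0
    : (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) ? 1
    : 2;

  assert (index < 3);   /* nplurals=3 */
  return index;
}
```

Here plural_index (1) and plural_index (21) both give 0, which is why msgstr[0] has to keep the %d placeholder, while plural_index (11) and plural_index (12) give 2.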


12.7 Prioritizing messages: How to determine which messages to translate first

A translator sometimes has only a limited amount of time per week to spend on a package, and some packages have quite large message catalogs (over 1000 messages). Therefore she wishes to translate the messages first that are the most visible to the user, or that occur most frequently. This section describes how to determine these "most urgent" messages. It also applies to determining the "next most urgent" messages after the message catalog has already been partially translated.

In a first step, she uses the programs like a user would do. While she does this, the GNU gettext library logs into a file the not yet translated messages for which a translation was requested from the program.

In a second step, she uses the PO mode to translate precisely this set of messages.

Here are more details. The GNU libintl library (but not the corresponding functions in GNU libc) supports an environment variable GETTEXT_LOG_UNTRANSLATED. The GNU libintl library will log into the file named by this variable the messages for which gettext() and related functions couldn’t find the translation. If the file doesn’t exist, it will be created as needed. On systems with GNU libc a shared library ‘preloadable_libintl.so’ is provided that can be used with the ELF ‘LD_PRELOAD’ mechanism.

So, in the first step, the translator uses these commands on systems with GNU libc:

$ LD_PRELOAD=/usr/local/lib/preloadable_libintl.so
$ export LD_PRELOAD
$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
$ export GETTEXT_LOG_UNTRANSLATED

and these commands on other systems:

$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
$ export GETTEXT_LOG_UNTRANSLATED

Then she uses and peruses the programs. (It is a good and recommended practice to use the programs for which you provide translations: it gives you the needed context.) When done, she removes the environment variables:

$ unset LD_PRELOAD
$ unset GETTEXT_LOG_UNTRANSLATED

The second step starts with removing duplicates:

$ msguniq $HOME/gettextlogused > missing.po

The result is a PO file, but needs some preprocessing before a PO file editor can be used with it. First, it is a multi-domain PO file, containing messages from many translation domains. Second, it lacks all translator comments and source references. Here is how to get a list of the affected translation domains:

$ sed -n -e 's,^domain "\(.*\)"$,\1,p' < missing.po | sort | uniq

Then the translator can handle the domains one by one. For simplicity, let’s use shell variables to denote the language, domain and source package.

$ lang=nl             # your language
$ domain=coreutils    # the name of the domain to be handled
$ package=/usr/src/gnu/coreutils-4.5.4   # the package where it comes from

She takes the latest copy of $lang.po from the Translation Project, or from the package (in most cases, $package/po/$lang.po), or creates a fresh one if she’s the first translator (see Creating). She then uses the following commands to mark the not urgent messages as "obsolete". (This doesn’t mean that these messages - translated and untranslated ones - will go away. It simply means that the PO file editor will ignore them in the following editing session.)

$ msggrep --domain=$domain missing.po | grep -v '^domain' \
  > $domain-missing.po
$ msgattrib --set-obsolete --ignore-file $domain-missing.po $domain.$lang.po \
  > $domain.$lang-urgent.po

Then she translates $domain.$lang-urgent.po by use of a PO file editor (see Editing). (FIXME: I don’t know whether KBabel and gtranslator also preserve obsolete messages, as they should.) Finally she restores the not urgent messages (with their earlier translations, for those which were already translated) through this command:

$ msgmerge --no-fuzzy-matching $domain.$lang-urgent.po $package/po/$domain.pot \
  > $domain.$lang.po

Then she can submit $domain.$lang.po and proceed to the next domain.



13 The Maintainer’s View

The maintainer of a package has many responsibilities. One of them is ensuring that the package will install easily on many platforms, and that the magic we described earlier (see Users) will work for installers and end users.

Of course, there are many possible ways by which GNU gettext might be integrated in a distribution, and this chapter does not cover them in all generality. Instead, it details one possible approach which is especially adequate for many free software distributions following GNU standards, or even better, Gnits standards, because GNU gettext is purposely for helping the internationalization of the whole GNU project, and as many other good free packages as possible. So, the maintainer’s view presented here presumes that the package already has a configure.ac file and uses GNU Autoconf.

Nevertheless, GNU gettext may surely be useful for free packages not following GNU standards and conventions, but the maintainers of such packages might have to show imagination and initiative in organizing their distributions so that gettext works for them in all situations. There are surely many, out there.

Even if gettext methods are now stabilizing, slight adjustments might be needed between successive gettext versions, so you should ideally revisit this chapter in subsequent releases, looking for changes.


13.1 Flat or Non-Flat Directory Structures

Some free software packages are distributed as tar files which unpack in a single directory; these are said to be flat distributions. Other free software packages have a one level hierarchy of subdirectories, using for example a subdirectory named doc/ for the Texinfo manual and man pages, another called lib/ for holding functions meant to replace or complement C libraries, and a subdirectory src/ for holding the proper sources for the package. These other distributions are said to be non-flat.

We cannot say much about flat distributions. A flat directory structure has the disadvantage of increasing the difficulty of updating to a new version of GNU gettext. Also, if you have many PO files, this could somewhat pollute your single directory. Also, GNU gettext’s libintl sources consist of C sources, shell scripts, sed scripts and complicated Makefile rules, which don’t fit well into an existing flat structure. For these reasons, we recommend using the non-flat approach in this case as well.

Maybe because GNU gettext itself has a non-flat structure, we have more experience with this approach, and this is what will be described in the remainder of this chapter. Some maintainers might use this as an opportunity to unflatten their package structure.


13.2 Prerequisite Works

Some work is required before GNU gettext can be used in one of your packages. These tasks are general in nature and do not fit the point by point descriptions used in the remainder of this chapter. So, we describe them here.

  • Before attempting to use gettextize you should install some other packages first. Ensure that recent versions of GNU m4, GNU Autoconf and GNU gettext are already installed at your site, and if not, proceed to do this first. If you get to install these things, beware that GNU m4 must be fully installed before GNU Autoconf is even configured.

    To further ease the task of a package maintainer the automake package was designed and implemented. GNU gettext now uses this tool and the Makefiles in the intl/ and po/ directories therefore know about all the goals necessary for using automake and libintl in one project.

    Those four packages are only needed by you, as a maintainer; the installers of your own package and end users do not really need any of GNU m4, GNU Autoconf, GNU gettext, or GNU automake for successfully installing and running your package, with messages properly translated. But this is not completely true if you provide internationalized shell scripts within your own package: GNU gettext shall then be installed at the user site if the end users want to see the translation of shell script messages.

  • Your package should use Autoconf and have a configure.ac or configure.in file. If it does not, you have to learn how. The Autoconf documentation is quite well written; it is a good idea to print it and get familiar with it.
  • Your C sources should have already been modified according to instructions given earlier in this manual. See Sources.
  • Your po/ directory should receive all PO files submitted to you by the translator teams, each having ll.po as a name. It is not usually easy to get translation work done before your package gets internationalized and available! Since the cycle has to start somewhere, the easiest for the maintainer is to start with absolutely no PO files, and wait until various translator teams get interested in your package, and submit PO files.

It is worth adding here a few words about how the maintainer should ideally behave with PO file submissions. As a maintainer, your role is to authenticate the origin of the submission as being the representative of the appropriate translating teams of the Translation Project (forward the submission to [email protected] in case of doubt), to ensure that the PO file format is not severely broken and does not prevent successful installation, and for the rest, to merely put these PO files in po/ for distribution.

As a maintainer, you do not have to take on your shoulders the responsibility of checking if the translations are adequate or complete, and should avoid diving into linguistic matters. Translation teams drive themselves and are fully responsible for their linguistic choices for the Translation Project. Keep in mind that translator teams are not driven by maintainers. You can help by carefully redirecting all communications and reports from users about linguistic matters to the appropriate translation team, or explain to users how to reach or join their team. The simplest might be to send them the ABOUT-NLS file.

Maintainers should never ever apply PO file bug reports themselves, short-cutting translation teams. If some translator has difficulty getting some of her points through her team, it should not be an option for her to directly negotiate translations with maintainers. Teams ought to settle their problems themselves, if any. If you, as a maintainer, ever think there is a real problem with a team, please never try to solve a team’s problem on your own.


13.3 Invoking the gettextize Program

The gettextize program is an interactive tool that helps the maintainer of a package internationalized through GNU gettext. It is used for two purposes:

  • As a wizard, when a package is modified to use GNU gettext for the first time.
  • As a migration tool, for upgrading the GNU gettext support in a package from a previous to a newer version of GNU gettext.

This program performs the following tasks:

  • It copies into the package some files that are consistently and identically needed in every package internationalized through GNU gettext.
  • It performs as many of the tasks mentioned in the next section Adjusting Files as can be performed automatically.
  • It removes obsolete files and converts idioms used for previous GNU gettext versions to the form recommended for the current GNU gettext version.
  • It prints a summary of the tasks that ought to be done manually and could not be done automatically by gettextize.

It can be invoked as follows:

gettextize [ option… ] [ directory ]

and accepts the following options:

-f, --force

Force replacement of files which already exist.

--intl

Install the libintl sources in a subdirectory named intl/. This libintl will be used to provide internationalization on systems that don’t have GNU libintl installed. If this option is omitted, the call to AM_GNU_GETTEXT in configure.ac should read: ‘AM_GNU_GETTEXT([external])’, and internationalization will not be enabled on systems lacking GNU gettext.

--po-dir=dir

Specify a directory containing PO files. Such a directory contains the translations into various languages of a particular POT file. This option can be specified multiple times, once for each translation domain. If it is not specified, the directory named po/ is updated.

--no-changelog

Don’t update or create ChangeLog files. By default, gettextize logs all changes (file additions, modifications and removals) in a file called ‘ChangeLog’ in each affected directory.

--symlink

Make symbolic links instead of copying the needed files. This can be useful to save a few kilobytes of disk space, but it requires extra effort to create self-contained tarballs, it may disturb some mechanism the maintainer applies to the sources, and it is likely to introduce bugs when a newer version of gettext is installed on the system.

-n, --dry-run

Print modifications but don’t perform them. All actions that gettextize would normally execute are inhibited and instead only listed on standard output.

--help

Display this help and exit.

--version

Output version information and exit.

If directory is given, this is the top level directory of a package to prepare for using GNU gettext. If not given, it is assumed that the current directory is the top level directory of such a package.

The program gettextize provides the following files. However, no existing file will be replaced unless the option --force (-f) is specified.

  1. The ABOUT-NLS file is copied in the main directory of your package, the one being at the top level. This file gives the main indications about how to install and use the Native Language Support features of your program. You might elect to use a more recent copy of this ABOUT-NLS file than the one provided through gettextize, if you have one handy. You may also fetch a more recent copy of file ABOUT-NLS from Translation Project sites, and from most GNU archive sites.
  2. A po/ directory is created for eventually holding all translation files, but initially only containing the file po/Makefile.in.in from the GNU gettext distribution (beware the double ‘.in’ in the file name) and a few auxiliary files. If the po/ directory already exists, it will be preserved along with the files it contains, and only Makefile.in.in and the auxiliary files will be overwritten.

    If ‘--po-dir’ has been specified, this holds for every directory specified through ‘--po-dir’, instead of po/.

  3. Only if ‘--intl’ has been specified: An intl/ directory is created and filled with most of the files originally in the intl/ directory of the GNU gettext distribution. Also, if option --force (-f) is given, the intl/ directory is emptied first.
  4. The file config.rpath is copied into the directory containing configuration support files. It is needed by the AM_GNU_GETTEXT autoconf macro.
  5. Only if the project is using GNU automake: A set of autoconf macro files is copied into the package’s autoconf macro repository, usually in a directory called m4/.

If your site supports symbolic links, gettextize will not actually copy the files into your package, but establish symbolic links instead. This avoids duplicating the disk space needed in all packages. Merely using the ‘-h’ option while creating the tar archive of your distribution will resolve each link by an actual copy in the distribution archive. So, to insist, you really should use the ‘-h’ option with tar within the dist goal of your main Makefile.in.
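
The effect of ‘-h’ can be tried out in a scratch directory. This is only an illustrative sketch: the directory layout and file contents are invented, and it assumes a tar implementation that accepts -h when creating an archive (GNU tar and BSD tar both do).

```shell
# Create a package directory whose ABOUT-NLS is a symbolic link.
tmp=$(mktemp -d)
mkdir "$tmp/pkg" "$tmp/out"
echo "shared file" > "$tmp/ABOUT-NLS"
ln -s ../ABOUT-NLS "$tmp/pkg/ABOUT-NLS"

# -h makes tar archive the file each link points to, not the link itself.
(cd "$tmp" && tar -chf pkg.tar pkg)
(cd "$tmp/out" && tar -xf ../pkg.tar)

# Inspect what came out of the archive.
if [ -L "$tmp/out/pkg/ABOUT-NLS" ]; then
  result=symlink
else
  result=file
fi
echo "$result"
rm -rf "$tmp"
```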

Furthermore, gettextize will update all Makefile.am files in each affected directory, as well as the top level configure.ac or configure.in file.

It is interesting to understand that most new files for supporting GNU gettext facilities in one package go in intl/, po/ and m4/ subdirectories. One distinction between intl/ and the two other directories is that intl/ is meant to be completely identical in all packages using GNU gettext, while the other directories will mostly contain package dependent files.

The gettextize program makes backup files for all files it replaces or changes, and also writes ChangeLog entries about these changes. This way, the careful maintainer can check after running gettextize whether its changes are acceptable to him, and possibly adjust them. An exception to this rule is the intl/ directory, which is added or replaced or removed as a whole.

It is important to understand that gettextize cannot do the entire job of adapting a package for using GNU gettext. The amount of remaining work depends on whether the package uses GNU automake or not. But in any case, the maintainer should still read the section Adjusting Files after invoking gettextize.

In particular, if after using ‘gettextize’, you get an error ‘AC_COMPILE_IFELSE was called before AC_GNU_SOURCE’ or ‘AC_RUN_IFELSE was called before AC_GNU_SOURCE’, you can fix it by modifying configure.ac, as described in configure.ac.

It is also important to understand that gettextize is not part of the GNU build system, in the sense that it should not be invoked automatically, and not be invoked by someone who doesn’t assume the responsibilities of a package maintainer. For the latter purpose, a separate tool is provided, see autopoint Invocation.


13.4 Files You Must Create or Alter

Besides files which are automatically added through gettextize, there are many files needing revision for properly interacting with GNU gettext. If you are closely following GNU standards for Makefile engineering and auto-configuration, the adaptations should be easier to achieve. Here is a point by point description of the changes needed in each.

So, here comes a list of files, each one followed by a description of all alterations it needs. Many examples are taken from the GNU gettext 0.19.8 distribution itself, or from the GNU hello distribution (http://www.gnu.org/software/hello). You may indeed refer to the source code of the GNU gettext and GNU hello packages, as they are intended to be good examples for using GNU gettext functionality.


13.4.1 POTFILES.in in po/

The po/ directory should receive a file named POTFILES.in. This file tells which files, among all program sources, have marked strings needing translation. Here is an example of such a file:

# List of source files containing translatable strings.
# Copyright (C) 1995 Free Software Foundation, Inc.

# Common library files
lib/error.c
lib/getopt.c
lib/xmalloc.c

# Package source files
src/gettext.c
src/msgfmt.c
src/xgettext.c

Hash-marked comments and white lines are ignored. All other lines list those source files containing strings marked for translation (see Mark Keywords), in a notation relative to the top level of your whole distribution, rather than the location of the POTFILES.in file itself.
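
Since the entries are relative to the top level rather than to po/, a stale or mistyped entry is easy to overlook. The following is a minimal sketch of a consistency check, not something shipped with GNU gettext; the function name check_potfiles is hypothetical.

```shell
# Hypothetical helper: report entries of po/POTFILES.in that do not exist.
# $1 is the top-level directory of the distribution.
check_potfiles () {
  top=$1
  rc=0
  while IFS= read -r file; do
    case $file in
      '#'*) continue ;;            # hash-marked comment
      *[![:space:]]*) ;;           # has content: treat it as a file name
      *) continue ;;               # empty or whitespace-only line
    esac
    # Entries are resolved against the top level, not against po/.
    if [ ! -f "$top/$file" ]; then
      echo "missing: $file" >&2
      rc=1
    fi
  done < "$top/po/POTFILES.in"
  return $rc
}
# Usage, from the top-level directory: check_potfiles .
```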

When a C file is automatically generated by a tool, like flex or bison, that doesn’t introduce translatable strings by itself, it is recommended to list in po/POTFILES.in the real source file (ending in .l in the case of flex, or in .y in the case of bison), not the generated C file.


13.4.2 LINGUAS in po/

The po/ directory should also receive a file named LINGUAS. This file contains the list of available translations. It is a whitespace separated list. Hash-marked comments and white lines are ignored. Here is an example file:

# Set of available languages.
de fr

This example means that German and French PO files are available, so that these languages are currently supported by your package. If you want to further restrict, at installation time, the set of installed languages, this should not be done by modifying the LINGUAS file, but rather by using the LINGUAS environment variable (see Installers).

It is recommended that you add the "languages" ‘en@quot’ and ‘en@boldquot’ to the LINGUAS file. en@quot is a variant of English message catalogs (en) which uses real quotation marks instead of the ugly looking asymmetric ASCII substitutes ‘`’ and ‘'’. en@boldquot is a variant of en@quot that additionally outputs quoted pieces of text in a bold font, when used in a terminal emulator which supports the VT100 escape sequences (such as xterm or the Linux console, but not Emacs in M-x shell mode).

These extra message catalogs ‘en@quot’ and ‘en@boldquot’ are constructed automatically, not by translators; to support them, you need the files Rules-quot, quot.sed, boldquot.sed, en@quot.header, en@boldquot.header, insert-header.sin in the po/ directory. You can copy them from GNU gettext’s po/ directory; they are also installed by running gettextize.
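
A quick way to confirm that LINGUAS and the PO files in po/ agree is sketched below. It is an illustration only, with hypothetical function names (list_linguas, check_linguas); it treats everything after a hash mark as a comment and skips the two automatically built catalogs, for which no PO file is expected in the sources.

```shell
# Hypothetical helper: print the languages listed in a LINGUAS file,
# one per line, ignoring hash-marked comments and blank lines.
list_linguas () {
  sed -e 's/#.*//' "$1" | tr -s ' \t' '\n\n' | sed -e '/^$/d'
}

# Hypothetical helper: check that each listed language has a ll.po file.
# $1 is the po/ directory.
check_linguas () {
  podir=$1
  rc=0
  for lang in $(list_linguas "$podir/LINGUAS"); do
    case $lang in
      en@quot|en@boldquot) continue ;;   # built automatically, no PO file expected
    esac
    if [ ! -f "$podir/$lang.po" ]; then
      echo "no PO file for $lang" >&2
      rc=1
    fi
  done
  return $rc
}
# Usage, from the top-level directory: check_linguas po
```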


13.4.3 Makevars in po/

The po/ directory also has a file named Makevars. It contains variables that are specific to your project. po/Makevars gets inserted into the po/Makefile when the latter is created. The variables thus take effect when the POT file is created or updated, and when the message catalogs get installed.

The first three variables can be left unmodified if your package has a single message domain and, accordingly, a single po/ directory. Only packages which have multiple po/ directories at different locations need to adjust the first three variables defined in Makevars.

As an alternative to the XGETTEXT_OPTIONS variable, it is also possible to specify xgettext options through the AM_XGETTEXT_OPTION autoconf macro. See AM_XGETTEXT_OPTION.


13.4.4 Extending Makefile in po/

All files called Rules-* in the po/ directory get appended to the po/Makefile when it is created. They present an opportunity to add rules for special PO files to the Makefile, without needing to mess with po/Makefile.in.in.

GNU gettext comes with a Rules-quot file, containing rules for building catalogs en@quot.po and en@boldquot.po. The effect of en@quot.po is that people who set their LANGUAGE environment variable to ‘en@quot’ will get messages with proper looking symmetric Unicode quotation marks instead of abusing the ASCII grave accent and the ASCII apostrophe for indicating quotations. To enable this catalog, simply add en@quot to the po/LINGUAS file. The effect of en@boldquot.po is that people who set LANGUAGE to ‘en@boldquot’ will get not only proper quotation marks, but also the quoted text will be shown in a bold font on terminals and consoles. This catalog is useful only for command-line programs, not GUI programs. To enable it, similarly add en@boldquot to the po/LINGUAS file.

Similarly, you can create rules for building message catalogs for the sr@latin locale (Serbian written with the Latin alphabet) from those for the sr locale (Serbian written with Cyrillic letters). See msgfilter Invocation.


13.4.5 configure.ac at top level

configure.ac or configure.in - this is the source from which autoconf generates the configure script.

  1. Declare the package and version.

    This is done by a set of lines like these:

    PACKAGE=gettext
    VERSION=0.19.8
    AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE")
    AC_DEFINE_UNQUOTED(VERSION, "$VERSION")
    AC_SUBST(PACKAGE)
    AC_SUBST(VERSION)
    

    or, if you are using GNU automake, by a line like this:

    AM_INIT_AUTOMAKE(gettext, 0.19.8)
    

    Of course, you replace ‘gettext’ with the name of your package, and ‘0.19.8’ by its version numbers, exactly as they should appear in the packaged tar file name of your distribution (gettext-0.19.8.tar.gz, here).

  2. Check for internationalization support.

    Here is the main m4 macro for triggering internationalization support. Just add this line to configure.ac:

    AM_GNU_GETTEXT
    

    This call is purposely simple, even if it generates a lot of configure time checking and actions.

    If you have suppressed the intl/ subdirectory by calling gettextize without the ‘--intl’ option, this call should read

    AM_GNU_GETTEXT([external])
    
  3. Have output files created.

    The AC_OUTPUT directive, at the end of your configure.ac file, needs to be modified in two ways:

    AC_OUTPUT([existing configuration files intl/Makefile po/Makefile.in],
    [existing additional actions])
    

    The modification to the first argument to AC_OUTPUT asks for substitution in the intl/ and po/ directories. Note the ‘.in’ suffix used for po/ only. This is because the distributed file is really po/Makefile.in.in.

    If you have suppressed the intl/ subdirectory by calling gettextize without the ‘--intl’ option, then you don’t need to add intl/Makefile to the AC_OUTPUT line.

If, after doing the recommended modifications, a command like ‘aclocal -I m4’ or ‘autoconf’ or ‘autoreconf’ fails with a trace similar to this:

configure.ac:44: warning: AC_COMPILE_IFELSE was called before AC_GNU_SOURCE
../../lib/autoconf/specific.m4:335: AC_GNU_SOURCE is expanded from...
m4/lock.m4:224: gl_LOCK is expanded from...
m4/gettext.m4:571: gt_INTL_SUBDIR_CORE is expanded from...
m4/gettext.m4:472: AM_INTL_SUBDIR is expanded from...
m4/gettext.m4:347: AM_GNU_GETTEXT is expanded from...
configure.ac:44: the top level
configure.ac:44: warning: AC_RUN_IFELSE was called before AC_GNU_SOURCE

you need to add an explicit invocation of ‘AC_GNU_SOURCE’ in the configure.ac file - after ‘AC_PROG_CC’ but before ‘AM_GNU_GETTEXT’, most likely very close to the ‘AC_PROG_CC’ invocation. This is necessary because of ordering restrictions imposed by GNU autoconf.
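
The required ordering can be checked mechanically. The sketch below is a simple line-based heuristic, not an m4-aware parser: it assumes each macro invocation starts its own line, and the function names macro_line and check_macro_order are hypothetical.

```shell
# Hypothetical helper: print the number of the first line of file $2 that
# starts with macro $1 (and not with a longer macro name sharing that prefix).
macro_line () {
  grep -nE "^[[:space:]]*$1([^A-Za-z0-9_]|\$)" "$2" | head -n 1 | cut -d: -f1
}

# Hypothetical helper: succeed if AC_GNU_SOURCE is invoked after AC_PROG_CC
# and before AM_GNU_GETTEXT in the given configure.ac.
check_macro_order () {
  cc=$(macro_line AC_PROG_CC "$1")
  gnu=$(macro_line AC_GNU_SOURCE "$1")
  gt=$(macro_line AM_GNU_GETTEXT "$1")
  [ -n "$cc" ] && [ -n "$gnu" ] && [ -n "$gt" ] &&
    [ "$cc" -lt "$gnu" ] && [ "$gnu" -lt "$gt" ]
}
# Usage: check_macro_order configure.ac || echo "check the macro order"
```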


13.4.6 config.guess, config.sub at top level

If you haven’t suppressed the intl/ subdirectory, you need to add the GNU config.guess and config.sub files to your distribution. They are needed because the intl/ directory has platform dependent support for determining the locale’s character encoding and therefore needs to identify the platform.

You can obtain the newest version of config.guess and config.sub from the ‘config’ project at http://savannah.gnu.org/. The commands to fetch them are

$ wget -O config.guess 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD'
$ wget -O config.sub 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD'

Less recent versions are also contained in the GNU automake and GNU libtool packages.

Normally, config.guess and config.sub are put at the top level of a distribution. But it is also possible to put them in a subdirectory, together with other configuration support files like install-sh, ltconfig, ltmain.sh or missing. All you need to do, other than moving the files, is to add the following line to your configure.ac.

AC_CONFIG_AUX_DIR([subdir])

13.4.7 mkinstalldirs at top level

With earlier versions of GNU gettext, you needed to add the GNU mkinstalldirs script to your distribution. This is not needed any more. You can remove it if you are not also using an automake version older than automake 1.9.


13.4.8 aclocal.m4 at top level

If you do not have an aclocal.m4 file in your distribution, the simplest is to concatenate the files codeset.m4, fcntl-o.m4, gettext.m4, glibc2.m4, glibc21.m4, iconv.m4, intdiv0.m4, intl.m4, intldir.m4, intlmacosx.m4, intmax.m4, inttypes_h.m4, inttypes-pri.m4, lcmessage.m4, lib-ld.m4, lib-link.m4, lib-prefix.m4, lock.m4, longlong.m4, nls.m4, po.m4, printf-posix.m4, progtest.m4, size_max.m4, stdint_h.m4, threadlib.m4, uintmax_t.m4, visibility.m4, wchar_t.m4, wint_t.m4, xsize.m4 from GNU gettext’s m4/ directory into a single file. If you have suppressed the intl/ directory, only gettext.m4, iconv.m4, lib-ld.m4, lib-link.m4, lib-prefix.m4, nls.m4, po.m4, progtest.m4 need to be concatenated.
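
For the shorter list (intl/ suppressed), the concatenation might be scripted as follows. This is a sketch only: the function name cat_gettext_macros is hypothetical, and the directory passed to it is assumed to be a copy of GNU gettext’s m4/ directory.

```shell
# Hypothetical helper: concatenate the macro files needed when the intl/
# directory has been suppressed. $1 is a directory holding GNU gettext's
# m4/ files; the result is written to standard output.
cat_gettext_macros () {
  for f in gettext.m4 iconv.m4 lib-ld.m4 lib-link.m4 lib-prefix.m4 \
           nls.m4 po.m4 progtest.m4; do
    # Stop at the first missing file instead of producing a partial result.
    cat "$1/$f" || return 1
  done
}
# Usage (the path is a placeholder):
#   cat_gettext_macros /path/to/gettext/m4 > aclocal.m4
```

Stopping at the first missing file avoids silently writing an incomplete aclocal.m4.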

If you are not using GNU automake 1.8 or newer, you will need to add a file mkdirp.m4 from a newer automake distribution to the list of files above.

If you already have an aclocal.m4 file, then you will have to merge the said macro files into your aclocal.m4. Note that if you are upgrading from a previous release of GNU gettext, you should most probably replace the macros (AM_GNU_GETTEXT, etc.), as they usually change a little from one release of GNU gettext to the next. Their contents may vary as we get more experience with strange systems out there.

If you are using GNU automake 1.5 or newer, it is enough to put these macro files into a subdirectory named m4/ and add the line

ACLOCAL_AMFLAGS = -I m4

to your top level Makefile.am.

If you are using GNU automake 1.10 or newer, it is even easier: Add the line

ACLOCAL_AMFLAGS = --install -I m4

to your top level Makefile.am, and run ‘aclocal --install -I m4’. This will copy the needed files to the m4/ subdirectory automatically, before updating aclocal.m4.

These macros check for the internationalization support functions and related information. Hopefully, once stabilized, these macros might be integrated in the standard Autoconf set, because this piece of m4 code will be the same for all projects using GNU gettext.


13.4.9 acconfig.h at top level

Earlier GNU gettext releases required you to put definitions for ENABLE_NLS, HAVE_GETTEXT and HAVE_LC_MESSAGES, HAVE_STPCPY, PACKAGE and VERSION into an acconfig.h file. This is not needed any more; you can remove them from your acconfig.h file unless your package uses them independently from the intl/ directory.


13.4.10 config.h.in at top level

The include file template that holds the C macros to be defined by configure is usually called config.h.in and may be maintained either manually or automatically.

If gettextize has created an intl/ directory, this file must be called config.h.in and must be at the top level. If, however, you have suppressed the intl/ directory by calling gettextize without the ‘--intl’ option, then you can choose the name of this file and its location freely.

If it is maintained automatically, by use of the ‘autoheader’ program, you need to do nothing about it. This is the case in particular if you are using GNU automake.

If it is maintained manually, and if gettextize has created an intl/ directory, you should switch to using ‘autoheader’. The list of C macros to be added for the sake of the intl/ directory is just too long to be maintained manually; it also changes between different versions of GNU gettext.

If it is maintained manually, and if on the other hand you have suppressed the intl/ directory by calling gettextize without the ‘--intl’ option, then you can get away with adding the following lines to config.h.in:

/* Define to 1 if translation of program messages to the user's
   native language is requested. */
#undef ENABLE_NLS

13.4.11 Makefile.in at top level

Here are a few modifications you need to make to your main, top-level Makefile.in file.

  1. Add the following lines near the beginning of your Makefile.in, so the ‘dist:’ goal will work properly (as explained further down):
    PACKAGE = @PACKAGE@
    VERSION = @VERSION@
    
  2. Add file ABOUT-NLS to the DISTFILES definition, so the file gets distributed.
  3. Wherever you process subdirectories in your Makefile.in, be sure you also process the subdirectories ‘intl’ and ‘po’. Special rules in the Makefiles take care of the case where no internationalization is wanted.

    If you are using Makefiles, either generated by automake, or hand-written so they carefully follow the GNU coding standards, the affected goals for which the new subdirectories must be handled include ‘installdirs’, ‘install’, ‘uninstall’, ‘clean’, ‘distclean’.

    Here is an example of a canonical order of processing. In this example, we also define SUBDIRS in Makefile.in for it to be further used in the ‘dist:’ goal.

    SUBDIRS = doc intl lib src po
    

    Note that you must arrange for ‘make’ to descend into the intl directory before descending into other directories containing code which makes use of the libintl.h header file. For this reason, here we mention intl before lib and src.

  4. A delicate point is the ‘dist:’ goal, as both intl/Makefile and po/Makefile will later assume that the proper directory has been set up from the main Makefile. Here is an example of what the ‘dist:’ goal might look like:
    distdir = $(PACKAGE)-$(VERSION)
    dist: Makefile
    	rm -fr $(distdir)
    	mkdir $(distdir)
    	chmod 777 $(distdir)
    	for file in $(DISTFILES); do \
    	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
    	done
    	for subdir in $(SUBDIRS); do \
    	  mkdir $(distdir)/$$subdir || exit 1; \
    	  chmod 777 $(distdir)/$$subdir; \
    	  (cd $$subdir && $(MAKE) $@) || exit 1; \
    	done
    	tar chozf $(distdir).tar.gz $(distdir)
    	rm -fr $(distdir)
    

Note that if you are using GNU automake, Makefile.in is automatically generated from Makefile.am, and all needed changes to Makefile.am are already made by running ‘gettextize’.


13.4.12 Makefile.in in src/

Some of the modifications made in the main Makefile.in will also be needed in the Makefile.in from your package sources, which we assume here to be in the src/ subdirectory. Here are all the modifications needed in src/Makefile.in:

  1. In view of the ‘dist:’ goal, you should have these lines near the beginning of src/Makefile.in:
    PACKAGE = @PACKAGE@
    VERSION = @VERSION@
    
  2. If not done already, you should guarantee that top_srcdir gets defined. This will serve for cpp include files. Just add the line:
    top_srcdir = @top_srcdir@
    
  3. You might also want to define subdir as ‘src’, later allowing for almost uniform ‘dist:’ goals in all your Makefile.in. At least, the ‘dist:’ goal below assumes that you used:
    subdir = src
    
  4. The main function of your program will normally call bindtextdomain (see Triggering), like this:
    bindtextdomain (PACKAGE, LOCALEDIR);
    textdomain (PACKAGE);
    

    To make LOCALEDIR known to the program, add the following lines to Makefile.in if you are using Autoconf version 2.60 or newer:

    datadir = @datadir@
    datarootdir = @datarootdir@
    localedir = @localedir@
    DEFS = -DLOCALEDIR=\"$(localedir)\" @DEFS@
    

    or these lines if your version of Autoconf is older than 2.60:

    datadir = @datadir@
    localedir = $(datadir)/locale
    DEFS = -DLOCALEDIR=\"$(localedir)\" @DEFS@
    

    Note that @datadir@ defaults to ‘$(prefix)/share’, thus $(localedir) defaults to ‘$(prefix)/share/locale’.

  5. You should ensure that the final linking will use @LIBINTL@ or @LTLIBINTL@ as a library. @LIBINTL@ is for use without libtool, @LTLIBINTL@ is for use with libtool. An easy way to achieve this is to arrange for it to get into LIBS, like this:
    LIBS = @LIBINTL@ @LIBS@
    

    In most packages internationalized with GNU gettext, one will find a directory lib/ in which a library containing some helper functions will be built. (You need at least the few functions which the GNU gettext Library itself needs.) However some of the functions in the lib/ directory also give messages to the user which of course should be translated, too. Taking care of this, the support library (say libsupport.a) should be placed before @LIBINTL@ and @LIBS@ in the above example. So one has to write this:

    LIBS = ../lib/libsupport.a @LIBINTL@ @LIBS@
    
  6. You should also ensure that directory intl/ will be searched for C preprocessor include files in all circumstances. So, you have to arrange for both ‘-I../intl’ and ‘-I$(top_srcdir)/intl’ to be given to the C compiler.
  7. Your ‘dist:’ goal has to conform with others. Here is a reasonable definition for it:
    distdir = ../$(PACKAGE)-$(VERSION)/$(subdir)
    dist: Makefile $(DISTFILES)
    	for file in $(DISTFILES); do \
    	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir) || exit 1; \
    	done
    

Note that if you are using GNU automake, Makefile.in is automatically generated from Makefile.am, and the first three changes and the last change are not necessary. The remaining needed Makefile.am modifications are the following:

  1. To make LOCALEDIR known to the program, add the following to Makefile.am:
    <module>_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
    

    for each specific module or compilation unit, or

    AM_CPPFLAGS = -DLOCALEDIR=\"$(localedir)\"
    

    for all modules and compilation units together. Furthermore, if you are using an Autoconf version older than 2.60, add this line to define ‘localedir’:

    localedir = $(datadir)/locale
    
  2. To ensure that the final linking will use @LIBINTL@ or @LTLIBINTL@ as a library, add the following to Makefile.am:
    <program>_LDADD = @LIBINTL@
    

    for each specific program, or

    LDADD = @LIBINTL@
    

    for all programs together. Remember that when you use libtool to link a program, you need to use @LTLIBINTL@ instead of @LIBINTL@ for that program.

  3. If you have an intl/ directory, whose contents are created by gettextize, then to ensure that it will be searched for C preprocessor include files in all circumstances, add something like this to Makefile.am:
    AM_CPPFLAGS = -I../intl -I$(top_srcdir)/intl
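
To illustrate how the LOCALEDIR definition from item 1 is typically consumed on the C side, here is a minimal sketch. The function name init_i18n is hypothetical, and the fallback path exists only so that the fragment compiles without the build system; in a real package the value comes from -DLOCALEDIR.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Normally supplied by the build system via -DLOCALEDIR=\"$(localedir)\".
   This fallback is an assumption made for the sketch only.  */
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Hypothetical helper: select the user's locale, bind DOMAIN to the
   catalog directory LOCALEDIR_PATH and make DOMAIN the default domain.
   Returns 0 on success, -1 on failure.  */
int
init_i18n (const char *domain, const char *localedir_path)
{
  assert (domain != NULL && localedir_path != NULL);
  setlocale (LC_ALL, "");
  if (bindtextdomain (domain, localedir_path) == NULL)
    return -1;
  if (textdomain (domain) == NULL)
    return -1;
  return 0;
}
```

A program would call something like init_i18n (PACKAGE, LOCALEDIR) once at startup, before the first gettext lookup. On systems where gettext is not in the C library, the program must also be linked with @LIBINTL@ as described in item 2.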
    

13.4.13 gettext.h in lib/

Internationalization of packages, as provided by GNU gettext, is optional. It can be turned off in two situations:

  • When the installer has specified ‘./configure --disable-nls’. This can be useful when small binaries are more important than features, for example when building utilities for boot diskettes. It can also be useful in order to get some specific C compiler warnings about code quality with some older versions of GCC (older than 3.0).
  • When the package does not include the intl/ subdirectory, and the libintl.h header (with its associated libintl library, if any) is not already installed on the system. In that case it is preferable for the package to build without internationalization support rather than fail with a compilation error.

A C preprocessor macro can be used to detect these two cases. Usually, when libintl.h was found and not explicitly disabled, the ENABLE_NLS macro will be defined to 1 in the autoconf generated configuration file (usually called config.h). In the two negative situations, however, this macro will not be defined, thus it will evaluate to 0 in C preprocessor expressions.

gettext.h is a convenience header file for conditional use of <libintl.h>, depending on the ENABLE_NLS macro. If ENABLE_NLS is set, it includes <libintl.h>; otherwise it defines no-op substitutes for the libintl.h functions. We recommend the use of "gettext.h" over direct use of <libintl.h>, so that portability to older systems is guaranteed and installers can turn off internationalization if they want to. In the C code, you will then write

#include "gettext.h"

instead of

#include <libintl.h>

gettext.h usually lives in a directory containing auxiliary include files. In many GNU packages, there is a directory lib/ containing helper functions; gettext.h fits there. In other packages, it can go into the src directory.

Do not install the gettext.h file in public locations. Every package that needs it should contain its own copy.
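
The conditional logic of such a header can be sketched as follows. This is a simplified illustration and not the gettext.h file shipped with GNU gettext, which covers more functions and corner cases. The helper describe_files is hypothetical; in a real package ENABLE_NLS comes from config.h.

```c
#include <assert.h>
#include <stdio.h>

#if ENABLE_NLS            /* an undefined macro evaluates to 0 here */
# include <libintl.h>
#else
/* No-op substitutes: hand back the untranslated string.  */
# define gettext(Msgid) ((const char *) (Msgid))
# define ngettext(Msgid1, Msgid2, N) \
    ((N) == 1 ? (const char *) (Msgid1) : (const char *) (Msgid2))
#endif

/* Hypothetical helper: write "N file" or "N files" into BUF.
   Returns the value of snprintf, i.e. the length of the full message.  */
int
describe_files (char *buf, size_t size, unsigned long n)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, ngettext ("%lu file", "%lu files", n), n);
}
```

The calling code is identical in both configurations; only the meaning of gettext and ngettext changes with ENABLE_NLS.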


13.5 Autoconf macros for use in configure.ac

GNU gettext installs macros for use in a package’s configure.ac or configure.in. See Introduction in The Autoconf Manual. The primary macro is, of course, AM_GNU_GETTEXT.


13.5.1 AM_GNU_GETTEXT in gettext.m4

The AM_GNU_GETTEXT macro tests for the presence of the GNU gettext function family in either the C library or a separate libintl library (shared or static libraries are both supported) or in the package’s intl/ directory. It also invokes AM_PO_SUBDIRS, thus preparing the po/ directories of the package for building.

AM_GNU_GETTEXT accepts up to three optional arguments. The general syntax is

AM_GNU_GETTEXT([intlsymbol], [needsymbol], [intldir])

intlsymbol can be ‘external’ or ‘no-libtool’. The default (if it is not specified or empty) is ‘no-libtool’. intlsymbol should be ‘external’ for packages with no intl/ directory. For packages with an intl/ directory, you can either use an intlsymbol equal to ‘no-libtool’, or you can use ‘external’ and override by using the macro AM_GNU_GETTEXT_INTL_SUBDIR elsewhere. The two ways to specify the existence of an intl/ directory are equivalent. At build time, a static library $(top_builddir)/intl/libintl.a will then be created.

If needsymbol is specified and is ‘need-ngettext’, then GNU gettext implementations (in libc or libintl) without the ngettext() function will be ignored. If needsymbol is specified and is ‘need-formatstring-macros’, then GNU gettext implementations that don’t support the ISO C 99 <inttypes.h> format string macros will be ignored. Only one needsymbol can be specified. These requirements can also be specified by using the macro AM_GNU_GETTEXT_NEED elsewhere. To specify more than one requirement, just specify the strongest one among them, or invoke the AM_GNU_GETTEXT_NEED macro several times. The hierarchy among the various alternatives is as follows: ‘need-formatstring-macros’ implies ‘need-ngettext’.

intldir is used to find the intl libraries. If empty, the value ‘$(top_builddir)/intl/’ is used.

The AM_GNU_GETTEXT macro determines whether GNU gettext is available and should be used. If so, it sets the USE_NLS variable to ‘yes’; it defines ENABLE_NLS to 1 in the autoconf generated configuration file (usually called config.h); it sets the variables LIBINTL and LTLIBINTL to the linker options for use in a Makefile (LIBINTL for use without libtool, LTLIBINTL for use with libtool); it adds an ‘-I’ option to CPPFLAGS if necessary. In the negative case, it sets USE_NLS to ‘no’; it sets LIBINTL and LTLIBINTL to empty and doesn’t change CPPFLAGS.

The complexities that AM_GNU_GETTEXT deals with are the following:

  • Some operating systems have gettext in the C library, for example glibc. Some have it in a separate library libintl. GNU libintl might have been installed as part of the GNU gettext package.
  • GNU libintl, if installed, is not necessarily already in the search path (CPPFLAGS for the include file search path, LDFLAGS for the library search path).
  • Except for glibc, the operating system’s native gettext cannot exploit the GNU mo files, doesn’t have the necessary locale dependency features, and cannot convert messages from the catalog’s text encoding to the user’s locale encoding.
  • GNU libintl, if installed, is not necessarily already in the run time library search path. To avoid the need for setting an environment variable like LD_LIBRARY_PATH, the macro adds the appropriate run time search path options to the LIBINTL and LTLIBINTL variables. This works on most systems, but not on some operating systems with limited shared library support, like SCO.
  • GNU libintl relies on POSIX/XSI iconv. The macro checks for linker options needed to use iconv and appends them to the LIBINTL and LTLIBINTL variables.

13.5.2 AM_GNU_GETTEXT_VERSION in gettext.m4

The AM_GNU_GETTEXT_VERSION macro declares the version number of the GNU gettext infrastructure that is used by the package.

The use of this macro is optional; only the autopoint program makes use of it (see Version Control Issues).


13.5.3 AM_GNU_GETTEXT_NEED in gettext.m4

The AM_GNU_GETTEXT_NEED macro declares a constraint regarding the GNU gettext implementation. The syntax is

AM_GNU_GETTEXT_NEED([needsymbol])

If needsymbol is ‘need-ngettext’, then GNU gettext implementations (in libc or libintl) without the ngettext() function will be ignored. If needsymbol is ‘need-formatstring-macros’, then GNU gettext implementations that don’t support the ISO C 99 <inttypes.h> format string macros will be ignored.

The optional second argument of AM_GNU_GETTEXT is also taken into account.

The AM_GNU_GETTEXT_NEED invocations can occur before or after the AM_GNU_GETTEXT invocation; the order doesn’t matter.


13.5.4 AM_GNU_GETTEXT_INTL_SUBDIR in intldir.m4

The AM_GNU_GETTEXT_INTL_SUBDIR macro specifies that the AM_GNU_GETTEXT macro, although invoked with the first argument ‘external’, should also prepare for building the intl/ subdirectory.

The AM_GNU_GETTEXT_INTL_SUBDIR invocation can occur before or after the AM_GNU_GETTEXT invocation; the order doesn’t matter.

The use of this macro requires GNU automake 1.10 or newer and GNU autoconf 2.61 or newer.


13.5.5 AM_PO_SUBDIRS in po.m4

The AM_PO_SUBDIRS macro prepares the po/ directories of the package for building. This macro should be used in internationalized programs written in programming languages other than C, C++, Objective C, for example sh, Python, Lisp. See Programming Languages for a list of programming languages that support localization through PO files.

The AM_PO_SUBDIRS macro determines whether internationalization should be used. If so, it sets the USE_NLS variable to ‘yes’, otherwise to ‘no’. It also determines the right values for Makefile variables in each po/ directory.


13.5.6 AM_XGETTEXT_OPTION in po.m4

The AM_XGETTEXT_OPTION macro registers a command-line option to be used in the invocations of xgettext in the po/ directories of the package.

For example, if you have a source file that defines a function ‘error_at_line’ whose fifth argument is a format string, you can use

AM_XGETTEXT_OPTION([--flag=error_at_line:5:c-format])

to instruct xgettext to mark all translatable strings in ‘gettext’ invocations that occur as fifth argument to this function as ‘c-format’.

See xgettext Invocation for the list of options that xgettext accepts.

The use of this macro is an alternative to the use of the ‘XGETTEXT_OPTIONS’ variable in po/Makevars.


13.5.7 AM_ICONV in iconv.m4

The AM_ICONV macro tests for the presence of the POSIX/XSI iconv function family in either the C library or a separate libiconv library. If found, it sets the am_cv_func_iconv variable to ‘yes’; it defines HAVE_ICONV to 1 in the autoconf generated configuration file (usually called config.h); it defines ICONV_CONST to ‘const’ or to empty, depending on whether the second argument of iconv() is of type ‘const char **’ or ‘char **’; it sets the variables LIBICONV and LTLIBICONV to the linker options for use in a Makefile (LIBICONV for use without libtool, LTLIBICONV for use with libtool); it adds an ‘-I’ option to CPPFLAGS if necessary. If not found, it sets LIBICONV and LTLIBICONV to empty and doesn’t change CPPFLAGS.

The complexities that AM_ICONV deals with are the following:

  • Some operating systems have iconv in the C library, for example glibc. Some have it in a separate library libiconv, for example OSF/1 or FreeBSD. Regardless of the operating system, GNU libiconv might have been installed. In that case, it should be used instead of the operating system’s native iconv.
  • GNU libiconv, if installed, is not necessarily already in the search path (CPPFLAGS for the include file search path, LDFLAGS for the library search path).
  • GNU libiconv is binary incompatible with some operating systems’ native iconv, for example on FreeBSD. Use of an iconv.h and libiconv.so that don’t fit together would produce program crashes.
  • GNU libiconv, if installed, is not necessarily already in the run time library search path. To avoid the need for setting an environment variable like LD_LIBRARY_PATH, the macro adds the appropriate run time search path options to the LIBICONV variable. This works on most systems, but not on some operating systems with limited shared library support, like SCO.

iconv.m4 is distributed with the GNU gettext package because gettext.m4 relies on it.
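
As an illustration of why ICONV_CONST exists, the following sketch calls iconv() in a way that compiles with either prototype. The function name latin1_to_utf8 is hypothetical, and the fallback definition of ICONV_CONST is an assumption that lets the fragment compile without config.h.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* In a configured package this comes from config.h (see AM_ICONV).
   The empty fallback is an assumption made for this sketch.  */
#ifndef ICONV_CONST
# define ICONV_CONST
#endif

/* Hypothetical helper: convert SRCLEN bytes of ISO-8859-1 text in SRC
   to UTF-8 in DST (capacity DSTLEN).  Returns the number of bytes
   written, or (size_t) -1 on failure.  */
size_t
latin1_to_utf8 (const char *src, size_t srclen, char *dst, size_t dstlen)
{
  assert (src != NULL && dst != NULL);
  iconv_t cd = iconv_open ("UTF-8", "ISO-8859-1");
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  /* The cast absorbs the 'const char **' versus 'char **' difference.  */
  ICONV_CONST char *in = (ICONV_CONST char *) src;
  char *out = dst;
  size_t outleft = dstlen;
  size_t res = iconv (cd, &in, &srclen, &out, &outleft);
  iconv_close (cd);
  return res == (size_t) -1 ? res : dstlen - outleft;
}
```

On systems with a separate libiconv, the program must be linked with the options that AM_ICONV places in LIBICONV or LTLIBICONV.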


13.6 Integrating with Version Control Systems

Many projects use version control systems for distributed development and source backup. This section gives some advice on how to manage the uses of gettextize, autopoint and autoconf on version controlled files.


13.6.1 Avoiding version mismatch in distributed development

In a project with multiple developers, there should be a single developer who occasionally - when there is a desire to upgrade to a new gettext version - runs gettextize and performs the changes listed in Adjusting Files, and then commits his changes to the repository.

It is highly recommended that all developers on a project use the same version of GNU gettext in the package. In other words, if a developer runs gettextize, he should go the whole way, make the necessary remaining changes and commit his changes to the repository. Otherwise the following problems will likely occur:

  • Apparent version mismatch between developers. Since some gettext specific portions in configure.ac, configure.in and Makefile.am, Makefile.in files depend on the gettext version, the use of infrastructure files belonging to different gettext versions can easily lead to build errors.
  • Hidden version mismatch. Such version mismatch can also lead to malfunctioning of the package, which may go undiscovered by the developers. The worst case of hidden version mismatch is that internationalization of the package doesn’t work at all.
  • Release risks. All developers implicitly perform constant testing on a package. This is important in the days and weeks before a release. If the guy who makes the release tar files uses a different version of GNU gettext than the other developers, the distribution will be less well tested than if all had been using the same gettext version. For example, it is possible that a platform specific bug goes undiscovered due to this constellation.

13.6.2 Files to put under version control

There are basically three ways to deal with generated files in the context of a version controlled repository, such as configure generated from configure.ac, parser.c generated from parser.y, or po/Makefile.in.in autoinstalled by gettextize or autopoint.

  1. All generated files are always committed into the repository.
  2. All generated files are committed into the repository occasionally, for example each time a release is made.
  3. Generated files are never committed into the repository.

Each of these three approaches has different advantages and drawbacks.

  1. The advantage is that anyone can check out the source at any moment and get a working build. The drawbacks are: 1a. It requires some frequent "push" actions by the maintainers. 1b. The repository grows in size quite fast.
  2. The advantage is that anyone can check out the source, and the usual "./configure; make" will work. The drawbacks are: 2a. The one who checks out the repository needs tools like GNU automake, GNU autoconf, GNU m4 installed in his PATH; sometimes he even needs particular versions of them. 2b. When a release is made and a commit is made on the generated files, the other developers get conflicts on the generated files when merging the local work back to the repository. Although these conflicts are easy to resolve, they are annoying.
  3. The advantage is less work for the maintainers. The drawback is that anyone who checks out the source not only needs tools like GNU automake, GNU autoconf, GNU m4 installed in his PATH, but also needs to perform a package specific pre-build step before being able to "./configure; make".

For the first and second approach, all files modified or brought in by the occasional gettextize invocation and update should be committed into the repository.

For the third approach, the maintainer can omit from the repository all the files that gettextize mentions as "copy". Instead, he adds to the configure.ac or configure.in a line of the form

AM_GNU_GETTEXT_VERSION(0.19.8)

and adds to the package’s pre-build script an invocation of ‘autopoint’. For everyone who checks out the source, this autopoint invocation will copy into the right place the gettext infrastructure files that have been omitted from the repository.

The version number used as argument to AM_GNU_GETTEXT_VERSION is the version of the gettext infrastructure that the package wants to use. It is also the minimum version number of the ‘autopoint’ program. So, if you write AM_GNU_GETTEXT_VERSION(0.11.5) then the developers can have any version >= 0.11.5 installed; the package will work with the 0.11.5 infrastructure in all developers’ builds. When the maintainer then runs gettextize from, say, version 0.12.1 on the package, the occurrence of AM_GNU_GETTEXT_VERSION(0.11.5) will be changed into AM_GNU_GETTEXT_VERSION(0.12.1), and all other developers who use the repository will henceforth need to have GNU gettext 0.12.1 or newer installed.
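
The ">=" rule on version numbers can be made concrete with a small comparison routine. This is only an illustration of the rule; it is not how autopoint itself compares versions, and both function names are hypothetical. It handles purely numeric components separated by dots.

```c
#include <assert.h>
#include <stdlib.h>

/* Compare two dotted version strings such as "0.11.5" and "0.12.1".
   Returns a negative value, 0, or a positive value, like strcmp.
   A missing component counts as 0, so "0.19" equals "0.19.0".  */
int
version_compare (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);
  while (*a != '\0' || *b != '\0')
    {
      char *end_a, *end_b;
      unsigned long na = strtoul (a, &end_a, 10);
      unsigned long nb = strtoul (b, &end_b, 10);
      if (na != nb)
        return na < nb ? -1 : 1;
      if (end_a == a && end_b == b)
        break;                  /* no digits left on either side */
      a = (*end_a == '.') ? end_a + 1 : end_a;
      b = (*end_b == '.') ? end_b + 1 : end_b;
    }
  return 0;
}

/* Nonzero if the INSTALLED version satisfies the REQUIRED minimum.  */
int
version_is_sufficient (const char *installed, const char *required)
{
  return version_compare (installed, required) >= 0;
}
```

With the example from the text, a developer with 0.12.1 installed satisfies a package declaring 0.11.5, while one with 0.10.35 does not.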


13.6.3 Put PO Files under Version Control

Since translations are valuable assets, just like the source code, it makes sense to put them under version control. The GNU gettext infrastructure supports two ways to deal with translations in the context of a version controlled repository.

  1. Both POT file and PO files are committed into the repository.
  2. Only PO files are committed into the repository.

If a POT file is absent when building, it will be generated by scanning the source files with xgettext, and then the PO files are regenerated as a dependency. On the other hand, some maintainers want to keep the POT file unchanged during the development phase. So, even if a POT file is present and older than the source code, it won’t be updated automatically. You can manually update it with make $(DOMAIN).pot-update, and commit it at a certain point.

Special advice for particular version control systems:

  • Recent version control systems, Git for instance, ignore files’ timestamps. In that case, PO files can be accidentally updated even if a POT file is not updated. To prevent this, you can set the ‘PO_DEPENDS_ON_POT’ variable to no in the Makevars file and do make update-po manually.
  • Location comments such as #: lib/error.c:116 are sometimes annoying, since these comments are volatile and may introduce unwanted changes to the working copy when building. To mitigate this, you can decide to omit those comments from the PO files in the repository.

    This is possible with the --no-location option of the msgmerge command. The drawback is that, if the location information is needed, translators have to recover the location comments by running msgmerge again.


13.6.4 Invoking the autopoint Program
autopoint [option]...

The autopoint program copies standard gettext infrastructure files into a source package. It extracts from a macro call of the form AM_GNU_GETTEXT_VERSION(version), found in the package’s configure.in or configure.ac file, the gettext version used by the package, and copies the infrastructure files belonging to this version into the package.

To extract the latest available infrastructure which satisfies a version requirement, you can use the form AM_GNU_GETTEXT_REQUIRE_VERSION(version) instead. For example, if gettext 0.19.8 is installed on your system and 0.19.1 is requested, then the infrastructure files of version 0.19.8 will be copied into a source package.

13.6.4.1 Options
-f, --force

Force overwriting of files that already exist.

-n, --dry-run

Print modifications but don’t perform them. All file copying actions that autopoint would normally execute are inhibited and instead only listed on standard output.

13.6.4.2 Informative output
--help

Display this help and exit.

--version

Output version information and exit.

autopoint supports the GNU gettext versions from 0.10.35 to the current one, 0.19.8. In order to apply autopoint to a package using a gettext version newer than 0.19.8, you need to install this same version of GNU gettext at least.

In packages using GNU automake, an invocation of autopoint should be followed by invocations of aclocal and then autoconf and autoheader. The reason is that autopoint installs some autoconf macro files, which are used by aclocal to create aclocal.m4, and the latter is used by autoconf to create the package’s configure script and by autoheader to create the package’s config.h.in include file template.

The name ‘autopoint’ is an abbreviation of ‘auto-po-intl-m4’; the tool copies or updates mostly files in the po, intl, m4 directories.


13.7 Creating a Distribution Tarball

In projects that use GNU automake, the usual commands for creating a distribution tarball, ‘make dist’ or ‘make distcheck’, automatically update the PO files as needed.

If GNU automake is not used, the maintainer needs to perform this update before making a release:

$ ./configure
$ (cd po; make update-po)
$ make distclean

14 The Installer’s and Distributor’s View

By default, packages that fully use GNU gettext internally are installed in such a way as to allow translation of messages. At configuration time, those packages should automatically detect whether the underlying host system already provides the GNU gettext functions. If not, the GNU gettext library should be automatically prepared and used. Installers may use special options at configuration time for changing this behavior. The command ‘./configure --with-included-gettext’ bypasses system gettext to use the included GNU gettext instead, while ‘./configure --disable-nls’ produces programs totally unable to translate messages.

Internationalized packages usually have many ll.po files. Unless translations are disabled, all those available are installed together with the package. However, the environment variable LINGUAS may be set, prior to configuration, to limit the installed set. LINGUAS should then contain a space separated list of two-letter codes, stating which languages are allowed.
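
The format of the LINGUAS list can be illustrated with a small membership test. The actual filtering is performed by the package’s build machinery, not by C code; the function linguas_allows below is hypothetical and only shows how a space separated list of codes is matched word by word.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if LANG appears as a whole word in the whitespace
   separated list LINGUAS, 0 otherwise.  */
int
linguas_allows (const char *linguas, const char *lang)
{
  assert (linguas != NULL && lang != NULL);
  size_t len = strlen (lang);
  if (len == 0)
    return 0;
  const char *p = linguas;
  while (*p != '\0')
    {
      p += strspn (p, " \t\n");                 /* skip separators */
      size_t word = strcspn (p, " \t\n");       /* length of next code */
      if (word == len && strncmp (p, lang, len) == 0)
        return 1;
      p += word;
    }
  return 0;
}
```

For instance, with LINGUAS set to "de fr ja", the codes ‘de’, ‘fr’ and ‘ja’ are allowed, while ‘es’ is not.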



15 Other Programming Languages

While the presentation of gettext focuses mostly on C and implicitly applies to C++ as well, its scope is far broader than that: Many programming languages, scripting languages and other textual data like GUI resources or package descriptions can make use of the gettext approach.


15.1 The Language Implementor’s View

All programming and scripting languages that have the notion of strings are eligible for supporting gettext. Supporting gettext means the following:

  1. You should add to the language a syntax for translatable strings. In principle, a function call of gettext would do, but a shorthand syntax helps keep internationalized programs legible. For example, in C we use the syntax _("string"), and in GNU awk we use the shorthand _"string".
  2. You should arrange that evaluation of such a translatable string at runtime calls the gettext function, or performs equivalent processing.
  3. Similarly, you should make the functions ngettext, dcgettext, dcngettext available from within the language. These functions are less often used, but are nevertheless necessary for particular purposes: ngettext for correct plural handling, and dcgettext and dcngettext for obeying other locale-related environment variables than LC_MESSAGES, such as LC_TIME or LC_MONETARY. For these latter functions, you need to make the LC_* constants, available in the C header <locale.h>, referenceable from within the language, usually either as enumeration values or as strings.
  4. You should allow the programmer to designate a message domain, either by making the textdomain function available from within the language, or by introducing a magic variable called TEXTDOMAIN. Similarly, you should allow the programmer to designate where to search for message catalogs, by providing access to the bindtextdomain function.
  5. You should either perform a setlocale (LC_ALL, "") call during the startup of your language runtime, or allow the programmer to do so. Remember that gettext will act as a no-op if the LC_MESSAGES and LC_CTYPE locale categories are not both set.
  6. A programmer should have a way to extract translatable strings from a program into a PO file. The GNU xgettext program is being extended to support very different programming languages. Please contact the GNU gettext maintainers to help them do this. If the string extractor is best integrated into your language’s parser, GNU xgettext can function as a front end to your string extractor.
  7. The language’s library should have a string formatting facility where the arguments of a format string are denoted by a positional number or a name. This is needed because for some languages and some messages with more than one substitutable argument, the translation will need to output the substituted arguments in a different order. See c-format Flag.
  8. If the language has more than one implementation, and not all of the implementations use gettext, but the programs should be portable across implementations, you should provide a no-i18n emulation that makes the other implementations accept programs written for yours, without actually translating the strings.
  9. To help the programmer in the task of marking translatable strings, which is sometimes performed using the Emacs PO mode (see Marking), you are welcome to contact the GNU gettext maintainers, so they can add support for your language to po-mode.el.

On the implementation side, three approaches are possible, with different effects on portability and copyright:

  • You may integrate GNU gettext’s intl/ directory in your package, as described in Maintainers. This allows you to have internationalization on all kinds of platforms. Note that when you then distribute your package, it legally falls under the GNU General Public License, and the GNU project will be glad about your contribution to the Free Software pool.
  • You may link against GNU gettext functions if they are found in the C library. For example, an autoconf test for gettext() and ngettext() will detect this situation. For the moment, this test will succeed on GNU systems and not on other platforms. No severe copyright restrictions apply.
  • You may emulate or reimplement the GNU gettext functionality. This has the advantage of full portability and no copyright restrictions, but also the drawback that you have to reimplement the GNU gettext features (such as the LANGUAGE environment variable, the locale aliases database, the automatic charset conversion, and plural handling).

15.2 The Programmer’s View

For the programmer, the general procedure is the same as for the C language. The Emacs PO mode marking supports other languages, and the GNU xgettext string extractor recognizes other languages based on the file extension or a command-line option. In some languages, setlocale is not needed because it is already performed by the underlying language runtime.


15.3 The Translator’s View

The translator works exactly as in the C language case. The only difference is that when translating format strings, she has to be aware of the language’s particular syntax for positional arguments in format strings.


15.3.1 C Format Strings

C format strings are described in POSIX (IEEE P1003.1 2001), section XSH 3 fprintf(), http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html. See also the fprintf() manual page, http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php, http://informatik.fh-wuerzburg.de/student/i510/man/printf.html.

Although format strings with positions that reorder arguments, such as

"Only %2$d bytes free on '%1$s'."

which is semantically equivalent to

"'%s' has only %d bytes free."

are a POSIX/XSI feature and not specified by ISO C 99, translators can rely on this reordering ability: On the few platforms where printf(), fprintf() etc. don’t support this feature natively, libintl.a or libintl.so provides replacement functions, and GNU <libintl.h> activates these replacement functions automatically.

As a special feature for Farsi (Persian) and maybe Arabic, translators can insert an ‘I’ flag into numeric format directives. For example, the translation of "%d" can be "%Id". The effect of this flag, on systems with GNU libc, is that in the output, the ASCII digits are replaced with the ‘outdigits’ defined in the LC_CTYPE locale category. On other systems, the gettext function removes this flag, so that it has no effect.

Note that the programmer should not put this flag into the untranslated string. (Putting the ‘I’ format directive flag into an msgid string would lead to undefined behaviour on platforms without glibc when NLS is disabled.)
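
The reordering described above can be tried with a short C fragment. It assumes a printf family that supports POSIX numbered arguments, as is the case on most current systems; the function name format_free_space is hypothetical. The caller always passes the arguments in the same order, and only the format string decides where each one appears.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format a disk space message with the (possibly
   translated) format string FMT.  The directory name is always the
   first argument and the byte count the second.  */
int
format_free_space (char *buf, size_t size, const char *fmt,
                   const char *dir, int bytes)
{
  assert (buf != NULL && fmt != NULL && dir != NULL);
  return snprintf (buf, size, fmt, dir, bytes);
}
```

Called with "'%s' has only %d bytes free." the directory comes first in the output; called with "Only %2$d bytes free on '%1$s'." the byte count comes first, although the argument list is unchanged.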


15.3.2 Objective C Format Strings

Objective C format strings are like C format strings. They support an additional format directive: "%@", which when executed consumes an argument of type Object *.


15.3.3 Shell Format Strings

Shell format strings, as supported by GNU gettext and the ‘envsubst’ program, are strings with references to shell variables in the form $variable or ${variable}. References of the form ${variable-default}, ${variable:-default}, ${variable=default}, ${variable:=default}, ${variable+replacement}, ${variable:+replacement}, ${variable?ignored}, ${variable:?ignored}, that would be valid inside shell scripts, are not supported. The variable names must consist solely of alphanumeric or underscore ASCII characters, not start with a digit and be nonempty; otherwise such a variable reference is ignored.
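
The rule for variable names can be written down as a small predicate. This is a sketch of the rule as stated above, not code taken from GNU gettext or envsubst; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the LEN bytes at NAME form an acceptable variable name:
   nonempty, only ASCII letters, digits or underscores, and not
   starting with a digit.  Return 0 otherwise.  */
int
is_valid_shell_variable_name (const char *name, size_t len)
{
  assert (name != NULL);
  if (len == 0 || (name[0] >= '0' && name[0] <= '9'))
    return 0;
  for (size_t i = 0; i < len; i++)
    {
      char c = name[i];
      int ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
               || (c >= '0' && c <= '9') || c == '_';
      if (!ok)
        return 0;
    }
  return 1;
}
```

Under this rule the text between the braces of ${variable-default} is not a valid name, which matches the statement that such references are not supported.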


15.3.4 Python Format Strings

There are two kinds of format strings in Python: those acceptable to the Python built-in format operator %, labelled as ‘python-format’, and those acceptable to the format method of the ‘str’ object, labelled as ‘python-brace-format’.

Python % format strings are described in Python Library reference / 5. Built-in Types / 5.6. Sequence Types / 5.6.2. String Formatting Operations, http://docs.python.org/2/library/stdtypes.html#string-formatting-operations.

Python brace format strings are described in PEP 3101 – Advanced String Formatting, http://www.python.org/dev/peps/pep-3101/.


15.3.5 Lisp Format Strings

Lisp format strings are described in the Common Lisp HyperSpec, chapter 22.3 Formatted Output, http://www.lisp.org/HyperSpec/Body/sec_22-3.html.


15.3.6 Emacs Lisp Format Strings

Emacs Lisp format strings are documented in the Emacs Lisp reference, section Formatting Strings, http://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75. Note that as of version 21, XEmacs supports numbered argument specifications in format strings while FSF Emacs doesn’t.


15.3.7 librep Format Strings

librep format strings are documented in the librep manual, section Formatted Output, http://librep.sourceforge.net/librep-manual.html#Formatted%20Output, http://www.gwinnup.org/research/docs/librep.html#SEC122.


15.3.8 Scheme Format Strings

Scheme format strings are documented in the SLIB manual, section Format Specification.


15.3.9 Smalltalk Format Strings

Smalltalk format strings are described in the GNU Smalltalk documentation, class CharArray, methods ‘bindWith:’ and ‘bindWithArguments:’, http://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238. In summary, a directive starts with ‘%’ and is followed by ‘%’ or a nonzero digit (‘1’ to ‘9’).
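
The directive syntax summarized above is simple enough to scan by hand. The following sketch counts argument directives in such a string. Rejecting a ‘%’ followed by any other character is an assumption of this sketch and not a statement about how GNU gettext treats such strings; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count the argument directives in FMT, where a directive is '%'
   followed by '%' (a literal percent sign) or by a digit '1'..'9'
   (an argument reference).  Returns the number of argument
   references, or -1 if a '%' is followed by anything else,
   including the end of the string.  */
int
count_numbered_directives (const char *fmt)
{
  assert (fmt != NULL);
  int count = 0;
  for (const char *p = fmt; *p != '\0'; p++)
    {
      if (*p != '%')
        continue;
      p++;                      /* look at the character after '%' */
      if (*p >= '1' && *p <= '9')
        count++;
      else if (*p != '%')
        return -1;
    }
  return count;
}
```

For example, "%1 of %2" contains two argument directives and "100%%" contains none.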


15.3.10 Java Format Strings

Java format strings are described in the JDK documentation for class java.text.MessageFormat, http://java.sun.com/j2se/1.4/docs/api/java/text/MessageFormat.html. See also the ICU documentation http://oss.software.ibm.com/icu/apiref/classMessageFormat.html.


15.3.11 C# Format Strings

C# format strings are described in the .NET documentation for class System.String and in http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpConFormattingOverview.asp.


15.3.12 awk Format Strings

awk format strings are described in the gawk documentation, section Printf, http://www.gnu.org/manual/gawk/html_node/Printf.html#Printf.


15.3.13 Object Pascal Format Strings

Object Pascal format strings are described in the documentation of the Free Pascal runtime library, section Format, http://www.freepascal.org/docs-html/rtl/sysutils/format.html.


15.3.14 YCP Format Strings

YCP sformat strings are described in the libycp documentation file:/usr/share/doc/packages/libycp/YCP-builtins.html. In summary, a directive starts with ‘%’ and is followed by ‘%’ or a nonzero digit (‘1’ to ‘9’).


15.3.15 Tcl Format Strings

Tcl format strings are described in the format.n manual page, http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm.


15.3.16 Perl Format Strings

There are two kinds of format strings in Perl: those acceptable to the Perl built-in function printf, labelled as ‘perl-format’, and those acceptable to the libintl-perl function __x, labelled as ‘perl-brace-format’.

Perl printf format strings are described in the sprintf section of ‘man perlfunc’.

Perl brace format strings are described in the Locale::TextDomain(3pm) manual page of the CPAN package libintl-perl. In brief, Perl format uses placeholders put between braces (‘{’ and ‘}’). The placeholder must have the syntax of simple identifiers.


15.3.17 PHP Format Strings

PHP format strings are described in the documentation of the PHP function sprintf, in phpdoc/manual/function.sprintf.html or http://www.php.net/manual/en/function.sprintf.php.


15.3.18 GCC internal Format Strings

These format strings are used inside the GCC sources. In such a format string, a directive starts with ‘%’, is optionally followed by a size specifier ‘l’, an optional flag ‘+’, another optional flag ‘#’, and is finished by a specifier: ‘%’ denotes a literal percent sign, ‘c’ denotes a character, ‘s’ denotes a string, ‘i’ and ‘d’ denote an integer, ‘o’, ‘u’, ‘x’ denote an unsigned integer, ‘.*s’ denotes a string preceded by a width specification, ‘H’ denotes a ‘location_t *’ pointer, ‘D’ denotes a general declaration, ‘F’ denotes a function declaration, ‘T’ denotes a type, ‘A’ denotes a function argument, ‘C’ denotes a tree code, ‘E’ denotes an expression, ‘L’ denotes a programming language, ‘O’ denotes a binary operator, ‘P’ denotes a function parameter, ‘Q’ denotes an assignment operator, ‘V’ denotes a const/volatile qualifier.


15.3.19 GFC internal Format Strings

These format strings are used inside the GNU Fortran Compiler sources, that is, the Fortran frontend in the GCC sources. In such a format string, a directive starts with ‘%’ and is finished by a specifier: ‘%’ denotes a literal percent sign, ‘C’ denotes the current source location, ‘L’ denotes a source location, ‘c’ denotes a character, ‘s’ denotes a string, ‘i’ and ‘d’ denote an integer, ‘u’ denotes an unsigned integer. ‘i’, ‘d’, and ‘u’ may be preceded by a size specifier ‘l’.


15.3.20 Qt Format Strings

Qt format strings are described in the documentation of the QString class file:/usr/lib/qt-4.3.0/doc/html/qstring.html. In summary, a directive consists of a ‘%’ followed by a digit. The same directive cannot occur more than once in a format string.


15.3.21 Qt Plural Format Strings

Qt plural format strings are described in the documentation of the QObject::tr method file:/usr/lib/qt-4.3.0/doc/html/qobject.html. In summary, the only allowed directive is ‘%n’.


15.3.22 KDE Format Strings

KDE 4 format strings are defined as follows: A directive consists of a ‘%’ followed by a non-zero decimal number. If a ‘%n’ occurs in a format string, all of ‘%1’, ..., ‘%(n-1)’ must occur as well, except possibly one of them.
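
The numbering rule can be made concrete with a small checker. The following is only a sketch in Python, not the code used by msgfmt; the function names are hypothetical, and it assumes that a "non-zero decimal number" is written without a leading zero.

```python
import re

# Assumed directive syntax: '%' followed by a decimal number without leading zero.
_DIRECTIVE = re.compile(r"%([1-9][0-9]*)")

def kde_directives(fmt):
    """Return the set of argument numbers referenced by fmt."""
    return {int(m.group(1)) for m in _DIRECTIVE.finditer(fmt)}

def is_valid_kde_format(fmt):
    """Check the rule: below the highest %n, at most one number may be missing."""
    used = kde_directives(fmt)
    if not used:
        return True
    highest = max(used)
    missing = set(range(1, highest)) - used
    return len(missing) <= 1

print(is_valid_kde_format("Copying %1 to %2"))  # True
print(is_valid_kde_format("%1 of %3"))          # True: only %2 is missing
print(is_valid_kde_format("%1 of %4"))          # False: %2 and %3 are missing
```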


15.3.23 KUIT Format Strings

KUIT (KDE User Interface Text) is compatible with KDE 4 format strings, while it also allows programmers to add semantic information to a format string, through XML markup tags. For example, if the first format directive in a string is a filename, programmers could indicate that with a ‘filename’ tag, like ‘<filename>%1</filename>’.

KUIT format strings are described in http://api.kde.org/frameworks-api/frameworks5-apidocs/ki18n/html/prg_guide.html#kuit_markup.


15.3.24 Boost Format Strings

Boost format strings are described in the documentation of the boost::format class, at http://www.boost.org/libs/format/doc/format.html. In summary, a directive has either the same syntax as in a C format string, such as ‘%1$+5d’, or may be surrounded by vertical bars, such as ‘%|1$+5d|’ or ‘%|1$+5|’, or consists of just an argument number between percent signs, such as ‘%1%’.


15.3.25 Lua Format Strings

Lua format strings are described in the Lua reference manual, section String Manipulation, http://www.lua.org/manual/5.1/manual.html#pdf-string.format.


15.3.26 JavaScript Format Strings

Although the JavaScript specification itself does not define any format strings, many JavaScript implementations provide printf-like functions. xgettext understands a set of common format strings used in popular JavaScript implementations including Gjs, Seed, and Node.JS. In such a format string, a directive starts with ‘%’ and is finished by a specifier: ‘%’ denotes a literal percent sign, ‘c’ denotes a character, ‘s’ denotes a string, ‘b’, ‘d’, ‘o’, ‘x’, ‘X’ denote an integer, ‘f’ denotes a floating-point number, ‘j’ denotes a JSON object.


15.4 The Maintainer’s View

For the maintainer, the general procedure differs from the C language case in two ways.

  • For those languages that don’t use GNU gettext, the intl/ directory is not needed and can be omitted. This means that the maintainer calls the gettextize program without the ‘--intl’ option, and that he invokes the AM_GNU_GETTEXT autoconf macro via ‘AM_GNU_GETTEXT([external])’.
  • If only a single programming language is used, the XGETTEXT_OPTIONS variable in po/Makevars (see po/Makevars) should be adjusted to match the xgettext options for that particular programming language. If the package uses more than one programming language with gettext support, it becomes necessary to change the POT file construction rule in po/Makefile.in.in. It is recommended to make one xgettext invocation per programming language, each with the options appropriate for that language, and to combine the resulting files using msgcat.

15.5 Individual Programming Languages


15.5.1 C, C++, Objective C
RPMs

gcc, gpp, gobjc, glibc, gettext

File extension

For C: c, h.
For C++: C, c++, cc, cxx, cpp, hpp.
For Objective C: m.

String syntax

"abc"

gettext shorthand

_("abc")

gettext/ngettext functions

gettext, dgettext, dcgettext, ngettext, dngettext, dcngettext

textdomain

textdomain function

bindtextdomain

bindtextdomain function

setlocale

Programmer must call setlocale (LC_ALL, "")

Prerequisite

#include <libintl.h>
#include <locale.h>
#define _(string) gettext (string)

Use or emulate GNU gettext

Use

Extractor

xgettext -k_

Formatting with positions

fprintf "%2$d %1$d"
In C++: autosprintf "%2$d %1$d" (see Introduction in GNU autosprintf)

Portability

autoconf (gettext.m4) and #if ENABLE_NLS

po-mode marking

yes

The following examples are available in the examples directory: hello-c, hello-c-gnome, hello-c++, hello-c++-qt, hello-c++-kde, hello-c++-gnome, hello-c++-wxwidgets, hello-objc, hello-objc-gnustep, hello-objc-gnome.


15.5.2 sh - Shell Script
RPMs

bash, gettext

File extension

sh

String syntax

"abc", 'abc', abc

gettext shorthand

"`gettext \"abc\"`"

gettext/ngettext functions

gettext, ngettext programs
eval_gettext, eval_ngettext shell functions

textdomain

environment variable TEXTDOMAIN

bindtextdomain

environment variable TEXTDOMAINDIR

setlocale

automatic

Prerequisite

. gettext.sh

Use or emulate GNU gettext

use

Extractor

xgettext

Formatting with positions

Portability

fully portable

po-mode marking

An example is available in the examples directory: hello-sh.


15.5.2.1 Preparing Shell Scripts for Internationalization

Preparing a shell script for internationalization is conceptually similar to the steps described in Sources. The concrete steps for shell scripts are as follows.

  1. Insert the line
    . gettext.sh
    

    near the top of the script. gettext.sh is a shell function library that provides the functions eval_gettext (see eval_gettext Invocation) and eval_ngettext (see eval_ngettext Invocation). You have to ensure that gettext.sh can be found in the PATH.

  2. Set and export the TEXTDOMAIN and TEXTDOMAINDIR environment variables. Usually TEXTDOMAIN is the package or program name, and TEXTDOMAINDIR is the absolute pathname corresponding to $prefix/share/locale, where $prefix is the installation location.
    TEXTDOMAIN=@PACKAGE@
    export TEXTDOMAIN
    TEXTDOMAINDIR=@LOCALEDIR@
    export TEXTDOMAINDIR
    
  3. Prepare the strings for translation, as described in Preparing Strings.
  4. Simplify translatable strings so that they don’t contain command substitution ("`...`" or "$(...)"), variable access with defaulting (like ${variable-default}), access to positional arguments (like $0, $1, ...) or highly volatile shell variables (like $?). This can always be done through simple local code restructuring. For example,
    echo "Usage: $0 [OPTION] FILE..."
    

    becomes

    program_name=$0
    echo "Usage: $program_name [OPTION] FILE..."
    

    Similarly,

    echo "Remaining files: `ls | wc -l`"
    

    becomes

    filecount="`ls | wc -l`"
    echo "Remaining files: $filecount"
    
  5. For each translatable string, change the output command ‘echo’ or ‘$echo’ to ‘gettext’ (if the string contains no references to shell variables) or to ‘eval_gettext’ (if it refers to shell variables), followed by a no-argument ‘echo’ command (to account for the terminating newline). Similarly, for cases with plural handling, replace a conditional ‘echo’ command with an invocation of ‘ngettext’ or ‘eval_ngettext’, followed by a no-argument ‘echo’ command.

    When doing this, you also need to add an extra backslash before the dollar sign in references to shell variables, so that the ‘eval_gettext’ function receives the translatable string before the variable values are substituted into it. For example,

    echo "Remaining files: $filecount"
    

    becomes

    eval_gettext "Remaining files: \$filecount"; echo
    

    If the output command is not ‘echo’, you can make it use ‘echo’ nevertheless, through the use of backquotes. However, note that inside backquotes, backslashes must be doubled to be effective (because the backquoting eats one level of backslashes). For example, assuming that ‘error’ is a shell function that signals an error,

    error "file not found: $filename"
    

    is first transformed into

    error "`echo \"file not found: \$filename\"`"
    

    which then becomes

    error "`eval_gettext \"file not found: \\\$filename\"`"
    

15.5.2.2 Contents of gettext.sh

gettext.sh, contained in the run-time package of GNU gettext, provides the shell functions eval_gettext (see eval_gettext Invocation) and eval_ngettext (see eval_ngettext Invocation).


15.5.2.3 Invoking the gettext program
gettext [option] [[textdomain] msgid]
gettext [option] -s [msgid]...

The gettext program displays the native language translation of a textual message.

Arguments

-d textdomain, --domain=textdomain

Retrieve translated messages from textdomain. Usually a textdomain corresponds to a package, a program, or a module of a program.

-e

Enable expansion of some escape sequences. This option is for compatibility with the ‘echo’ program or shell built-in. The escape sequences ‘\a’, ‘\b’, ‘\c’, ‘\f’, ‘\n’, ‘\r’, ‘\t’, ‘\v’, ‘\\’, and ‘\’ followed by one to three octal digits, are interpreted like the System V ‘echo’ program did.

-E

This option is only for compatibility with the ‘echo’ program or shell built-in. It has no effect.

-h, --help

Display this help and exit.

-n

Suppress trailing newline. By default, gettext adds a newline to the output.

-V, --version

Output version information and exit.

[textdomain] msgid

Retrieve translated message corresponding to msgid from textdomain.

If the textdomain parameter is not given, the domain is determined from the environment variable TEXTDOMAIN. If the message catalog is not found in the regular directory, another location can be specified with the environment variable TEXTDOMAINDIR.

When used with the -s option the program behaves like the ‘echo’ command. But it does not simply copy its arguments to stdout. Instead those messages found in the selected catalog are translated.

Note: xgettext supports only the one-argument form of the gettext invocation, where no options are present and the textdomain is implicit, from the environment.


15.5.2.4 Invoking the ngettext program
ngettext [option] [textdomain] msgid msgid-plural count

The ngettext program displays the native language translation of a textual message whose grammatical form depends on a number.

Arguments

-d textdomain, --domain=textdomain

Retrieve translated messages from textdomain. Usually a textdomain corresponds to a package, a program, or a module of a program.

-e

Enable expansion of some escape sequences. This option is for compatibility with the ‘gettext’ program. The escape sequences ‘\a’, ‘\b’, ‘\c’, ‘\f’, ‘\n’, ‘\r’, ‘\t’, ‘\v’, ‘\\’, and ‘\’ followed by one to three octal digits, are interpreted like the System V ‘echo’ program did.

-E

This option is only for compatibility with the ‘gettext’ program. It has no effect.

-h, --help

Display this help and exit.

-V, --version

Output version information and exit.

textdomain

Retrieve translated message from textdomain.

msgid msgid-plural

Translate msgid (English singular) / msgid-plural (English plural).

count

Choose singular/plural form based on this value.

If the textdomain parameter is not given, the domain is determined from the environment variable TEXTDOMAIN. If the message catalog is not found in the regular directory, another location can be specified with the environment variable TEXTDOMAINDIR.

Note: xgettext supports only the three-argument form of the ngettext invocation, where no options are present and the textdomain is implicit, from the environment.


15.5.2.5 Invoking the envsubst program
envsubst [option] [shell-format]

The envsubst program substitutes the values of environment variables.

Operation mode

-v, --variables

Output the variables occurring in shell-format.

Informative output

-h, --help

Display this help and exit.

-V, --version

Output version information and exit.

In normal operation mode, standard input is copied to standard output, with references to environment variables of the form $VARIABLE or ${VARIABLE} being replaced with the corresponding values. If a shell-format is given, only those environment variables that are referenced in shell-format are substituted; otherwise all environment variable references occurring in standard input are substituted.

These substitutions are a subset of the substitutions that a shell performs on unquoted and double-quoted strings. Other kinds of substitutions done by a shell, such as ${variable-default} or $(command-list) or `command-list`, are not performed by the envsubst program, for security reasons.

When --variables is used, standard input is ignored, and the output consists of the environment variables that are referenced in shell-format, one per line.
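
To illustrate how narrow this substitution is, here is a minimal Python emulation of the behaviour described above. It is a sketch, not the envsubst source: the names variables and substitute are hypothetical, the listing of names is deduplicated for simplicity, and it assumes that an unset variable expands to the empty string.

```python
import re

# Only $VARIABLE and ${VARIABLE} are recognized; no defaults, no commands.
_REF = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")

def variables(shell_format):
    """Names referenced in shell_format, in order of first appearance."""
    names = []
    for m in _REF.finditer(shell_format):
        name = m.group(1) or m.group(2)
        if name not in names:
            names.append(name)
    return names

def substitute(text, env, shell_format=None):
    """Replace variable references in text with values from the dict env.

    If shell_format is given, only the variables it mentions are replaced.
    Assumption: an unset variable expands to the empty string.
    """
    allowed = None if shell_format is None else set(variables(shell_format))

    def repl(m):
        name = m.group(1) or m.group(2)
        if allowed is not None and name not in allowed:
            return m.group(0)  # leave the reference untouched
        return env.get(name, "")

    return _REF.sub(repl, text)

env = {"HOME": "/home/alice", "USER": "alice"}
print(substitute("$USER lives in ${HOME}", env))           # alice lives in /home/alice
print(substitute("$USER lives in ${HOME}", env, "$HOME"))  # $USER lives in /home/alice
print(substitute("$(rm -rf /) ${USER-nobody}", env))       # printed unchanged
```

The last call shows the point of the restriction: command substitution and defaulting syntax pass through as plain text instead of being interpreted.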


15.5.2.6 Invoking the eval_gettext function
eval_gettext msgid

This function outputs the native language translation of a textual message, performing dollar-substitution on the result. Note that only shell variables mentioned in msgid will be dollar-substituted in the result.


15.5.2.7 Invoking the eval_ngettext function
eval_ngettext msgid msgid-plural count

This function outputs the native language translation of a textual message whose grammatical form depends on a number, performing dollar-substitution on the result. Note that only shell variables mentioned in msgid or msgid-plural will be dollar-substituted in the result.


15.5.3 bash - Bourne-Again Shell Script

GNU bash 2.0 or newer has a special shorthand for translating a string and substituting variable values in it: $"msgid". But the use of this construct is discouraged, due to the security holes it opens and due to its portability problems.

The security holes of $"..." come from the fact that after looking up the translation of the string, bash processes it like it processes any double-quoted string: dollar and backquote processing, like ‘eval’ does.

  1. In a locale whose encoding is one of BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS, JOHAB, some double-byte characters have a second byte whose value is 0x60. For example, the byte sequence \xe0\x60 is a single character in these locales. Many versions of bash (all versions up to bash-2.05, and newer versions on platforms without mbsrtowcs() function) don’t know about character boundaries and see a backquote character where there is only a particular Chinese character. Thus it can start executing part of the translation as a command list. This situation can occur even without the translator being aware of it: if the translator provides translations in the UTF-8 encoding, it is the gettext() function which will, during its conversion from the translator’s encoding to the user’s locale’s encoding, produce the dangerous \x60 bytes.
  2. A translator could - voluntarily or inadvertently - use backquotes "`...`" or dollar-parentheses "$(...)" in her translations. The enclosed strings would be executed as command lists by the shell.

The portability problem is that bash must be built with internationalization support; this is normally not the case on systems that don’t have the gettext() function in libc.
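
The first security hole is easy to check by hand. The short Python sketch below takes the byte sequence \xe0\x60 from the text, decodes it as Big5 only (the other encodings are not exercised here), and shows that the backquote byte appears after conversion from UTF-8, not in the translator's UTF-8 text.

```python
# The byte sequence named in the text, decoded as Big5.
raw = b"\xe0\x60"
char = raw.decode("big5")

print(len(char))            # 1: the two bytes form a single character
print(raw[1] == ord("`"))   # True: the second byte equals the ASCII backquote

# The translator's UTF-8 text contains no backquote byte...
print(b"`" in char.encode("utf-8"))  # False
# ...but the same character converted to the user's Big5 locale does.
print(b"`" in char.encode("big5"))   # True
```

A scanner that works byte by byte, without tracking character boundaries, sees that final byte as the start of a command substitution.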


15.5.4 Python
RPMs

python

File extension

py

String syntax

'abc', u'abc', r'abc', ur'abc',
"abc", u"abc", r"abc", ur"abc",
'''abc''', u'''abc''', r'''abc''', ur'''abc''',
"""abc""", u"""abc""", r"""abc""", ur"""abc"""

gettext shorthand

_('abc') etc.

gettext/ngettext functions

gettext.gettext, gettext.dgettext, gettext.ngettext, gettext.dngettext, also ugettext, ungettext

textdomain

gettext.textdomain function, or gettext.install(domain) function

bindtextdomain

gettext.bindtextdomain function, or gettext.install(domain,localedir) function

setlocale

not used by the gettext emulation

Prerequisite

import gettext

Use or emulate GNU gettext

emulate

Extractor

xgettext

Formatting with positions

'...%(ident)d...' % { 'ident': value }

Portability

fully portable

po-mode marking

An example is available in the examples directory: hello-python.

A note about format strings: Python supports format strings with unnamed arguments, such as '...%d...', and format strings with named arguments, such as '...%(ident)d...'. The latter are preferable for internationalized programs, for two reasons:

  • When a format string takes more than one argument, the translator can provide a translation that uses the arguments in a different order, if the format string uses named arguments. For example, the translator can reformulate
    "'%(volume)s' has only %(freespace)d bytes free."
    

    to

    "Only %(freespace)d bytes free on '%(volume)s'."
    

    Additionally, the identifiers also provide some context to the translator.

  • In the context of plural forms, the format string used for the singular form does not use the numeric argument in many languages. Even in English, one prefers to write "one hour" instead of "1 hour". Omitting individual arguments from format strings like this is only possible with the named argument syntax. (With unnamed arguments, Python – unlike C – verifies that the format string uses all supplied arguments.)
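
Both reasons can be checked directly in Python. In this sketch the sample values and the key name hours are made up for illustration.

```python
# Named arguments let a translation reorder the values...
values = {"volume": "/home", "freespace": 512}
original = "'%(volume)s' has only %(freespace)d bytes free."
reordered = "Only %(freespace)d bytes free on '%(volume)s'."
print(original % values)   # '/home' has only 512 bytes free.
print(reordered % values)  # Only 512 bytes free on '/home'.

# ...and let the singular form leave the number out altogether.
print("one hour" % {"hours": 1})         # one hour
print("%(hours)d hours" % {"hours": 5})  # 5 hours

# With unnamed arguments, Python rejects the same omission.
try:
    "one hour" % (1,)
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```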

15.5.5 GNU clisp - Common Lisp
RPMs

clisp 2.28 or newer

File extension

lisp

String syntax

"abc"

gettext shorthand

(_ "abc"), (ENGLISH "abc")

gettext/ngettext functions

i18n:gettext, i18n:ngettext

textdomain

i18n:textdomain

bindtextdomain

i18n:textdomaindir

setlocale

automatic

Prerequisite

Use or emulate GNU gettext

use

Extractor

xgettext -k_ -kENGLISH

Formatting with positions

format "~1@*~D ~0@*~D"

Portability

On platforms without gettext, no translation.

po-mode marking

An example is available in the examples directory: hello-clisp.


15.5.6 GNU clisp C sources
RPMs

clisp

File extension

d

String syntax

"abc"

gettext shorthand

ENGLISH ? "abc" : ""
GETTEXT("abc")
GETTEXTL("abc")

gettext/ngettext functions

clgettext, clgettextl

textdomain

bindtextdomain

setlocale

automatic

Prerequisite

#include "lispbibl.c"

Use or emulate GNU gettext

use

Extractor

clisp-xgettext

Formatting with positions

fprintf "%2$d %1$d"

Portability

On platforms without gettext, no translation.

po-mode marking


15.5.7 Emacs Lisp
RPMs

emacs, xemacs

File extension

el

String syntax

"abc"

gettext shorthand

(_"abc")

gettext/ngettext functions

gettext, dgettext (xemacs only)

textdomain

domain special form (xemacs only)

bindtextdomain

bind-text-domain function (xemacs only)

setlocale

automatic

Prerequisite

Use or emulate GNU gettext

use

Extractor

xgettext

Formatting with positions

format "%2$d %1$d"

Portability

Only XEmacs. Without I18N3 defined at build time, no translation.

po-mode marking


15.5.8 librep
RPMs

librep 0.15.3 or newer

File extension

jl

String syntax

"abc"

gettext shorthand

(_"abc")

gettext/ngettext functions

gettext

textdomain

textdomain function

bindtextdomain

bindtextdomain function

setlocale

Prerequisite

(require 'rep.i18n.gettext)

Use or emulate GNU gettext

use

Extractor

xgettext

Formatting with positions

format "%2$d %1$d"

Portability

On platforms without gettext, no translation.

po-mode marking

An example is available in the examples directory: hello-librep.


15.5.9 GNU guile - Scheme
RPMs

guile

File extension

scm

String syntax

"abc"

gettext shorthand

(_ "abc"), _"abc" (GIMP script-fu extension)

gettext/ngettext functions

gettext, ngettext

textdomain

textdomain

bindtextdomain

bindtextdomain

setlocale

(catch #t (lambda () (setlocale LC_ALL "")) (lambda args #f))

Prerequisite

(use-modules (ice-9 format))

Use or emulate GNU gettext

use

Extractor

xgettext -k_

Formatting with positions

Portability

On platforms without gettext, no translation.

po-mode marking

An example is available in the examples directory: hello-guile.


15.5.10 GNU Smalltalk
RPMs

smalltalk

File extension

st

String syntax

'abc'

gettext shorthand

NLS ? 'abc'

gettext/ngettext functions

LcMessagesDomain>>#at:, LcMessagesDomain>>#at:plural:with:

textdomain

LcMessages>>#domain:localeDirectory: (returns a LcMessagesDomain object).
Example: I18N Locale default messages domain: 'gettext' localeDirectory: '/usr/local/share/locale'

bindtextdomain

LcMessages>>#domain:localeDirectory:, see above.

setlocale

Automatic if you use I18N Locale default.

Prerequisite

PackageLoader fileInPackage: 'I18N'!

Use or emulate GNU gettext

emulate

Extractor

xgettext

Formatting with positions

'%1 %2' bindWith: 'Hello' with: 'world'

Portability

fully portable

po-mode marking

An example is available in the examples directory: hello-smalltalk.


15.5.11 Java
RPMs

java, java2

File extension

java

String syntax

"abc"

gettext shorthand

_("abc")

gettext/ngettext functions

GettextResource.gettext, GettextResource.ngettext, GettextResource.pgettext, GettextResource.npgettext

textdomain

—, use ResourceBundle.getBundle instead

bindtextdomain

—, use CLASSPATH instead

setlocale

automatic

Prerequisite

Use or emulate GNU gettext

—, uses a Java specific message catalog format

Extractor

xgettext -k_

Formatting with positions

MessageFormat.format "{1,number} {0,number}"

Portability

fully portable

po-mode marking

Before marking strings as internationalizable, uses of the string concatenation operator need to be converted to MessageFormat applications. For example, "file "+filename+" not found" becomes MessageFormat.format("file {0} not found", new Object[] { filename }). Only after this is done can the strings be marked and extracted.

GNU gettext uses the native Java internationalization mechanism, namely ResourceBundles. There are two formats of ResourceBundles: .properties files and .class files. The .properties format is a text file which the translators can directly edit, like PO files, but which doesn’t support plural forms. The .class format, by contrast, is compiled from .java source code and can support plural forms (provided it is accessed through an appropriate API, see below).

To convert a PO file to a .properties file, the msgcat program can be used with the option --properties-output. To convert a .properties file back to a PO file, the msgcat program can be used with the option --properties-input. All the tools that manipulate PO files can work with .properties files as well, if given the --properties-input and/or --properties-output option.

To convert a PO file to a ResourceBundle class, the msgfmt program can be used with the option --java or --java2. To convert a ResourceBundle back to a PO file, the msgunfmt program can be used with the option --java.

Two different programmatic APIs can be used to access ResourceBundles. Note that both APIs work with all kinds of ResourceBundles, whether GNU gettext generated classes, or other .class or .properties files.

  1. The java.util.ResourceBundle API.

    In particular, its getString function returns a string translation. Note that a missing translation yields a MissingResourceException.

    This has the advantage of being the standard API. And it does not require any additional libraries, only the msgcat generated .properties files or the msgfmt generated .class files. But it cannot do plural handling, even if the resource was generated by msgfmt from a PO file with plural handling.

  2. The gnu.gettext.GettextResource API.

    Reference documentation in Javadoc 1.1 style format is in the javadoc2 directory.

    Its gettext function returns a string translation. Note that when a translation is missing, the msgid argument is returned unchanged.

    This has the advantage of having the ngettext function for plural handling and the pgettext and npgettext functions for strings constrained to a particular context.

    To use this API, one needs the libintl.jar file which is part of the GNU gettext package and distributed under the LGPL.

Four examples, using the second API, are available in the examples directory: hello-java, hello-java-awt, hello-java-swing, hello-java-qtjambi.

Now, to make use of the API and define a shorthand for ‘getString’, there are three idioms that you can choose from:

  • (This one assumes Java 1.5 or newer.) In a unique class of your project, say ‘Util’, define a static variable holding the ResourceBundle instance and the shorthand:
    private static ResourceBundle myResources =
      ResourceBundle.getBundle("domain-name");
    public static String _(String s) {
      return myResources.getString(s);
    }
    

    All classes containing internationalized strings then contain

    import static Util._;
    

    and the shorthand is used like this:

    System.out.println(_("Operation completed."));
    
  • In a unique class of your project, say ‘Util’, define a static variable holding the ResourceBundle instance:
    public static ResourceBundle myResources =
      ResourceBundle.getBundle("domain-name");
    

    All classes containing internationalized strings then contain

    private static ResourceBundle res = Util.myResources;
    private static String _(String s) { return res.getString(s); }
    

    and the shorthand is used like this:

    System.out.println(_("Operation completed."));
    
  • You add a class with a very short name, say ‘S’, containing just the definition of the resource bundle and of the shorthand:
    public class S {
      public static ResourceBundle myResources =
        ResourceBundle.getBundle("domain-name");
      public static String _(String s) {
        return myResources.getString(s);
      }
    }
    

    and the shorthand is used like this:

    System.out.println(S._("Operation completed."));
    

Which of the three idioms you choose will depend on whether your project requires portability to Java versions prior to Java 1.5 and, if so, whether copying two lines of code into every class is more acceptable in your project than a class with a single-letter name.


15.5.12 C#
RPMs

pnet, pnetlib 0.6.2 or newer, or mono 0.29 or newer

File extension

cs

String syntax

"abc", @"abc"

gettext shorthand

_("abc")

gettext/ngettext functions

GettextResourceManager.GetString, GettextResourceManager.GetPluralString, GettextResourceManager.GetParticularString, GettextResourceManager.GetParticularPluralString

textdomain

new GettextResourceManager(domain)

bindtextdomain

—, compiled message catalogs are located in subdirectories of the directory containing the executable

setlocale

automatic

Prerequisite

Use or emulate GNU gettext

—, uses a C# specific message catalog format

Extractor

xgettext -k_

Formatting with positions

String.Format "{1} {0}"

Portability

fully portable

po-mode marking

Before marking strings as internationalizable, uses of the string concatenation operator need to be converted to String.Format invocations. For example, "file "+filename+" not found" becomes String.Format("file {0} not found", filename). Only after this is done can the strings be marked and extracted.

GNU gettext uses the native C#/.NET internationalization mechanism, namely the classes ResourceManager and ResourceSet. Applications use the ResourceManager methods to retrieve the native language translation of strings. An instance of ResourceSet is the in-memory representation of a message catalog file. The ResourceManager loads and accesses ResourceSet instances as needed to look up the translations.

There are two formats of ResourceSets that can be directly loaded by the C# runtime: .resources files and .dll files.

  • The .resources format is a binary file usually generated through the resgen or monoresgen utility, but which doesn’t support plural forms. .resources files can also be embedded in .NET .exe files. This only affects whether a file system access is performed to load the message catalog; it doesn’t affect the contents of the message catalog.
  • On the other hand, the .dll format is a binary file that is compiled from .cs source code and can support plural forms (provided it is accessed through the GNU gettext API, see below).

Note that these .NET .dll and .exe files are not tied to a particular platform; their file format and GNU gettext for C# can be used on any platform.

To convert a PO file to a .resources file, the msgfmt program can be used with the option ‘--csharp-resources’. To convert a .resources file back to a PO file, the msgunfmt program can be used with the option ‘--csharp-resources’. You can also, in some cases, use the resgen program (from the pnet package) or the monoresgen program (from the mono/mcs package). These programs can also convert a .resources file back to a PO file. But beware: as of this writing (January 2004), the monoresgen converter is quite buggy and the resgen converter ignores the encoding of the PO files.

To convert a PO file to a .dll file, the msgfmt program can be used with the option --csharp. The result will be a .dll file containing a subclass of GettextResourceSet, which itself is a subclass of ResourceSet. To convert a .dll file containing a GettextResourceSet subclass back to a PO file, the msgunfmt program can be used with the option --csharp.

The advantages of the .dll format over the .resources format are:

  1. Freedom to localize: Users can add their own translations to an application after it has been built and distributed. Whereas when the programmer uses a ResourceManager constructor provided by the system, the set of .resources files for an application must be specified when the application is built and cannot be extended afterwards.
  2. Plural handling: A message catalog in .dll format supports the plural handling function GetPluralString. Whereas .resources files can only contain data and only support lookups that depend on a single string.
  3. Context handling: A message catalog in .dll format supports the query-with-context functions GetParticularString and GetParticularPluralString. Whereas .resources files can only contain data and only support lookups that depend on a single string.
  4. The GettextResourceManager that loads the message catalogs in .dll format also provides for inheritance on a per-message basis. For example, in Austrian (de_AT) locale, translations from the German (de) message catalog will be used for messages not found in the Austrian message catalog. This has the consequence that the Austrian translators need only translate those few messages for which the translation into Austrian differs from the German one. Whereas when working with .resources files, each message catalog must provide the translations of all messages by itself.
  5. The GettextResourceManager that loads the message catalogs in .dll format also provides for a fallback: The English msgid is returned when no translation can be found. Whereas when working with .resources files, a language-neutral .resources file must explicitly be provided as a fallback.
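
The lookup order described in items 4 and 5 (specific locale, then parent language, then the English msgid) can be modelled in a few lines. The sketch below uses Python's gettext module purely as a language-neutral illustration; it is not the GettextResourceManager implementation, and the DictTranslations class and the sample messages are hypothetical.

```python
import gettext

class DictTranslations(gettext.NullTranslations):
    """Toy catalog backed by a dict, standing in for a compiled catalog."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        # NullTranslations.gettext consults the fallback catalog if one
        # was added, and otherwise returns the msgid unchanged.
        return super().gettext(message)

german = DictTranslations({"January": "Januar", "Monday": "Montag"})
austrian = DictTranslations({"January": "Jänner"})  # only what differs
austrian.add_fallback(german)

print(austrian.gettext("January"))  # Jänner  (from the de_AT catalog)
print(austrian.gettext("Monday"))   # Montag  (inherited from de)
print(austrian.gettext("Friday"))   # Friday  (msgid returned as fallback)
```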

On the side of the programmatic APIs, the programmer can use either the standard ResourceManager API or the GNU GettextResourceManager API. The latter is an extension of the former, because GettextResourceManager is a subclass of ResourceManager.

  1. The System.Resources.ResourceManager API.

    This API works with resources in .resources format.

    The creation of the ResourceManager is done through

      new ResourceManager(domainname, Assembly.GetExecutingAssembly())
    

    The GetString function returns a string’s translation. Note that this function returns null when a translation is missing (i.e. not even found in the fallback resource file).

  2. The GNU.Gettext.GettextResourceManager API.

    This API works with resources in .dll format.

    Reference documentation is in the csharpdoc directory.

    The creation of the ResourceManager is done through

      new GettextResourceManager(domainname)
    

    The GetString function returns a string’s translation. Note that when a translation is missing, the msgid argument is returned unchanged.

    The GetPluralString function returns a string translation with plural handling, like the ngettext function in C.

    The GetParticularString function returns a string’s translation, specific to a particular context, like the pgettext function in C. Note that when a translation is missing, the msgid argument is returned unchanged.

    The GetParticularPluralString function returns a string translation, specific to a particular context, with plural handling, like the npgettext function in C.

    To use this API, one needs the GNU.Gettext.dll file which is part of the GNU gettext package and distributed under the LGPL.

You can also mix both approaches: use the GNU.Gettext.GettextResourceManager constructor, but otherwise use only the ResourceManager type and only the GetString method. This is appropriate when you want to profit from the tools for PO files, but don’t want to change existing source code that uses ResourceManager and don’t (yet) need the GetPluralString method.

Two examples, using the second API, are available in the examples directory: hello-csharp, hello-csharp-forms.

Now, to make use of the API and define a shorthand for ‘GetString’, there are two idioms that you can choose from:

  • In a unique class of your project, say ‘Util’, define a static variable holding the ResourceManager instance:
    public static GettextResourceManager MyResourceManager =
      new GettextResourceManager("domain-name");
    

    All classes containing internationalized strings then contain

    private static GettextResourceManager Res = Util.MyResourceManager;
    private static String _(String s) { return Res.GetString(s); }
    

    and the shorthand is used like this:

    Console.WriteLine(_("Operation completed."));
    
  • You add a class with a very short name, say ‘S’, containing just the definition of the resource manager and of the shorthand:
    public class S {
      public static GettextResourceManager MyResourceManager =
        new GettextResourceManager("domain-name");
      public static String _(String s) {
         return MyResourceManager.GetString(s);
      }
    }
    

    and the shorthand is used like this:

    Console.WriteLine(S._("Operation completed."));
    

Which of the two idioms you choose will depend on whether copying two lines of code into every class is more acceptable in your project than a class with a single-letter name.


15.5.13 GNU awk
RPMs

gawk 3.1 or newer

File extension

awk, gawk, twjr. The file extension twjr is used by TexiWeb Jr (https://github.com/arnoldrobbins/texiwebjr).

String syntax

"abc"

gettext shorthand

_"abc"

gettext/ngettext functions

dcgettext, missing dcngettext in gawk-3.1.0

textdomain

TEXTDOMAIN variable

bindtextdomain

bindtextdomain function

setlocale

automatic, but missing setlocale (LC_MESSAGES, "") in gawk-3.1.0

Prerequisite

Use or emulate GNU gettext

use

Extractor

xgettext

Formatting with positions

printf "%2$d %1$d" (GNU awk only)

Portability

On platforms without gettext, no translation. On non-GNU awks, you must define dcgettext, dcngettext and bindtextdomain yourself.

po-mode marking

An example is available in the examples directory: hello-gawk.


15.5.14 Pascal - Free Pascal Compiler
RPMs

fpk

File extension

pp, pas

String syntax

'abc'

gettext shorthand

automatic

gettext/ngettext functions

—, use ResourceString data type instead

textdomain

—, use TranslateResourceStrings function instead

bindtextdomain

—, use TranslateResourceStrings function instead

setlocale

automatic, but uses only LANG, not LC_MESSAGES or LC_ALL

Prerequisite

{$mode delphi} or {$mode objfpc}
uses gettext;

Use or emulate GNU gettext

emulate partially

Extractor

ppc386 followed by xgettext or rstconv

Formatting with positions

uses sysutils;
format "%1:d %0:d"

Portability

?

po-mode marking

The Pascal compiler has special support for the ResourceString data type. It generates a .rst file. This is then converted to a .pot file by use of xgettext or rstconv. At runtime, a .mo file corresponding to translations of this .pot file can be loaded using the TranslateResourceStrings function in the gettext unit.

An example is available in the examples directory: hello-pascal.


15.5.15 wxWidgets library
RPMs

wxGTK, gettext

File extension

cpp

String syntax

"abc"

gettext shorthand

_("abc")

gettext/ngettext functions

wxLocale::GetString, wxGetTranslation

textdomain

wxLocale::AddCatalog

bindtextdomain

wxLocale::AddCatalogLookupPathPrefix

setlocale

wxLocale::Init, wxSetLocale

Prerequisite

#include <wx/intl.h>

Use or emulate GNU gettext

emulate, see include/wx/intl.h and src/common/intl.cpp

Extractor

xgettext

Formatting with positions

wxString::Format supports positions if and only if the system has wprintf(), vswprintf() functions and they support positions according to POSIX.

Portability

fully portable

po-mode marking

yes


15.5.16 YCP - YaST2 scripting language
RPMs

libycp, libycp-devel, yast2-core, yast2-core-devel

File extension

ycp

String syntax

"abc"

gettext shorthand

_("abc")

gettext/ngettext functions

_() with 1 or 3 arguments

textdomain

textdomain statement

bindtextdomain

setlocale

Prerequisite

Use or emulate GNU gettext

use

Extractor

xgettext

Formatting with positions

sformat "%2 %1"

Portability

fully portable

po-mode marking

An example is available in the examples directory: hello-ycp.


15.5.17 Tcl - Tk’s scripting language
RPMs

tcl

File extension

tcl

String syntax

"abc"

gettext shorthand

[_ "abc"]

gettext/ngettext functions

::msgcat::mc

textdomain

bindtextdomain

—, use ::msgcat::mcload instead

setlocale

automatic, uses LANG, but ignores LC_MESSAGES and LC_ALL

Prerequisite

package require msgcat
proc _ {s} {return [::msgcat::mc $s]}

Use or emulate GNU gettext

—, uses a Tcl specific message catalog format

Extractor

xgettext -k_

Formatting with positions

format "%2\$d %1\$d"

Portability

fully portable

po-mode marking

Two examples are available in the examples directory: hello-tcl, hello-tcl-tk.

Before marking strings as internationalizable, substitutions of variables into the string need to be converted to format applications. For example, "file $filename not found" becomes [format "file %s not found" $filename]. Only after this is done can the strings be marked and extracted. After marking, this example becomes [format [_ "file %s not found"] $filename] or [msgcat::mc "file %s not found" $filename]. Note that the msgcat::mc function implicitly calls format when more than one argument is given.


15.5.18 Perl
RPMs

perl

File extension

pl, PL, pm, perl, cgi

String syntax
  • "abc"
  • 'abc'
  • qq (abc)
  • q (abc)
  • qr /abc/
  • qx (/bin/date)
  • /pattern match/
  • ?pattern match?
  • s/substitution/operators/
  • $tied_hash{"message"}
  • $tied_hash_reference->{"message"}
  • etc., issue the command ‘man perlsyn’ for details
gettext shorthand

__ (double underscore)

gettext/ngettext functions

gettext, dgettext, dcgettext, ngettext, dngettext, dcngettext

textdomain

textdomain function

bindtextdomain

bindtextdomain function

bind_textdomain_codeset

bind_textdomain_codeset function

setlocale

Use setlocale (LC_ALL, "");

Prerequisite

use POSIX;
use Locale::TextDomain; (included in the package libintl-perl which is available on the Comprehensive Perl Archive Network CPAN, http://www.cpan.org/).

Use or emulate GNU gettext

platform dependent: gettext_pp emulates, gettext_xs uses GNU gettext

Extractor

xgettext -k__ -k\$__ -k%__ -k__x -k__n:1,2 -k__nx:1,2 -k__xn:1,2 -kN__ -k

Formatting with positions

Both kinds of format strings support formatting with positions.
printf "%2\$d %1\$d", ... (requires Perl 5.8.0 or newer)
__expand("[new] replaces [old]", old => $oldvalue, new => $newvalue)

Portability

The libintl-perl package is platform independent but is not part of the Perl core. The programmer is responsible for providing a dummy implementation of the required functions if the package is not installed on the target system.

po-mode marking

Documentation

Included in libintl-perl, available on CPAN (http://www.cpan.org/).

An example is available in the examples directory: hello-perl.

The xgettext parser backend for Perl differs significantly from the parser backends for other programming languages, just as Perl itself differs significantly from other programming languages. The Perl parser backend offers many more string marking facilities than the other backends but it also has some Perl-specific limitations, the worst probably being its imperfection.



15.5.18.1 General Problems Parsing Perl Code

It is often heard that only Perl can parse Perl. This is not true. Perl cannot be parsed at all, it can only be executed. Perl has various built-in ambiguities that can only be resolved at runtime.

The following example may illustrate one common problem:

print gettext "Hello World!";

Although this example looks like a bullet-proof case of a function invocation, it is not:

open gettext, ">testfile" or die;
print gettext "Hello world!"

In this context, the string gettext looks more like a file handle. But not necessarily:

use Locale::Messages qw (:libintl_h);
open gettext ">testfile" or die;
print gettext "Hello world!";

Now, the file is probably syntactically incorrect, provided that the module Locale::Messages found first in the Perl include path exports a function gettext. But what if the module Locale::Messages really looks like this?

use vars qw (*gettext);

1;

In this case, the string gettext will be interpreted as a file handle again, and the above example will create a file testfile and write the string “Hello world!” into it. Even advanced control flow analysis will not really help:

if (0.5 < rand) {
   eval "use Sane";
} else {
   eval "use InSane";
}
print gettext "Hello world!";

If the module Sane exports a function gettext that does what we expect, and the module InSane opens a file for writing and associates the handle gettext with this output stream, we are clueless again about what will happen at runtime. It is completely unpredictable. The truth is that Perl has so many ways to fill its symbol table at runtime that it is impossible to interpret a particular piece of code without executing it.

Of course, xgettext will not execute your Perl sources while scanning for translatable strings, but rather use heuristics in order to guess what you meant.

Another problem is the ambiguity of the slash and the question mark. Their interpretation depends on the context:

# A pattern match.
print "OK\n" if /foobar/;

# A division.
print 1 / 2;

# Another pattern match.
print "OK\n" if ?foobar?;

# Conditional.
print $x ? "foo" : "bar";

The slash may either act as the division operator or introduce a pattern match, whereas the question mark may act as the ternary conditional operator or as a pattern match, too. Other programming languages like awk present similar problems, but the consequences of a misinterpretation are particularly nasty with Perl sources. In awk for instance, a statement can never exceed one line and the parser can recover from a parsing error at the next newline and interpret the rest of the input stream correctly. Perl is different, as a pattern match is terminated by the next appearance of the delimiter (the slash or the question mark) in the input stream, regardless of the semantic context. If a slash is really a division sign but misinterpreted as a pattern match, the rest of the input file is most probably parsed incorrectly.

There are certain cases where the ambiguity cannot be resolved at all:

$x = wantarray ? 1 : 0;

The Perl built-in function wantarray does not accept any arguments. The Perl parser therefore knows that the question mark does not start a regular expression but is the ternary conditional operator.

sub wantarrays {}
$x = wantarrays ? 1 : 0;

Now the situation is different. The function wantarrays takes a variable number of arguments (like any non-prototyped Perl function). The question mark is now the delimiter of a pattern match, and hence the piece of code does not compile.

sub wantarrays() {}
$x = wantarrays ? 1 : 0;

Now the function is prototyped, Perl knows that it does not accept any arguments, and the question mark is therefore interpreted as the ternary operator again. But that unfortunately outsmarts xgettext.

The Perl parser in xgettext cannot know whether a function has a prototype and what that prototype would look like. It therefore makes an educated guess. If a function is known to be a Perl built-in and this function does not accept any arguments, a following question mark or slash is treated as an operator, otherwise as the delimiter of a following regular expression. The Perl built-ins that do not accept arguments are wantarray, fork, time, times, getlogin, getppid, getpwent, getgrent, gethostent, getnetent, getprotoent, getservent, setpwent, setgrent, endpwent, endgrent, endhostent, endnetent, endprotoent, and endservent.

If you find that xgettext fails to extract strings from portions of your sources, you should therefore look out for slashes and/or question marks preceding these sections. You may have come across a bug in xgettext’s Perl parser (and of course you should report that bug). In the meantime you should consider reformulating your code in a manner less challenging to xgettext.

In particular, if the parser is too dumb to see that a function does not accept arguments, use parentheses:

$x = somefunc() ? 1 : 0;
$y = (somefunc) ? 1 : 0;

In fact the Perl parser itself has similar problems and warns you about such constructs.
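As a small runnable illustration of the parenthesized forms, the sketch below uses a made-up, non-prototyped function pending_jobs (the name and its return value are assumptions for this example only). With explicit parentheses, the token that follows can only be an operator, so neither the question mark nor the slash leaves room for a guess.

```perl
use strict;
use warnings;

# Hypothetical helper without a prototype; any user-defined function will do.
sub pending_jobs { return 3 }

# Explicit parentheses end the argument list, so what follows is
# unambiguously an operator rather than the start of a pattern match.
my $has_jobs = pending_jobs() ? 1 : 0;
my $also     = (pending_jobs) ? 1 : 0;
my $half     = pending_jobs() / 2;

print "$has_jobs $also $half\n";
```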


15.5.18.2 Which keywords will xgettext look for?

Unless you instruct xgettext otherwise by invoking it with one of the options --keyword or -k, it will recognize the following keywords in your Perl sources:

  • gettext
  • dgettext
  • dcgettext
  • ngettext:1,2

    The first (singular) and the second (plural) argument will be extracted.

  • dngettext:1,2

    The first (singular) and the second (plural) argument will be extracted.

  • dcngettext:1,2

    The first (singular) and the second (plural) argument will be extracted.

  • gettext_noop
  • %gettext

    The keys of lookups into the hash %gettext will be extracted.

  • $gettext

    The keys of lookups into the hash reference $gettext will be extracted.


15.5.18.3 How to Extract Hash Keys

Translating messages at runtime is normally performed by looking up the original string in the translation database and returning the translated version. The “natural” Perl implementation is a hash lookup, and, of course, xgettext supports such practice.

print __"Hello world!";
print $__{"Hello world!"};
print $__->{"Hello world!"};
print $$__{"Hello world!"};

The above four lines all do the same thing. The Perl module Locale::TextDomain exports by default a hash %__ that is tied to the function __(). It also exports a reference $__ to %__.
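To see how a hash lookup can stand in for a function call, here is a minimal sketch of a tied hash built with core Perl only. It is not the Locale::TextDomain implementation: the package Demo::LookupHash, the translate function and the one-entry catalog are all hypothetical stand-ins, and the hash is named %gettext rather than %__.

```perl
use strict;
use warnings;

# Hypothetical tie class: every hash read is forwarded to a code reference.
package Demo::LookupHash;
sub TIEHASH { my ($class, $code) = @_; return bless { code => $code }, $class }
sub FETCH   { my ($self, $key)   = @_; return $self->{code}->($key) }

package main;

# Stand-ins for a message catalog and for the translation function.
my %catalog = ('Hello world!' => 'Hallo Welt!');
sub translate {
    my ($msgid) = @_;
    return exists $catalog{$msgid} ? $catalog{$msgid} : $msgid;
}

tie my %gettext, 'Demo::LookupHash', \&translate;
my $gettext = \%gettext;

# A function call and three spellings of the hash lookup.
my @results = (
    translate("Hello world!"),
    $gettext{"Hello world!"},
    $gettext->{"Hello world!"},
    $$gettext{"Hello world!"},
);
print "$_\n" for @results;
```

All four expressions go through the same translate function, which is why the function call and the hash lookups are interchangeable at runtime.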

If an argument to the xgettext option --keyword (or -k) starts with a percent sign, the rest of the keyword is interpreted as the name of a hash. If it starts with a dollar sign, the rest of the keyword is interpreted as a reference to a hash.

Note that you can omit the quotation marks (single or double) around the hash key (almost) whenever Perl itself allows it:

print $gettext{Error};

The exact rule is: You can omit the surrounding quotes when the hash key is a valid C (!) identifier, i.e. when it starts with an underscore or an ASCII letter and is followed by an arbitrary number of underscores, ASCII letters or digits. Other Unicode characters are not allowed, regardless of the use utf8 pragma.
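The rule can be written down as a single pattern. The helper below is a hypothetical illustration that mirrors the wording of the rule; it is not taken from the xgettext sources.

```perl
use strict;
use warnings;

# Returns 1 when a hash key matches the rule stated above: an ASCII letter
# or underscore, followed by any number of ASCII letters, digits or
# underscores. The function name is made up for this sketch.
sub key_may_be_unquoted {
    my ($key) = @_;
    return $key =~ /\A[A-Za-z_][A-Za-z0-9_]*\z/ ? 1 : 0;
}

for my $key ('Error', '_private', 'file2', '2files', 'not found') {
    my $verdict = key_may_be_unquoted($key) ? 'quotes optional' : 'needs quotes';
    printf "%-10s %s\n", $key, $verdict;
}
```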


15.5.18.4 What are Strings And Quote-like Expressions?

Perl offers a plethora of different string constructs. Those that can be used either as arguments to functions or inside braces for hash lookups are generally supported by xgettext.

  • double-quoted strings
    print gettext "Hello World!";
    
  • single-quoted strings
    print gettext 'Hello World!';
    
  • the operator qq
    print gettext qq |Hello World!|;
    print gettext qq <E-mail: <guido\@imperia.net>>;
    

    The operator qq is fully supported. You can use arbitrary delimiters, including the four bracketing delimiters (round, angle, square, curly) that nest.

  • the operator q
    print gettext q |Hello World!|;
    print gettext q <E-mail: <guido@imperia.net>>;
    

    The operator q is fully supported. You can use arbitrary delimiters, including the four bracketing delimiters (round, angle, square, curly) that nest.

  • the operator qx
    print gettext qx ;LANGUAGE=C /bin/date;
    print gettext qx [/usr/bin/ls | grep '^[A-Z]*'];
    

    The operator qx is fully supported. You can use arbitrary delimiters, including the four bracketing delimiters (round, angle, square, curly) that nest.

    The example is actually a useless use of gettext. It will invoke the gettext function on the output of the command specified with the qx operator. The feature was included in order to make the interface consistent (the parser will extract all strings and quote-like expressions).

  • here documents
    print gettext <<'EOF';
    program not found in $PATH
    EOF
    
    print ngettext <<EOF, <<"EOF";
    one file deleted
    EOF
    several files deleted
    EOF
    

    Here-documents are recognized. If the delimiter is enclosed in single quotes, the string is not interpolated. If it is enclosed in double quotes or has no quotes at all, the string is interpolated.

    Delimiters that start with a digit are not supported!


15.5.18.5 Invalid Uses Of String Interpolation

Perl is capable of interpolating variables into strings. This offers some nice features in localized programs but can also lead to problems.

A common error is a construct like the following:

print gettext "This is the program $0!\n";

Perl will interpolate at runtime the value of the variable $0 into the argument of the gettext() function. Hence, this argument is not a string constant but a variable argument ($0 is a global variable that holds the name of the Perl script being executed). The interpolation is performed by Perl before the string argument is passed to gettext() and will therefore depend on the name of the script which can only be determined at runtime. Consequently, it is almost impossible that a translation can be looked up at runtime (except if, by accident, the interpolated string is found in the message catalog).

The xgettext program will therefore terminate parsing with a fatal error if it encounters a variable inside of an extracted string. In general, this will happen for all kinds of string interpolations that cannot be safely performed at compile time. If you absolutely know what you are doing, you can always circumvent this behavior:

my $know_what_i_am_doing = "This is program $0!\n";
print gettext $know_what_i_am_doing;

Since the parser only recognizes strings and quote-like expressions, but not variables or other terms, the above construct will be accepted. You will have to find another way, however, to let your original string make it into your message catalog.
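One common way to keep the msgid constant is to move the variable out of the string and into a format argument. The sketch below assumes nothing beyond core Perl: the gettext function defined here is only a stand-in that returns its argument unchanged, where a real program would import gettext from libintl-perl.

```perl
use strict;
use warnings;

# Stand-in for the real gettext function; it returns its argument unchanged
# so that this sketch runs without libintl-perl installed.
sub gettext { return $_[0] }

# The msgid is a constant string with a placeholder. The script name is
# filled in only after the lookup has taken place.
my $message = sprintf(gettext("This is the program %s!\n"), $0);
print $message;
```

Here the argument of gettext contains no variable, so the string that is looked up at runtime is the same one that a scan of the source would find.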

If invoked with the option --extract-all (or -a), variable interpolation will be accepted. Rationale: You will generally use this option in order to prepare your sources for internationalization.

Please see the manual page ‘man perlop’ for details of strings and quote-like expressions that are subject to interpolation and those that are not. Safe interpolations (that will not lead to a fatal error) are:

  • the escape sequences \t (tab, HT, TAB), \n (newline, NL), \r (return, CR), \f (form feed, FF), \b (backspace, BS), \a (alarm, bell, BEL), and \e (escape, ESC).
  • octal chars, like \033
    Note that octal escapes in the range of 400-777 are translated into a UTF-8 representation, regardless of the presence of the use utf8 pragma.
  • hex chars, like \x1b
  • wide hex chars, like \x{263a}
    Note that this escape is translated into a UTF-8 representation, regardless of the presence of the use utf8 pragma.
  • control chars, like \c[ (CTRL-[)
  • named Unicode chars, like \N{LATIN CAPITAL LETTER C WITH CEDILLA}
    Note that this escape is translated into a UTF-8 representation, regardless of the presence of the use utf8 pragma.

The following escapes are considered partially safe:

  • \l lowercase next char
  • \u uppercase next char
  • \L lowercase till \E
  • \U uppercase till \E
  • \E end case modification
  • \Q quote non-word characters till \E

These escapes are only considered safe if the string consists of ASCII characters only. Translation of characters outside the range defined by ASCII is locale-dependent and can actually only be performed at runtime; xgettext doesn’t do these locale-dependent translations at extraction time.

Except for the modifier \Q, these translations, albeit valid, are generally useless and only obfuscate your sources. If a translation can be safely performed at compile time you can just as well write what you mean.
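The escapes listed above all denote fixed characters, which is why they can be resolved without running the program. The short sketch below shows this from Perl's side only; it makes no claim about how xgettext evaluates them internally, and the variable names are arbitrary.

```perl
use strict;
use warnings;

# Four spellings of the ESC character: named escape, octal, hex, control.
my @esc = ("\e", "\033", "\x1b", "\c[");
print join(' ', map { ord } @esc), "\n";

# Case-modifying escapes applied to ASCII-only text yield a constant string,
# so the literal result could have been written directly.
my $shout = "\Uhello\E world";
my $title = "\uhello";
print "$shout | $title\n";
```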



15.5.18.6 Valid Uses Of String Interpolation

Perl is often used to generate sources for other programming languages or arbitrary file formats. Web applications that output HTML code are a prominent example of such usage.

You will often come across situations where you want to intersperse code written in the target (programming) language with translatable messages, like in the following HTML example:

print gettext <<EOF;
<h1>My Homepage</h1>
<script language="JavaScript"><!--
for (i = 0; i < 100; ++i) {
    alert ("Thank you so much for visiting my homepage!");
}
//--></script>
EOF

The parser will extract the entire here document, and it will appear entirely in the resulting PO file, including the JavaScript snippet embedded in the HTML code. If you overdo constructs like the above, you run the risk that the translators of your package will look for a less challenging project. You should consider an alternative expression here:

print <<EOF;
<h1>$gettext{"My Homepage"}</h1>
<script language="JavaScript"><!--
for (i = 0; i < 100; ++i) {
    alert ("$gettext{'Thank you so much for visiting my homepage!'}");
}
//--></script>
EOF

Only the translatable portions of the code will be extracted here, and the resulting PO file will begrudgingly improve in terms of readability.

You can interpolate hash lookups in all strings or quote-like expressions that are subject to interpolation (see the manual page ‘man perlop’ for details). Double interpolation is invalid, however:

# TRANSLATORS: Replace "the earth" with the name of your planet.
print gettext qq{Welcome to $gettext->{"the earth"}};

The qq-quoted string is recognized as an argument to gettext in the first place, and checked for invalid variable interpolation. The dollar sign of hash-dereferencing will therefore terminate the parser with an “invalid interpolation” error.
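A possible way around double interpolation is to look up the two messages separately and combine them afterwards. In the sketch below, gettext and %gettext are plain stand-ins (an identity function and an ordinary hash) so that the code runs without libintl-perl; in a real program both would come from that package.

```perl
use strict;
use warnings;

# Stand-ins so that the sketch is self-contained: gettext returns its
# argument, and %gettext is a plain hash rather than a tied one.
sub gettext { return $_[0] }
my %gettext = ('the earth' => 'the earth');

# Two separate constant msgids, combined only after both lookups.
my $planet  = $gettext{"the earth"};
my $welcome = sprintf(gettext("Welcome to %s"), $planet);
print "$welcome\n";
```

Neither msgid contains a variable, so each one remains a constant string.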

It is valid to interpolate hash lookups in regular expressions:

if ($var =~ /$gettext{"the earth"}/) {
   print gettext "Match!\n";
}
s/$gettext{"U. S. A."}/$gettext{"U. S. A."} $gettext{"(dial +0)"}/g;


15.5.18.7 When To Use Parentheses

In Perl, parentheses around function arguments are mostly optional. xgettext will always assume that all recognized keywords (except for hashes and hash references) are names of properly prototyped functions, and will (hopefully) only require parentheses where Perl itself requires them. All constructs in the following example are therefore ok to use:

print gettext ("Hello World!\n");
print gettext "Hello World!\n";
print dgettext ($package => "Hello World!\n");
print dgettext $package, "Hello World!\n";

# The "fat comma" => turns the left-hand side argument into a
# single-quoted string!
print dgettext smellovision => "Hello World!\n";

# The following assignment only works with prototyped functions.
# Otherwise, the functions will act as "greedy" list operators and
# eat up all following arguments.
my $anonymous_hash = {
   planet => gettext "earth",
   cakes => ngettext "one cake", "several cakes", $n,
   still => $works,
};
# The same without fat comma:
my $other_hash = {
   'planet', gettext "earth",
   'cakes', ngettext "one cake", "several cakes", $n,
   'still', $works,
};

# Parentheses are only significant for the first argument.
print dngettext 'package', ("one cake", "several cakes", $n), $discarded;


15.5.18.8 How To Grok with Long Lines

The necessity of long messages can often lead to a cumbersome or unreadable coding style. Perl has several options that may prevent you from writing unreadable code, and xgettext does its best to do likewise. This is where the dot operator (the string concatenation operator) may come in handy:

print gettext ("This is a very long"
               . " message that is still"
               . " readable, because"
               . " it is split into"
               . " multiple lines.\n");

Perl is smart enough to concatenate these constant string fragments into one long string at compile time, and so is xgettext. You will only find one long message in the resulting POT file.

Note that Perl 6 (since renamed Raku) uses the tilde (‘~’) as the string concatenation operator, and the dot (‘.’) for method calls. This syntax is not supported by xgettext.

If embedded newline characters are not an issue, or even desired, you may also insert newline characters inside quoted strings wherever you feel like it:

print gettext ("<em>In HTML output
embedded newlines are generally no
problem, since adjacent whitespace
is always rendered into a single
space character.</em>");

You may also consider using here documents:

print gettext <<EOF;
<em>In HTML output
embedded newlines are generally no
problem, since adjacent whitespace
is always rendered into a single
space character.</em>
EOF

Please do not forget that the line breaks are real, i.e. they translate into newline characters that will consequently show up in the resulting POT file.
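The difference between the two styles is easy to check by counting newline characters. The following sketch uses arbitrary example text and variable names; it only demonstrates Perl's behavior, not the contents of an actual POT file.

```perl
use strict;
use warnings;

# Fragments joined with the dot operator: the source is split over
# several lines, but the resulting string contains no newline.
my $joined = "A long message"
           . " split across"
           . " several source lines.";

# A here document: every line break in the source is part of the string.
my $heredoc = <<EOF;
A long message
split across
several source lines.
EOF

# tr/\n// with an empty replacement list counts without modifying.
my $joined_newlines  = ($joined  =~ tr/\n//);
my $heredoc_newlines = ($heredoc =~ tr/\n//);
print "concatenated: $joined_newlines, here document: $heredoc_newlines\n";
```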



15.5.18.9 Bugs, Pitfalls, And Things That Do Not Work

The foregoing sections should have proven that xgettext is quite smart in extracting translatable strings from Perl sources. Yet, some more or less exotic constructs that could be expected to work, actually do not work.

One of the more relevant limitations can be found in the implementation of variable interpolation inside quoted strings. Only simple hash lookups can be used there:

print <<EOF;
$gettext{"The dot operator"
          . " does not work"
          . "here!"}
Likewise, you cannot @{[ gettext ("interpolate function calls") ]}
inside quoted strings or quote-like expressions.
EOF

This is valid Perl code and will actually trigger invocations of the gettext function at runtime. Yet, the Perl parser in xgettext will fail to recognize the strings. A less obvious example can be found in the interpolation of regular expressions:

s/<!--START_OF_WEEK-->/gettext ("Sunday")/e;

The modifier e will cause the substitution to be interpreted as an evaluable statement. Consequently, at runtime the function gettext() is called, but again, the parser fails to extract the string “Sunday”. Use a temporary variable as a simple workaround if you really happen to need this feature:

my $sunday = gettext "Sunday";
s/<!--START_OF_WEEK-->/$sunday/;

Hash slices would also be handy but are not recognized:

my @weekdays = @gettext{'Sunday', 'Monday', 'Tuesday', 'Wednesday',
                        'Thursday', 'Friday', 'Saturday'};
# Or even:
@weekdays = @gettext{qw (Sunday Monday Tuesday Wednesday Thursday
                         Friday Saturday) };

This is perfectly valid usage of the tied hash %gettext but the strings are not recognized and therefore will not be extracted.

Another caveat of the current version is its rudimentary support for non-ASCII characters in identifiers. You may encounter serious problems if you use identifiers with characters outside the range of ’A’-’Z’, ’a’-’z’, ’0’-’9’ and the underscore ’_’.

Maybe some of these missing features will be implemented in future versions, but since you can always make do without them at minimal effort, these todos have very low priority.

Brace format strings that already contain braces as part of the normal text are a nasty problem, for example the usage strings typically encountered in programs:

die "usage: $0 {OPTIONS} FILENAME...\n";

If you want to internationalize this code with Perl brace format strings, you will run into a problem:

die __x ("usage: {program} {OPTIONS} FILENAME...\n", program => $0);

Whereas ‘{program}’ is a placeholder, ‘{OPTIONS}’ is not and should probably be translated. Yet, there is no way to teach the Perl parser in xgettext to recognize the first one, and leave the other one alone.

There are two possible work-arounds for this problem. If you are sure that your program will run under Perl 5.8.0 or newer (these Perl versions handle positional parameters in printf()) or if you are sure that the translator will not have to reorder the arguments in her translation (for example if you have only one brace placeholder in your string, or if it describes a syntax, like in this one), you can mark the string as no-perl-brace-format and use printf():

# xgettext: no-perl-brace-format
die sprintf ("usage: %s {OPTIONS} FILENAME...\n", $0);

If you want to use the more portable Perl brace format, you will have to put placeholders in place of the literal braces:

die __x ("usage: {program} {[}OPTIONS{]} FILENAME...\n",
         program => $0, '[' => '{', ']' => '}');

Perl brace format strings know no escaping mechanism. No matter what this escaping mechanism looked like, it would either give the programmer a hard time, make translating Perl brace format strings heavy-going, or result in a performance penalty at runtime, when the format directives get executed. Most of the time you will happily get along with printf() for this special case.
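To make the placeholder trick more concrete, the sketch below re-implements brace expansion in a few lines of core Perl. The function expand_braces is hypothetical and for illustration only; it is not the __x function from Locale::TextDomain, performs no translation, and the program name is made up.

```perl
use strict;
use warnings;

# Hypothetical brace expansion: each {name} is replaced by the argument of
# that name; unknown names are left untouched. Replaced text is not
# rescanned, so an inserted brace cannot start a new placeholder.
sub expand_braces {
    my ($format, %args) = @_;
    $format =~ s/\{([^{}]+)\}/exists $args{$1} ? $args{$1} : '{' . $1 . '}'/ge;
    return $format;
}

my $usage = expand_braces(
    "usage: {program} {[}OPTIONS{]} FILENAME...\n",
    program => 'frobnicate', '[' => '{', ']' => '}',
);
print $usage;
```

The placeholders named ‘[’ and ‘]’ expand to literal braces, so the output again contains ‘{OPTIONS}’ while the format string itself holds nothing but placeholders.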


15.5.19 PHP Hypertext Preprocessor
RPMs

mod_php4, mod_php4-core, phpdoc

File extension

php, php3, php4

String syntax

"abc", 'abc'

gettext shorthand

_("abc")

gettext/ngettext functions

gettext, dgettext, dcgettext; starting with PHP 4.2.0 also ngettext, dngettext, dcngettext

textdomain

textdomain function

bindtextdomain

bindtextdomain function

setlocale

Programmer must call setlocale (LC_ALL, "")

Prerequisite

Use or emulate GNU gettext

use

Extractor

xgettext

Formatting with positions

printf "%2\$d %1\$d"

Portability

On platforms without gettext, the functions are not available.

po-mode marking

An example is available in the examples directory: hello-php.


15.5.20 Pike
RPMs

roxen

File extension

pike

String syntax

"abc"

gettext shorthand

gettext/ngettext functions

gettext, dgettext, dcgettext

textdomain

textdomain function

bindtextdomain

bindtextdomain function

setlocale

setlocale function

Prerequisite

import Locale.Gettext;

Use or emulate GNU gettext

use

Extractor

Formatting with positions

Portability

On platforms without gettext, the functions are not available.

po-mode marking


15.5.21 GNU Compiler Collection sources
RPMs

gcc

File extension

c, h.

String syntax

"abc"

gettext shorthand

_("abc")

gettext/ngettext functions

gettext, dgettext, dcgettext, ngettext, dngettext, dcngettext

textdomain

textdomain function

bindtextdomain

bindtextdomain function

setlocale

Programmer must call setlocale (LC_ALL, "")

Prerequisite

#include "intl.h"

Use or emulate GNU gettext

Use

Extractor

xgettext -k_

Formatting with positions

Portability

Uses autoconf macros

po-mode marking

yes


15.5.22 Lua
RPMs

lua

File extension

lua

String syntax
  • "abc"
  • 'abc'
  • [[abc]]
  • [=[abc]=]
  • [==[abc]==]
  • ...
gettext shorthand

_("abc")

gettext/ngettext functions

gettext.gettext, gettext.dgettext, gettext.dcgettext, gettext.ngettext, gettext.dngettext, gettext.dcngettext

textdomain

textdomain function

bindtextdomain

bindtextdomain function

setlocale

automatic

Prerequisite

require 'gettext' or running lua interpreter with -l gettext option

Use or emulate GNU gettext

use

Extractor

xgettext

Formatting with positions

Portability

On platforms without gettext, the functions are not available.

po-mode marking


15.5.23 JavaScript
RPMs

js

File extension

js

String syntax
  • "abc"
  • 'abc'
gettext shorthand

_("abc")

gettext/ngettext functions

gettext, dgettext, dcgettext, ngettext, dngettext

textdomain

textdomain function

bindtextdomain

bindtextdomain function

setlocale

automatic

Prerequisite

Use or emulate GNU gettext

use, or emulate

Extractor

xgettext

Formatting with positions

Portability

On platforms without gettext, the functions are not available.

po-mode marking


15.5.24 Vala
RPMs

vala

File extension

vala

String syntax
  • "abc"
  • """abc"""
gettext shorthand

_("abc")

gettext/ngettext functions

gettext, dgettext, dcgettext, ngettext, dngettext, dpgettext, dpgettext2

textdomain

textdomain function, defined under the Intl namespace

bindtextdomain

bindtextdomain function, defined under the Intl namespace

setlocale

Programmer must call Intl.setlocale (LocaleCategory.ALL, "")

Prerequisite

Use or emulate GNU gettext

Use

Extractor

xgettext

Formatting with positions

Same as for the C language.

Portability

autoconf (gettext.m4) and #if ENABLE_NLS

po-mode marking

yes


15.6 Internationalizable Data

Here is a list of other data formats which can be internationalized using GNU gettext.


15.6.1 POT - Portable Object Template
RPMs

gettext

File extension

pot, po

Extractor

xgettext


15.6.2 Resource String Table
RPMs

fpk

File extension

rst

Extractor

xgettext, rstconv


15.6.3 Glade - GNOME user interface description
RPMs

glade, libglade, glade2, libglade2, intltool

File extension

glade, glade2, ui

Extractor

xgettext, libglade-xgettext, xml-i18n-extract, intltool-extract


15.6.4 GSettings - GNOME user configuration schema
RPMs

glib2

File extension

gschema.xml

Extractor

xgettext, intltool-extract


15.6.5 AppData - freedesktop.org application description
RPMs

appdata-tools, appstream, libappstream-glib, libappstream-glib-builder

File extension

appdata.xml

Extractor

xgettext, intltool-extract, itstool


15.6.6 Preparing Rules for XML Internationalization

Marking translatable strings in an XML file is done through a separate "rule" file, making use of the Internationalization Tag Set standard (ITS, http://www.w3.org/TR/its20/). The currently supported ITS data categories are: ‘Translate’, ‘Localization Note’, ‘Elements Within Text’, and ‘Preserve Space’. In addition to them, xgettext also recognizes the following extended data categories:

Context

This data category associates a msgctxt with the extracted text. In the global rule, the contextRule element contains the following:

  • A required selector attribute. It contains an absolute selector that selects the nodes to which this rule applies.
  • A required contextPointer attribute that contains a relative selector pointing to a node that holds the msgctxt value.
  • An optional textPointer attribute that contains a relative selector pointing to a node that holds the msgid value.
Escape Special Characters

This data category indicates whether the special XML characters (<, >, &, ") are escaped with entity references. In the global rule, the escapeRule element contains the following:

  • A required selector attribute. It contains an absolute selector that selects the nodes to which this rule applies.
  • A required escape attribute with the value yes or no.
Extended Preserve Space

This data category extends the standard ‘Preserve Space’ data category with the additional value ‘trim’. The value means to remove the leading and trailing whitespace of the content, but not to normalize whitespace in the middle. In the global rule, the preserveSpaceRule element contains the following:

  • A required selector attribute. It contains an absolute selector that selects the nodes to which this rule applies.
  • A required space attribute with the value default, preserve, or trim.
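
The difference between the three space values can be sketched in Python. This is a minimal illustration, not xgettext's code: the function name apply_space is hypothetical, and treating default as "collapse every whitespace run to one space" is an assumption about how normalization behaves.

```python
def apply_space(text, space):
    """Process text according to a 'space' attribute value (assumed semantics)."""
    if space == "preserve":
        # Keep the content exactly as written.
        return text
    if space == "trim":
        # Drop leading and trailing whitespace, keep inner whitespace untouched.
        return text.strip()
    if space == "default":
        # Assumption: normalization collapses each whitespace run to one space.
        return " ".join(text.split())
    raise ValueError("unknown space value: " + repr(space))


sample = "  first\n    second  "
trimmed = apply_space(sample, "trim")
normalized = apply_space(sample, "default")
```

With ‘trim’, the line break and indentation between the two words survive; with the assumed default normalization they collapse into a single space.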

All those extended data categories can only be expressed with global rules, and the rule elements must be in the https://www.gnu.org/s/gettext/ns/its/extensions/1.0 namespace.

Given the following XML document in a file messages.xml:

<?xml version="1.0"?>
<messages>
  <message>
    <p>A translatable string</p>
  </message>
  <message>
    <p translatable="no">A non-translatable string</p>
  </message>
</messages>

To extract the first text content ("A translatable string"), but not the second ("A non-translatable string"), the following ITS rules can be used:

<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:translateRule selector="/messages" translate="no"/>
  <its:translateRule selector="//message/p" translate="yes"/>

  <!-- If 'p' has an attribute 'translatable' with the value 'no', then
       the content is not translatable.  -->
  <its:translateRule selector="//message/p[@translatable = 'no']"
    translate="no"/>
</its:rules>
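
How these three rules combine can be sketched in Python with the standard xml.etree.ElementTree module. This is not how xgettext is implemented. The selectors are rewritten by hand into the restricted path syntax that ElementTree accepts, the function name translatable_texts is hypothetical, and inheritance of the translate value by child elements is deliberately not modeled. The point is only that rules are applied in order and the last matching rule wins.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<messages>
  <message>
    <p>A translatable string</p>
  </message>
  <message>
    <p translatable="no">A non-translatable string</p>
  </message>
</messages>"""

# Hand-translated equivalents of the three its:translateRule selectors above.
RULES = [
    (".", "no"),                                   # /messages
    (".//message/p", "yes"),                       # //message/p
    (".//message/p[@translatable='no']", "no"),    # //message/p[@translatable = 'no']
]


def translatable_texts(root, rules):
    """Apply the rules in order; a later match overrides an earlier one."""
    decisions = {}
    for path, translate in rules:
        for element in root.findall(path):
            decisions[element] = translate
    return [
        element.text.strip()
        for element in root.iter()
        if decisions.get(element) == "yes" and element.text
    ]


extracted = translatable_texts(ET.fromstring(DOCUMENT), RULES)
```

The second p element first matches the general rule and is then overridden by the more specific one, so only the first string remains.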

xgettext needs another file, called a "locating rule", to associate an ITS rule with an XML file. If the above ITS file is saved as messages.its, the locating rule would look like:

<?xml version="1.0"?>
<locatingRules>
  <locatingRule name="Messages" pattern="*.xml">
    <documentRule localName="messages" target="messages.its"/>
  </locatingRule>
  <locatingRule name="Messages" pattern="*.msg" target="messages.its"/>
</locatingRules>

The locatingRule element must have a pattern attribute, which denotes either a literal file name or a wildcard pattern of the XML file (footnote 7). The locatingRule element can have a child documentRule element, which adds checks on the content of the XML file.

The first rule matches any file with the .xml file extension, but it only applies to XML files whose root element is ‘<messages>’.

The second rule indicates that the same ITS rule file is also applicable to any file with the .msg file extension. The optional name attribute of locatingRule allows rules to be chosen by name, typically with xgettext’s -L option.

The associated ITS rule file is indicated by the target attribute of locatingRule or documentRule. If it is specified in a documentRule element, the parent locatingRule shouldn’t have the target attribute.
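
The lookup described above can be sketched in Python. It is an illustration under stated assumptions, not xgettext's actual algorithm: the function name find_its_target is hypothetical, wildcard matching is approximated with fnmatch.fnmatchcase, and the name attribute is ignored.

```python
import fnmatch
import xml.etree.ElementTree as ET

LOCATING_RULES = """<?xml version="1.0"?>
<locatingRules>
  <locatingRule name="Messages" pattern="*.xml">
    <documentRule localName="messages" target="messages.its"/>
  </locatingRule>
  <locatingRule name="Messages" pattern="*.msg" target="messages.its"/>
</locatingRules>"""


def find_its_target(rules_xml, file_name, root_local_name):
    """Return the ITS rule file for an XML file, or None if no rule applies."""
    rules = ET.fromstring(rules_xml)
    for rule in rules.findall("locatingRule"):
        if not fnmatch.fnmatchcase(file_name, rule.get("pattern")):
            continue
        # A target on locatingRule applies to every file matching the pattern.
        if rule.get("target") is not None:
            return rule.get("target")
        # Otherwise a documentRule must also match the root element name.
        for doc_rule in rule.findall("documentRule"):
            if doc_rule.get("localName") == root_local_name:
                return doc_rule.get("target")
    return None


xml_match = find_its_target(LOCATING_RULES, "ui.xml", "messages")
xml_other_root = find_its_target(LOCATING_RULES, "ui.xml", "other")
msg_match = find_its_target(LOCATING_RULES, "app.msg", "anything")
```

A file named ui.xml is accepted only when its root element is messages, while any .msg file is accepted regardless of its content.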

Locating rule files must have the .loc file extension. Both ITS rule files and locating rule files must be installed in the $prefix/share/gettext/its directory. Once those files are properly installed, xgettext can extract translatable strings from the matching XML files.

15.6.6.1 Two Use-cases of Translated Strings in XML

For XML, there are two use-cases of translated strings. One is the case where the translated strings are directly consumed by programs, and the other is the case where the translated strings are merged back to the original XML document. In the former case, special characters in the extracted strings shouldn’t be escaped, while they should in the latter case. To control whether to escape special characters, the ‘Escape Special Characters’ data category can be used.
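
The two use-cases can be contrasted with a short Python sketch based on the standard xml.sax.saxutils.escape function. The helper name prepare_string and its merged_back flag are hypothetical; they only stand in for the yes and no values of the escape attribute.

```python
from xml.sax.saxutils import escape


def prepare_string(text, merged_back):
    """Escape XML special characters only when the string goes back into XML."""
    if merged_back:
        # escape() handles &, < and >; the double quote is added explicitly.
        return escape(text, {'"': "&quot;"})
    # Strings consumed directly by a program stay unescaped.
    return text


raw = 'Tom & "Jerry" <3'
for_program = prepare_string(raw, merged_back=False)
for_xml = prepare_string(raw, merged_back=True)
```

The string handed to a program keeps its literal characters, while the one destined for an XML document carries entity references.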

To merge the translations, the ‘msgfmt’ program can be used with the option --xml. See msgfmt Invocation, for more details about how one calls the ‘msgfmt’ program. ‘msgfmt’’s --xml option doesn’t perform character escaping, so translated strings can have arbitrary XML constructs, such as elements for markup.


16 Concluding Remarks

We would like to conclude this GNU gettext manual by presenting a history of the Translation Project so far. We finally give a few pointers for those who want to do further research or readings about Native Language Support matters.


16.1 History of GNU gettext

Internationalization concerns and algorithms have been informally and casually discussed for years in GNU, sometimes around GNU libc, maybe around the incoming Hurd, or otherwise (nobody clearly remembers). And even then, when the work started for real, this was somewhat independently of these previous discussions.

This all began in July 1994, when Patrick D’Cruze had the idea and initiative of internationalizing version 3.9.2 of GNU fileutils. He then asked Jim Meyering, the maintainer, how to get those changes folded into an official release. That first draft was full of #ifdefs and somewhat disconcerting, and Jim wanted to find nicer ways. Patrick and Jim shared some tries and experimentations in this area. Then, feeling that this might eventually have a deeper impact on GNU, Jim wanted to know what standards were, and contacted Richard Stallman, who very quickly and verbally described an overall design for what was meant to become glocale, at that time.

Jim implemented glocale and got a lot of exhausting feedback from Patrick and Richard, of course, but also from Mitchum DSouza (who wrote a catgets-like package), Roland McGrath, maybe David MacKenzie, François Pinard, and Paul Eggert, all pushing and pulling in various directions, not always compatible, to the extent that after a couple of test releases, glocale was torn apart. In particular, Paul Eggert, always keeping an eye on developments in Solaris, advocated the use of the gettext API over glocale’s catgets-based API.

While Jim took some distance and time and became dad for a second time, Roland wanted to get GNU libc internationalized, and got Ulrich Drepper involved in that project. Instead of starting from glocale, Ulrich rewrote something from scratch, but more conforming to the set of guidelines that emerged out of the glocale effort. Then, Ulrich got people from the previous forum to involve themselves in this new project, and the switch from glocale to what was first named msgutils, renamed nlsutils, and later gettext, became officially accepted by Richard in May 1995 or so.

Let’s summarize by saying that Ulrich Drepper wrote GNU gettext in April 1995. The first official release of the package, including PO mode, occurred in July 1995, and was numbered 0.7. Other people contributed to the effort by providing a discussion forum around Ulrich, writing little pieces of code, or testing. These are listed in the THANKS file which comes with the GNU gettext distribution.

While this was being done, François adapted half a dozen GNU packages to glocale first, then later to gettext, putting them in pretest, so providing along the way an effective user environment for fine tuning the evolving tools. He also took the responsibility of organizing and coordinating the Translation Project. After nearly a year of informal exchanges between people from many countries, translator teams started to exist in May 1995, through the creation and support by Patrick D’Cruze of twenty unmoderated mailing lists for that many native languages, and two moderated lists: one for reaching all teams at once, the other for reaching all willing maintainers of internationalized free software packages.

François also wrote PO mode in June 1995 with the collaboration of Greg McGary, as a kind of contribution to Ulrich’s package. He also gave a hand with the GNU gettext Texinfo manual.

In 1997, Ulrich Drepper released the GNU libc 2.0, which included the gettext, textdomain and bindtextdomain functions.

In 2000, Ulrich Drepper added plural form handling (the ngettext function) to GNU libc. Later, in 2001, he released GNU libc 2.2.x, which is the first free C library with full internationalization support.

Ulrich being quite busy in his role of General Maintainer of GNU libc, he handed over the GNU gettext maintenance to Bruno Haible in 2000. Bruno added the plural form handling to the tools as well, added support for UTF-8 and CJK locales, and wrote a few new tools for manipulating PO files.


16.2 Related Readings

NOTE: This documentation section is outdated and needs to be revised.

Eugene H. Dorr ([email protected]) maintains an interesting bibliography on internationalization matters, called Internationalization Reference List, which is available as:

ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt

Michael Gschwind ([email protected]) maintains a Frequently Asked Questions (FAQ) list, entitled Programming for Internationalisation. This FAQ discusses writing programs which can handle different language conventions, character sets, etc.; and is applicable to all character set encodings, with particular emphasis on ISO 8859-1. It is regularly published in Usenet groups comp.unix.questions, comp.std.internat, comp.software.international, comp.lang.c, comp.windows.x, comp.std.c, comp.answers and news.answers. The home location of this document is:

ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming

Patrick D’Cruze ([email protected]) wrote a tutorial about NLS matters, and Jochen Hein ([email protected]) took over the responsibility of maintaining it. It may be found as:

ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/...
     ...locale-tutorial-0.8.txt.gz

This site is mirrored in:

ftp://ftp.ibp.fr/pub/linux/sunsite/

A French version of the same tutorial should be findable at:

ftp://ftp.ibp.fr/pub/linux/french/docs/

together with French translations of many Linux-related documents.


Appendix A Language Codes

The ISO 639 standard defines two-letter codes for many languages, and three-letter codes for more rarely used languages. All abbreviations for languages used in the Translation Project should come from this standard.


A.1 Usual Language Codes

For the commonly used languages, the ISO 639-1 standard defines two-letter codes.

aa

Afar.

ab

Abkhazian.

ae

Avestan.

af

Afrikaans.

ak

Akan.

am

Amharic.

an

Aragonese.

ar

Arabic.

as

Assamese.

av

Avaric.

ay

Aymara.

az

Azerbaijani.

ba

Bashkir.

be

Belarusian.

bg

Bulgarian.

bh

Bihari.

bi

Bislama.

bm

Bambara.

bn

Bengali; Bangla.

bo

Tibetan.

br

Breton.

bs

Bosnian.

ca

Catalan.

ce

Chechen.

ch

Chamorro.

co

Corsican.

cr

Cree.

cs

Czech.

cu

Church Slavic.

cv

Chuvash.

cy

Welsh.

da

Danish.

de

German.

dv

Divehi; Maldivian.

dz

Dzongkha; Bhutani.

ee

Éwé.

el

Greek.

en

English.

eo

Esperanto.

es

Spanish.

et

Estonian.

eu

Basque.

fa

Persian.

ff

Fulah.

fi

Finnish.

fj

Fijian; Fiji.

fo

Faroese.

fr

French.

fy

Western Frisian.

ga

Irish.

gd

Scottish Gaelic.

gl

Galician.

gn

Guarani.

gu

Gujarati.

gv

Manx.

ha

Hausa.

he

Hebrew (formerly iw).

hi

Hindi.

ho

Hiri Motu.

hr

Croatian.

ht

Haitian; Haitian Creole.

hu

Hungarian.

hy

Armenian.

hz

Herero.

ia

Interlingua.

id

Indonesian (formerly in).

ie

Interlingue; Occidental.

ig

Igbo.

ii

Sichuan Yi; Nuosu.

ik

Inupiak; Inupiaq.

io

Ido.

is

Icelandic.

it

Italian.

iu

Inuktitut.

ja

Japanese.

jv

Javanese.

ka

Georgian.

kg

Kongo.

ki

Kikuyu; Gikuyu.

kj

Kuanyama; Kwanyama.

kk

Kazakh.

kl

Kalaallisut; Greenlandic.

km

Central Khmer; Cambodian.

kn

Kannada.

ko

Korean.

kr

Kanuri.

ks

Kashmiri.

ku

Kurdish.

kv

Komi.

kw

Cornish.

ky

Kirghiz.

la

Latin.

lb

Letzeburgesch; Luxembourgish.

lg

Ganda.

li

Limburgish; Limburger; Limburgan.

ln

Lingala.

lo

Lao; Laotian.

lt

Lithuanian.

lu

Luba-Katanga.

lv

Latvian; Lettish.

mg

Malagasy.

mh

Marshallese.

mi

Maori.

mk

Macedonian.

ml

Malayalam.

mn

Mongolian.

mo

Moldavian.

mr

Marathi.

ms

Malay.

mt

Maltese.

my

Burmese.

na

Nauru.

nb

Norwegian Bokmål.

nd

Ndebele, North.

ne

Nepali.

ng

Ndonga.

nl

Dutch.

nn

Norwegian Nynorsk.

no

Norwegian.

nr

Ndebele, South.

nv

Navajo; Navaho.

ny

Chichewa; Nyanja.

oc

Occitan; Provençal.

oj

Ojibwa.

om

(Afan) Oromo.

or

Oriya.

os

Ossetian; Ossetic.

pa

Panjabi; Punjabi.

pi

Pali.

pl

Polish.

ps

Pashto; Pushto.

pt

Portuguese.

qu

Quechua.

rm

Romansh.

rn

Rundi; Kirundi.

ro

Romanian.

ru

Russian.

rw

Kinyarwanda.

sa

Sanskrit.

sc

Sardinian.

sd

Sindhi.

se

Northern Sami.

sg

Sango; Sangro.

si

Sinhala; Sinhalese.

sk

Slovak.

sl

Slovenian.

sm

Samoan.

sn

Shona.

so

Somali.

sq

Albanian.

sr

Serbian.

ss

Swati; Siswati.

st

Sesotho; Sotho, Southern.

su

Sundanese.

sv

Swedish.

sw

Swahili.

ta

Tamil.

te

Telugu.

tg

Tajik.

th

Thai.

ti

Tigrinya.

tk

Turkmen.

tl

Tagalog.

tn

Tswana; Setswana.

to

Tonga.

tr

Turkish.

ts

Tsonga.

tt

Tatar.

tw

Twi.

ty

Tahitian.

ug

Uighur.

uk

Ukrainian.

ur

Urdu.

uz

Uzbek.

ve

Venda.

vi

Vietnamese.

vo

Volapük; Volapuk.

wa

Walloon.

wo

Wolof.

xh

Xhosa.

yi

Yiddish (formerly ji).

yo

Yoruba.

za

Zhuang.

zh

Chinese.

zu

Zulu.


A.2 Rare Language Codes

For rarely used languages, the ISO 639-2 standard defines three-letter codes. Here is the current list, reduced to only living languages with at least one million speakers.

ace

Achinese.

awa

Awadhi.

bal

Baluchi.

ban

Balinese.

bej

Beja; Bedawiyet.

bem

Bemba.

bho

Bhojpuri.

bik

Bikol.

bin

Bini; Edo.

bug

Buginese.

ceb

Cebuano.

din

Dinka.

doi

Dogri.

fil

Filipino; Pilipino.

fon

Fon.

gon

Gondi.

gsw

Swiss German; Alemannic; Alsatian.

hil

Hiligaynon.

hmn

Hmong.

ilo

Iloko.

kab

Kabyle.

kam

Kamba.

kbd

Kabardian.

kmb

Kimbundu.

kok

Konkani.

kru

Kurukh.

lua

Luba-Lulua.

luo

Luo (Kenya and Tanzania).

mad

Madurese.

mag

Magahi.

mai

Maithili.

mak

Makasar.

man

Mandingo.

men

Mende.

min

Minangkabau.

mni

Manipuri.

mos

Mossi.

mwr

Marwari.

nap

Neapolitan.

nso

Pedi; Sepedi; Northern Sotho.

nym

Nyamwezi.

nyn

Nyankole.

pag

Pangasinan.

pam

Pampanga; Kapampangan.

raj

Rajasthani.

sas

Sasak.

sat

Santali.

scn

Sicilian.

shn

Shan.

sid

Sidamo.

srr

Serer.

suk

Sukuma.

sus

Susu.

tem

Timne.

tiv

Tiv.

tum

Tumbuka.

umb

Umbundu.

wal

Walamo.

war

Waray.

yao

Yao.


Appendix B Country Codes

The ISO 3166 standard defines two-character codes for many countries and territories. All abbreviations for countries used in the Translation Project should come from this standard.

AD

Andorra.

AE

United Arab Emirates.

AF

Afghanistan.

AG

Antigua and Barbuda.

AI

Anguilla.

AL

Albania.

AM

Armenia.

AO

Angola.

AQ

Antarctica.

AR

Argentina.

AS

American Samoa.

AT

Austria.

AU

Australia.

AW

Aruba.

AX

Aaland Islands.

AZ

Azerbaijan.

BA

Bosnia and Herzegovina.

BB

Barbados.

BD

Bangladesh.

BE

Belgium.

BF

Burkina Faso.

BG

Bulgaria.

BH

Bahrain.

BI

Burundi.

BJ

Benin.

BL

Saint Barthelemy.

BM

Bermuda.

BN

Brunei Darussalam.

BO

Bolivia, Plurinational State of.

BQ

Bonaire, Sint Eustatius and Saba.

BR

Brazil.

BS

Bahamas.

BT

Bhutan.

BV

Bouvet Island.

BW

Botswana.

BY

Belarus.

BZ

Belize.

CA

Canada.

CC

Cocos (Keeling) Islands.

CD

Congo, The Democratic Republic of the.

CF

Central African Republic.

CG

Congo.

CH

Switzerland.

CI

Côte d’Ivoire.

CK

Cook Islands.

CL

Chile.

CM

Cameroon.

CN

China.

CO

Colombia.

CR

Costa Rica.

CU

Cuba.

CV

Cape Verde.

CW

Curaçao.

CX

Christmas Island.

CY

Cyprus.

CZ

Czech Republic.

DE

Germany.

DJ

Djibouti.

DK

Denmark.

DM

Dominica.

DO

Dominican Republic.

DZ

Algeria.

EC

Ecuador.

EE

Estonia.

EG

Egypt.

EH

Western Sahara.

ER

Eritrea.

ES

Spain.

ET

Ethiopia.

FI

Finland.

FJ

Fiji.

FK

Falkland Islands (Malvinas).

FM

Micronesia, Federated States of.

FO

Faroe Islands.

FR

France.

GA

Gabon.

GB

United Kingdom.

GD

Grenada.

GE

Georgia.

GF

French Guiana.

GG

Guernsey.

GH

Ghana.

GI

Gibraltar.

GL

Greenland.

GM

Gambia.

GN

Guinea.

GP

Guadeloupe.

GQ

Equatorial Guinea.

GR

Greece.

GS

South Georgia and the South Sandwich Islands.

GT

Guatemala.

GU

Guam.

GW

Guinea-Bissau.

GY

Guyana.

HK

Hong Kong.

HM

Heard Island and McDonald Islands.

HN

Honduras.

HR

Croatia.

HT

Haiti.

HU

Hungary.

ID

Indonesia.

IE

Ireland.

IL

Israel.

IM

Isle of Man.

IN

India.

IO

British Indian Ocean Territory.

IQ

Iraq.

IR

Iran, Islamic Republic of.

IS

Iceland.

IT

Italy.

JE

Jersey.

JM

Jamaica.

JO

Jordan.

JP

Japan.

KE

Kenya.

KG

Kyrgyzstan.

KH

Cambodia.

KI

Kiribati.

KM

Comoros.

KN

Saint Kitts and Nevis.

KP

Korea, Democratic People’s Republic of.

KR

Korea, Republic of.

KW

Kuwait.

KY

Cayman Islands.

KZ

Kazakhstan.

LA

Lao People’s Democratic Republic.

LB

Lebanon.

LC

Saint Lucia.

LI

Liechtenstein.

LK

Sri Lanka.

LR

Liberia.

LS

Lesotho.

LT

Lithuania.

LU

Luxembourg.

LV

Latvia.

LY

Libya.

MA

Morocco.

MC

Monaco.

MD

Moldova, Republic of.

ME

Montenegro.

MF

Saint Martin (French part).

MG

Madagascar.

MH

Marshall Islands.

MK

Macedonia, The Former Yugoslav Republic of.

ML

Mali.

MM

Myanmar.

MN

Mongolia.

MO

Macao.

MP

Northern Mariana Islands.

MQ

Martinique.

MR

Mauritania.

MS

Montserrat.

MT

Malta.

MU

Mauritius.

MV

Maldives.

MW

Malawi.

MX

Mexico.

MY

Malaysia.

MZ

Mozambique.

NA

Namibia.

NC

New Caledonia.

NE

Niger.

NF

Norfolk Island.

NG

Nigeria.

NI

Nicaragua.

NL

Netherlands.

NO

Norway.

NP

Nepal.

NR

Nauru.

NU

Niue.

NZ

New Zealand.

OM

Oman.

PA

Panama.

PE

Peru.

PF

French Polynesia.

PG

Papua New Guinea.

PH

Philippines.

PK

Pakistan.

PL

Poland.

PM

Saint Pierre and Miquelon.

PN

Pitcairn.

PR

Puerto Rico.

PS

Palestine, State of.

PT

Portugal.

PW

Palau.

PY

Paraguay.

QA

Qatar.

RE

Reunion.

RO

Romania.

RS

Serbia.

RU

Russian Federation.

RW

Rwanda.

SA

Saudi Arabia.

SB

Solomon Islands.

SC

Seychelles.

SD

Sudan.

SE

Sweden.

SG

Singapore.

SH

Saint Helena, Ascension and Tristan da Cunha.

SI

Slovenia.

SJ

Svalbard and Jan Mayen.

SK

Slovakia.

SL

Sierra Leone.

SM

San Marino.

SN

Senegal.

SO

Somalia.

SR

Suriname.

SS

South Sudan.

ST

Sao Tome and Principe.

SV

El Salvador.

SX

Sint Maarten (Dutch part).

SY

Syrian Arab Republic.

SZ

Swaziland.

TC

Turks and Caicos Islands.

TD

Chad.

TF

French Southern Territories.

TG

Togo.

TH

Thailand.

TJ

Tajikistan.

TK

Tokelau.

TL

Timor-Leste.

TM

Turkmenistan.

TN

Tunisia.

TO

Tonga.

TR

Turkey.

TT

Trinidad and Tobago.

TV

Tuvalu.

TW

Taiwan, Province of China.

TZ

Tanzania, United Republic of.

UA

Ukraine.

UG

Uganda.

UM

United States Minor Outlying Islands.

US

United States.

UY

Uruguay.

UZ

Uzbekistan.

VA

Holy See (Vatican City State).

VC

Saint Vincent and the Grenadines.

VE

Venezuela, Bolivarian Republic of.

VG

Virgin Islands, British.

VI

Virgin Islands, U.S.

VN

Viet Nam.

VU

Vanuatu.

WF

Wallis and Futuna.

WS

Samoa.

YE

Yemen.

YT

Mayotte.

ZA

South Africa.

ZM

Zambia.

ZW

Zimbabwe.


Appendix C Licenses

The files of this package are covered by the licenses indicated in each particular file or directory. Here is a summary:

  • The libintl and libasprintf libraries are covered by the GNU Lesser General Public License (LGPL). A copy of the license is included in GNU LGPL.
  • The executable programs of this package and the libgettextpo library are covered by the GNU General Public License (GPL). A copy of the license is included in GNU GPL.
  • This manual is free documentation. It is dually licensed under the GNU FDL and the GNU GPL. This means that you can redistribute this manual under either of these two licenses, at your choice.
    This manual is covered by the GNU FDL. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License (FDL), either version 1.2 of the License, or (at your option) any later version published by the Free Software Foundation (FSF); with no Invariant Sections, with no Front-Cover Text, and with no Back-Cover Texts. A copy of the license is included in GNU FDL.
    This manual is covered by the GNU GPL. You can redistribute it and/or modify it under the terms of the GNU General Public License (GPL), either version 2 of the License, or (at your option) any later version published by the Free Software Foundation (FSF). A copy of the license is included in GNU GPL.

C.1 GNU GENERAL PUBLIC LICENSE

Version 2, June 1991
Copyright © 1989, 1991 Free Software Foundation, Inc.
51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software—to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation’s software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.

For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.

Also, for each author’s protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors’ reputations.

Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone’s free use or not licensed at all.

The precise terms and conditions for copying, distribution and modification follow.

TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

  1. This License applies to any program or other work which containsa notice placed by the copyright holder saying it may be distributedunder the terms of this General Public License. The “Program”, below,refers to any such program or work, and a “work based on the Program”means either the Program or any derivative work under copyright law:that is to say, a work containing the Program or a portion of it,either verbatim or with modifications and/or translated into anotherlanguage. (Hereinafter, translation is included without limitation inthe term “modification”.) Each licensee is addressed as “you”.

    Activities other than copying, distribution and modification are notcovered by this License; they are outside its scope. The act ofrunning the Program is not restricted, and the output from the Programis covered only if its contents constitute a work based on theProgram (independent of having been made by running the Program).Whether that is true depends on what the Program does.

  2. You may copy and distribute verbatim copies of the Program’ssource code as you receive it, in any medium, provided that youconspicuously and appropriately publish on each copy an appropriatecopyright notice and disclaimer of warranty; keep intact all thenotices that refer to this License and to the absence of any warranty;and give any other recipients of the Program a copy of this Licensealong with the Program.

    You may charge a fee for the physical act of transferring a copy, andyou may at your option offer warranty protection in exchange for a fee.

  3. You may modify your copy or copies of the Program or any portionof it, thus forming a work based on the Program, and copy anddistribute such modifications or work under the terms of Section 1above, provided that you also meet all of these conditions:
    1. You must cause the modified files to carry prominent noticesstating that you changed the files and the date of any change.
    2. You must cause any work that you distribute or publish, that inwhole or in part contains or is derived from the Program or anypart thereof, to be licensed as a whole at no charge to all thirdparties under the terms of this License.
    3. If the modified program normally reads commands interactivelywhen run, you must cause it, when started running for suchinteractive use in the most ordinary way, to print or display anannouncement including an appropriate copyright notice and anotice that there is no warranty (or else, saying that you providea warranty) and that users may redistribute the program underthese conditions, and telling the user how to view a copy of thisLicense. (Exception: if the Program itself is interactive butdoes not normally print such an announcement, your work based onthe Program is not required to print an announcement.)

    These requirements apply to the modified work as a whole. Ifidentifiable sections of that work are not derived from the Program,and can be reasonably considered independent and separate works inthemselves, then this License, and its terms, do not apply to thosesections when you distribute them as separate works. But when youdistribute the same sections as part of a whole which is a work basedon the Program, the distribution of the whole must be on the terms ofthis License, whose permissions for other licensees extend to theentire whole, and thus to each and every part regardless of who wrote it.

    Thus, it is not the intent of this section to claim rights or contestyour rights to work written entirely by you; rather, the intent is toexercise the right to control the distribution of derivative orcollective works based on the Program.

    In addition, mere aggregation of another work not based on the Programwith the Program (or with a work based on the Program) on a volume ofa storage or distribution medium does not bring the other work underthe scope of this License.

  4. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:
    1. Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,
    2. Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,
    3. Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)

    The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.

    If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code.

  5. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
  6. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it.
  7. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients’ exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License.
  8. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program.

    If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances.

    It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.

    This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.

  9. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.
  10. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

    Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and “any later version”, you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation.

  11. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.
  12. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
  13. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

Appendix: How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the “copyright” line and a pointer to where the full notice is found.

one line to give the program's name and a brief idea of what it does.
Copyright (C) yyyy  name of author

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this when it starts in an interactive mode:

Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.

The hypothetical commands ‘show w’ and ‘show c’ should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than ‘show w’ and ‘show c’; they could even be mouse-clicks or menu items—whatever suits your program.

You should also get your employer (if you work as a programmer) or your school, if any, to sign a “copyright disclaimer” for the program, if necessary. Here is a sample; alter the names:

Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.

signature of Ty Coon, 1 April 1989
Ty Coon, President of Vice

This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License.



C.2 GNU LESSER GENERAL PUBLIC LICENSE

Version 2.1, February 1999
Copyright © 1991, 1999 Free Software Foundation, Inc.
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

[This is the first released version of the Lesser GPL.  It also counts
as the successor of the GNU Library Public License, version 2, hence the
version number 2.1.]

Preamble

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software—to make sure the software is free for all its users.

This license, the Lesser General Public License, applies to some specially designated software—typically libraries—of the Free Software Foundation and other authors who decide to use it. You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below.

When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things.

To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it.

For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights.

We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library.

To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author’s reputation will not be affected by problems that might be introduced by others.

Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license.

Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs.

When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library.

We call this license the Lesser General Public License because it does Less to protect the user’s freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances.

For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License.

In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system.

Although the Lesser General Public License is Less protective of the users’ freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library.

The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a “work based on the library” and a “work that uses the library”. The former contains code derived from the library, whereas the latter must be combined with the library in order to run.

TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
  1. This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called “this License”). Each licensee is addressed as “you”.

    A “library” means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables.

    The “Library”, below, refers to any such software library or work which has been distributed under these terms. A “work based on the Library” means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term “modification”.)

    “Source code” for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library.

    Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does.

  2. You may copy and distribute verbatim copies of the Library’s complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library.

    You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.

  3. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:
    1. The modified work must itself be a software library.
    2. You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change.
    3. You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License.
    4. If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful.

      (For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.)

    These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.

    Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library.

    In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.

  4. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices.

    Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy.

    This option is useful when you wish to copy part of the code of the Library into a program that is not a library.

  5. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange.

    If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code.

  6. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a “work that uses the Library”. Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License.

    However, linking a “work that uses the Library” with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a “work that uses the library”. The executable is therefore covered by this License. Section 6 states terms for distribution of such executables.

    When a “work that uses the Library” uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law.

    If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.)

    Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself.

  7. As an exception to the Sections above, you may also combine or link a “work that uses the Library” with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer’s own use and reverse engineering for debugging such modifications.

    You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things:

    1. Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable “work that uses the Library”, as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.)
    2. Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user’s computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with.
    3. Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution.
    4. If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place.
    5. Verify that the user has already received a copy of these materials or that you have already sent this user a copy.

    For an executable, the required form of the “work that uses the Library” must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.

    It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute.

  8. You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things:
    1. Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above.
    2. Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work.
  9. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
  10. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it.
  11. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients’ exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License.
  12. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library.

    If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances.

    It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.

    This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.

  13. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.
  14. The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

    Each version is given a distinguishing version number. If the Libraryspecifies a version number of this License which applies to it and“any later version”, you have the option of following the terms andconditions either of that version or of any later version published bythe Free Software Foundation. If the Library does not specify alicense version number, you may choose any version ever published bythe Free Software Foundation.

  15. If you wish to incorporate parts of the Library into other freeprograms whose distribution conditions are incompatible with these,write to the author to ask for permission. For software which iscopyrighted by the Free Software Foundation, write to the FreeSoftware Foundation; we sometimes make exceptions for this. Ourdecision will be guided by the two goals of preserving the free statusof all derivatives of our free software and of promoting the sharingand reuse of software generally.
    NO WARRANTY
  16. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
  17. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Libraries

If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License).

To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the “copyright” line and a pointer to where the full notice is found.

one line to give the library's name and an idea of what it does.
Copyright (C) year  name of author

This library is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at
your option) any later version.

This library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
USA.

Also add information on how to contact you by electronic and paper mail.

You should also get your employer (if you work as a programmer) or your school, if any, to sign a “copyright disclaimer” for the library, if necessary. Here is a sample; alter the names:

Yoyodyne, Inc., hereby disclaims all copyright interest in the library
`Frob' (a library for tweaking knobs) written by James Random Hacker.

signature of Ty Coon, 1 April 1990
Ty Coon, President of Vice

That’s all there is to it!



C.3 GNU Free Documentation License

Version 1.2, November 2002
Copyright © 2000,2001,2002 Free Software Foundation, Inc.
51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
  1. PREAMBLE

    The purpose of this License is to make a manual, textbook, or other functional and useful document free in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

    This License is a kind of “copyleft”, which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

    We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.

  2. APPLICABILITY AND DEFINITIONS

    This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The “Document”, below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as “you”. You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.

    A “Modified Version” of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

    A “Secondary Section” is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document’s overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

    The “Invariant Sections” are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.

    The “Cover Texts” are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.

    A “Transparent” copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not “Transparent” is called “Opaque”.

    Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.

    The “Title Page” means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, “Title Page” means the text near the most prominent appearance of the work’s title, preceding the beginning of the body of the text.

    A section “Entitled XYZ” means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.) To “Preserve the Title” of such a section when you modify the Document means that it remains a section “Entitled XYZ” according to this definition.

    The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.

  3. VERBATIM COPYING

    You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

    You may also lend copies, under the same conditions stated above, and you may publicly display copies.

  4. COPYING IN QUANTITY

    If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document’s license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

    If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

    If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

    It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

  5. MODIFICATIONS

    You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

    1. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
    2. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.
    3. State on the Title page the name of the publisher of the Modified Version, as the publisher.
    4. Preserve all the copyright notices of the Document.
    5. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
    6. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
    7. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document’s license notice.
    8. Include an unaltered copy of this License.
    9. Preserve the section Entitled “History”, Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled “History” in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
    10. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the “History” section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
    11. For any section Entitled “Acknowledgements” or “Dedications”, Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
    12. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
    13. Delete any section Entitled “Endorsements”. Such a section may not be included in the Modified Version.
    14. Do not retitle any existing section to be Entitled “Endorsements” or to conflict in title with any Invariant Section.
    15. Preserve any Warranty Disclaimers.

    If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version’s license notice. These titles must be distinct from any other section titles.

    You may add a section Entitled “Endorsements”, provided it contains nothing but endorsements of your Modified Version by various parties--for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

    You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

    The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

  6. COMBINING DOCUMENTS

    You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

    The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

    In the combination, you must combine any sections Entitled “History” in the various original documents, forming one section Entitled “History”; likewise combine any sections Entitled “Acknowledgements”, and any sections Entitled “Dedications”. You must delete all sections Entitled “Endorsements.”

  7. COLLECTIONS OF DOCUMENTS

    You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

    You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

  8. AGGREGATION WITH INDEPENDENT WORKS

    A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an “aggregate” if the copyright resulting from the compilation is not used to limit the legal rights of the compilation’s users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.

    If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document’s Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.

  9. TRANSLATION

    Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.

    If a section in the Document is Entitled “Acknowledgements”, “Dedications”, or “History”, the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.

  10. TERMINATION

    You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.

  11. FUTURE REVISIONS OF THIS LICENSE

    The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

    Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License “or any later version” applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.

ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:

  Copyright (C)  year  your name.
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.2
  or any later version published by the Free Software Foundation;
  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
  Texts.  A copy of the license is included in the section entitled ``GNU
  Free Documentation License''.

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the “with…Texts.” line with this:

    with the Invariant Sections being LIST THEIR TITLES, with
    the Front-Cover Texts being LIST, and with the Back-Cover Texts
    being LIST.

If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.


Reposted from: https://www.gnu.org/software/gettext/manual/gettext.html#Libraries

