Guide to Translating CSL Locale Files¶
Table of Contents
Preface¶
This document describes how you can help us improve the language support of Citation Style Language (CSL) styles by translating the CSL locale file of your favorite language.
CSL styles are either bound to one particular locale (e.g. the “British Psychological Society” CSL style will always produce citations and bibliographies in British English), or they (automatically) localize, e.g. to a user-selected locale, or to the locale of the user’s operating system.
All CSL styles, both those with and without a fixed locale, rely on locale files for default localization data, which consists of translated terms commonly used in citations and bibliographies, date formats, and grammar rules. Storing localization data in separate files has several benefits: translations are easier to maintain, styles are more compact (although styles can still include their own localization data to override the defaults), and styles can be (mostly) language-agnostic.
Below we will describe the structure of a locale file, give instructions on how to translate all its parts, and explain how you can submit your translations. See the CSL specification for more in-depth documentation on the structure and function of locale files.
Getting Started¶
The CSL locale files are kept in a GitHub repository at https://github.com/citation-style-language/locales/.
Each locale file contains the localization data for one language. Locale files are named “locales-xx-XX.xml”, where “xx-XX” is a BCP 47 language code (e.g. the locale code for British English is “en-GB”). The repository wiki lists the locale code, language, and translation status of all locale files in the repository.
If you find that a locale file already exists for your language, but that its translations are inaccurate or incomplete, you can start translating that file. If there is no locale file for your language, copy the “locales-en-US.xml” file and start from there. Don’t worry about finding the correct BCP 47 locale code for a new language; we’ll be happy to look it up when you submit your new locale file.
Modifications to existing locale files can be made directly via the GitHub website. New locale files can be submitted as a “gist” or through a pull request. For details, see the instructions on submitting CSL styles. If you edit a locale file on your own computer, use a suitable plain text editor such as TextWrangler for OS X, Notepad++ for Windows, or jEdit.
Translating Locale Files¶
When translating locale files, leave the overall structure of the locale file untouched. It makes life easier for us if you don’t remove existing elements, introduce new ones, or move stuff around (exceptions are discussed below).
xml:lang¶
At the top of the locale file you’ll find the locale
root element. The value
of its xml:lang
attribute should be set to the same language code used in
the file name of the locale file. For the locales-en-US.xml locale file, this is
“en-US”:
<locale xmlns="http://purl.org/net/xbiblio/csl" version="1.0" xml:lang="en-US">
Locale File Metadata¶
Directly below the locale
root you’ll find an info
element. Here you can
list yourself as a translator using the translator
element. Optionally you
can include contact information such as a website or email address. A locale
file can list multiple translators. The rights
element indicates under which
license the locale file is released. All locale files in our repository use the
same Creative Commons license, so you don’t have to change anything here. The
updated
element is used to keep track of when the locale file was last
updated. Feel free to ignore it if the format looks too intimidating.
<info>
<translator>
<name>John Doe</name>
<email>john.doe@citationstyles.org</email>
<uri>http://citationstyles.org/</uri>
</translator>
<rights license="http://creativecommons.org/licenses/by-sa/3.0/">This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License</rights>
<updated>2012-07-04T23:31:02+00:00</updated>
</info>
Grammar Rules¶
Next up is the style-options
element:
<style-options punctuation-in-quote="true"/>
This element is used to define localized grammar rules, as described in the Locale Options section in the CSL specification.
Date Formats¶
CSL styles can render dates in either non-localizing or localizing formats:
<style>
<!-- use of non-localized date format -->
<macro name="accessed">
<date variable="accessed" suffix=", ">
<date-part name="month" suffix=" "/>
<date-part name="day" suffix=", "/>
<date-part name="year"/>
</date>
</macro>
<!-- use of localized date format -->
<macro name="issued">
<date variable="issued" form="text"/>
</macro>
</style>
Each locale file defines two localized date formats: a numeric format (e.g. “2012/9/3”), and a textual format, where the month is written out in full (e.g. “September 3, 2012”).
To localize a date format, place the date-part elements for “day”, “month”, and
“year” in the desired order. Use the prefix
and suffix
attributes (on
the date-part
elements), or the delimiter
attribute (on the date
element) to define punctuation before, after, and between the different
date-parts. When using affixes, make sure that dates that only consist of a year
and a month, or of only a year, still render correctly. For example, the US
English localized “text” date format,
<date form="text">
<date-part name="month" suffix=" "/>
<date-part name="day" suffix=", "/>
<date-part name="year"/>
</date>
will produce dates like “September 3, 2012”, “September 2012”, and “2012”. Compare this to
<date form="text">
<date-part name="month"/>
<date-part name="day" prefix=" "/>
<date-part name="year" prefix=", "/>
</date>
which gives the same correct complete date (“September 3, 2012”), but which produces incorrect output for dates that don’t have a day, or don’t have a day and month (“September, 2012” and “, 2012”, respectively).
To read more about customizing date formats, see the Localized Date Formats and Date-part sections in the CSL specification.
Terms¶
The terms
element makes up the last section of the locale file, and contains
all the term translations. Below we discuss the different types of terms, and
how to translate them.
In its simplest form, a term consists only of a term
element with the
name
attribute indicating the term name, and with the translation enclosed
between the start and end tag:
<terms>
<term name="et-al">et al.</term>
</terms>
See the Terms section in the CSL specification.
Abbreviations¶
When translating abbreviations such as “et al.”, always include periods where applicable.
Plurals¶
Many terms have translations for both the singular and plural form. In this
case, the term
element contains a single
(for singular) and a
multiple
(for plural) element, which enclose the translations:
<terms>
<term name="edition">
<single>edition</single>
<multiple>editions</multiple>
</term>
</terms>
Forms¶
Terms can also vary in their ‘form’, which is indicated with the “form”
attribute on the term
element. The different forms are “long” (the default),
“short” (abbreviated form of “long”), “verb”, “verb-short” (abbreviated form of
“verb”), and “symbol”. Examples of the different forms:
<terms>
<term name="editor">
<single>editor</single>
<multiple>editors</multiple>
</term>
<term name="editor" form="short">
<single>ed.</single>
<multiple>eds.</multiple>
</term>
<term name="editor" form="verb">edited by</term>
<term name="editor" form="verb-short">ed.</term>
<term name="paragraph">
<single>paragraph</single>
<multiple>paragraph</multiple>
</term>
<term name="paragraph" form="symbol">
<single>¶</single>
<multiple>¶¶</multiple>
</term>
</terms>
AD/BC¶
The “ad” and “bc” terms are used to format years before 1000. E.g. the year “79” becomes “79AD”, and “-2500” becomes “2500BC”.
See the AD and BC section in the CSL specification.
Punctuation¶
The terms “open-quote”, “close-quote”, “open-inner-quote”, “close-inner-quote”, and “page-range-delimiter” define punctuation.
When a CSL style renders a title in quotes through the use of the quotes
attribute, it uses the “open-quote” and “close-quote” terms. When the title
contains internal quotes, these are replaced by “open-inner-quote”,
“close-inner-quote”. For example, with
<terms>
<term name="open-quote">“</term>
<term name="close-quote">”</term>
<term name="open-inner-quote">‘</term>
<term name="close-inner-quote">’</term>
<term name="page-range-delimiter">–</term>
</terms>
styles can render titles as
“Moby-Dick”
“Textual Analysis of ‘Moby-Dick’”
The “page-range-delimiter” terms is used to connect the first and last page of page ranges, e.g. “15–18” (it’s default value is an en-dash).
See the Quotes and Page Ranges sections in the CSL specification.
Ordinals¶
CSL styles can render numbers (e.g., “2”) in two ordinal forms: “long-ordinal” (“second”) and “ordinal” (“2nd”). Both forms are localized through the use of terms.
The “long-ordinal” form is limited to the numbers 1 through 10 (the fallback for other numbers is the “ordinal” form). Each of these ten numbers has its own term (“long-ordinal-01” through “long-ordinal-10”).
Things are different for the “ordinal” form. Here, terms are only used to define the ordinal suffix (“nd” for “2nd”). Furthermore, terms and numbers don’t correspond one-to-one. For example, the “ordinal” term defines the default suffix, which is used for all numbers (unless, as described below, exceptions are introduced through the use of the terms “ordinal-00” through “ordinal-99”).
CSL also supports gender-specific ordinals (both for “long-ordinal” and “ordinal” forms). In languages such as French, ordinal numbers must match the gender of the target noun, which can be feminine or masculine. E.g. “1re édition” (“édition” is feminine) and “1er janvier” (“janvier” is masculine). See the relevant section below.
Terms for “ordinal” numbers¶
Terms for the “ordinal” form follow special rules to make it possible to render any number in the “ordinal” form (e.g., “2nd”, “15th”, “231st”), without having to define a term for each number.
The logic for defining ordinal suffixes with terms is described at Ordinal Suffixes, and won’t be revisited here. Instead, we’ll look at an example.
In English, there are four different ordinal suffixes in use: “st”, “nd”, and “rd” are used for numbers ending on 1, 2, and 3, respectively, while “th” is used for numbers ending on 0 and 4 through 9. Exceptions are numbers ending on “11”, “12”, and “13”, which also use “th”.
To capture this logic, we start by defining the “ordinal” term as “th”, which is
the most common suffix. Then, we define the terms “ordinal-01”, “ordinal-02”,
and “ordinal-01” as “st”, “nd”, and “rd”, respectively. By default (i.e., when
the term
elements don’t carry a match
attribute), the terms “ordinal-00”
through “ordinal-09” repeat at intervals of 10. For example, the term
“ordinal-01” overrides the “ordinal” term for numbers 1, 11, 21, 31, etc. At
this point, we would get “ordinal” numbers such as “1st”, “2nd”, “3rd”, “4th”,
“21st”, “67th”, and “101st”, but we would also get the incorrect “11st”, “12nd”
and “13rd”. For these cases, we define the terms “ordinal-11”, “ordinal-12”, and
“”ordinal-13” as “th”. By default, the terms “ordinal-10” though “ordinal-99”
repeat at intervals of 100. For example, the term “ordinal-11” overrides the
“ordinal” and “ordinal-01” terms for numbers “11”, “111”, “211”, etc. So, in
total, we need the following seven terms:
<terms>
<term name="ordinal">th</term>
<term name="ordinal-01">st</term>
<term name="ordinal-02">nd</term>
<term name="ordinal-03">rd</term>
<term name="ordinal-11">th</term>
<term name="ordinal-12">th</term>
<term name="ordinal-13">th</term>
</terms>
Fortunately, many languages have simpler “ordinal” numbers. E.g., for German all “ordinal” numbers receive a period as the suffix, so it suffice to define the “ordinal” term:
<terms>
<term name="ordinal">.</term>
</terms>
Gender-specific Ordinals¶
To use gender-specific ordinals, we first need to define the gender of several
target nouns: the terms accompanying the number variables (it is probably
sufficient to specify the gender for “edition”, “issue”, and “volume”) and the
month terms (“month-01” through “month-12”, corresponding to January through
December). This is done by setting the gender
attribute on the “long”
(default) form of these terms to either “masculine” or “feminine”.
Secondly, we need to define “feminine” and “masculine” variants of the ordinal
terms, which is done with the gender-variant
attribute (set to “masculine”
or “feminine”).
A minimal example for French:
<?xml version="1.0" encoding="UTF-8"?>
<locale xml:lang="fr-FR">
<terms>
<term name="edition" gender="feminine">
<single>édition</single>
<multiple>éditions</multiple>
</term>
<term name="edition" form="short">éd.</term>
<term name="month-01" gender="masculine">janvier</term>
<term name="long-ordinal-01" gender-form="masculine">premier</term>
<term name="long-ordinal-01" gender-form="feminine">première</term>
<term name="long-ordinal-01">premier</term>
<term name="long-ordinal-02">deuxième</term>
<term name="long-ordinal-03">troisième</term>
<term name="long-ordinal-04">quatrième</term>
<term name="long-ordinal-05">cinquième</term>
<term name="long-ordinal-06">sixième</term>
<term name="long-ordinal-07">septième</term>
<term name="long-ordinal-08">huitième</term>
<term name="long-ordinal-09">neuvième</term>
<term name="long-ordinal-10">dixième</term>
<term name="ordinal">e</term>
<term name="ordinal-01" gender-form="feminine" match="whole-number">re</term>
<term name="ordinal-01" gender-form="masculine" match="whole-number">er</term>
</terms>
</locale>
In this example, the “edition” term is defined as feminine, and the “month-01”
term (“janvier”) is defined as masculine. For French, of the “long-ordinal”
terms, only “long-ordinal-01” has gender-variants (“premier” for masculine,
“première” for feminine). To cover cases where no gender is defined for the
target noun (e.g., a style author might redefine a term like “edition” but
forget to specify the gender), also a neuter variant of “long-ordinal-01” is
defined without the gender
attribute. The “ordinal” term defines the default
suffix. The only exceptions are for the number 1 when the target noun is either
feminine or masculine (with match
set to “whole-number”, the term does not
repeat), e.g. “1re édition” but “11e édition”.
For more information, see the Gender-specific Ordinals section in the CSL specification.
Submitting Contributions¶
To submit changes to an existing locale file, or to submit a new locale file, follow the submission instructions for CSL styles.
Questions?¶
Questions? Contact us on Twitter at @csl_styles, or create an issue on GitHub here.