[Distutils] Non English Speaking Users of PyPI - I need Help!

Donald Stufft donald at stufft.io
Tue Jan 26 14:10:28 EST 2016

> On Jan 26, 2016, at 1:20 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Tue, 26 Jan 2016 12:16:16 -0500
> Donald Stufft <donald at stufft.io> wrote:
>> As many of you are aware there has been an effort to replace the current PyPI with a new, improved PyPI. This project has been codenamed Warehouse and has been progressing nicely. However we’ve run into a bit of an issue when deciding what to support that we’re not feeling super qualified to make an informed decision on.
>> The new PyPI is going to support translated content (for the UI elements, not for what people upload to there), although we will not launch with any translations actually added besides English. Currently the translation engine we’re using (l20n.js) does not support anything but “Evergreen” browsers (browsers that constantly and automatically update) which means we don’t have support for older versions of IE. My question to anyone who is, or is familiar with places where English isn’t the native language, how big of a deal is this if we only support newer browsers for translations?
>> If you can weigh in on the issue for this (https://github.com/pypa/warehouse/issues/881) that would be great! If you know someone who might have a good insight, please pass this along to them as well.
> Not answering your question, but needing Javascript on the
> client to support L10n sounds like a weird decision (although Mozilla
> seems to be pushing this... how surprising).  Every bit of client-side
> Javascript tends to make Web pages slower and it tends to accumulate
> into the current bloated mess that is the modern Web.  For static text
> this really doesn't sound warranted.
> (not to mention that mutating the body text after the HTML has loaded
> may also produce a poor user experience, depending on various
> conditions. And the native English speakers who develop the software
> on top-grade machines will probably not notice it, thinking everything
> is fine.)
> As for your question, though, I would expect some of the less proficient
> English speakers to also have outdated hardware or software installs,
> especially in poor countries or very humble social environments.

So the reason for wanting to use L20n (and forgive me, English is the only
language I speak so a lot of this is based off of my, possibly wrong,
understanding of how other languages work) is because it is a lot more powerful
than the traditional gettext based solutions.

One such problem I believe is the lack of variants in the older tools like
gettext. I think at best you can get singular/plural but I think that other
languages have a whole host of different things that they need to vary their
grammar based on. An example from the L20n website is:

    <brandShortName {
        *nominative: "Aurora",
        genitive: "Aurore",
        dative: "Aurori",
        accusative: "Auroro",
        locative: "Aurori",
        instrumental: "Auroro"
    }>
    <aboutOld "O brskalniku {{ brandShortName }}">
    <about "O {{ brandShortName.locative }}">

That allows the brand name (in this example) to be translated differently
depending on which of the variants in that first list is asked for. This is
powerful enough to support choosing the variant based on something you pass
into the translation engine from the application being translated as well.
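
As a rough illustration of the lookup that snippet implies, here is a small
Python sketch. It is not how L20n.js is implemented; the dict layout and the
`resolve` helper are made up for this example, with "default" standing in for
the variant marked with `*`.

```python
# Hypothetical data model: an entity maps variant names to strings, and
# "default" names the variant marked with * in the L20n snippet.
BRAND_SHORT_NAME = {
    "default": "nominative",
    "variants": {
        "nominative": "Aurora",
        "genitive": "Aurore",
        "dative": "Aurori",
        "accusative": "Auroro",
        "locative": "Aurori",
        "instrumental": "Auroro",
    },
}


def resolve(entity, variant=None):
    """Return the requested variant, falling back to the default one."""
    variants = entity["variants"]
    if variant in variants:
        return variants[variant]
    return variants[entity["default"]]


# Mirrors <aboutOld ...> (no variant requested) and <about ...> (locative).
about_old = "O brskalniku {}".format(resolve(BRAND_SHORT_NAME))
about = "O {}".format(resolve(BRAND_SHORT_NAME, "locative"))
```

A plain gettext catalog has one translated string per message (plus plural
forms), so there is no obvious place to hang the extra forms.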

Another example of this is when you need to adjust the translation based on
the gender of the subject (though I don't think we'll use this on PyPI since
we're unlikely to ever collect that information), but L20n makes this possible
if you pass the gender into the translation engine like:

    # Thing that gets passed in
    {
        "user": {
            "name": "Jane",
            "followers": 1337,
            "gender": "feminine"
        }
    }

    # Translation Snippet
    <shared[$user.gender] {
        masculine: "{{ $user.name }} shared your post to his {{ $user.followers }} follower(s).",
        feminine: "{{ $user.name }} shared your post to her {{ $user.followers }} follower(s).",
        *default: "{{ $user.name }} shared your post to their {{ $user.followers }} follower(s)."
    }>
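
To make the selection step concrete, here is a minimal Python sketch of the
same idea: the application passes in a context, and the translation picks a
variant from a value in it. The `shared_message` function is hypothetical and
only mimics the behaviour described above; it is not L20n's API.

```python
def shared_message(user):
    """Pick a message variant by user["gender"], defaulting to "their"."""
    variants = {
        "masculine": "{name} shared your post to his {followers} follower(s).",
        "feminine": "{name} shared your post to her {followers} follower(s).",
    }
    default = "{name} shared your post to their {followers} follower(s)."
    # An unknown or missing gender falls through to the default variant.
    template = variants.get(user.get("gender"), default)
    return template.format(name=user["name"], followers=user["followers"])


context = {"user": {"name": "Jane", "followers": 1337, "gender": "feminine"}}
message = shared_message(context["user"])
```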

In addition to that, L20n also natively understands HTML which makes it a bit
easier to work with. In a traditional gettext based system, if you wanted to
do something like translate a string of text that contains a link to something
you'd need to do something like this:

    'This is a sentence that has an embedded <a href="%(url)s">link</a>'

Then you need to expect your translators to correctly generate that HTML,
including any classes or style information that is in it (and if you alter
that, all translations need to be updated to match).

However, in L20n you can simply do something like this:

    <p data-l10n-id="mySentence">This is a sentence that has an embedded <a href="https://../">link</a></p>

Then when your translators go to translate it, they only need to do:

    <mySentence "This is a translated sentence with a <a>link</a>">

They never need to worry about matching the exact HTML, they just need to worry
about marking the structure correctly. The one downside to this is that there
is not currently any way for them to *reorder* the HTML elements (like if you
have two links), and I'm not sure how big of a deal that is.
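
One way to picture this is that the text comes from the translation while the
attributes come from the source markup. The Python sketch below is only an
approximation of that idea under some assumptions: the `overlay` function is
hypothetical, elements are matched by tag name in document order, and
`https://example.com/` is a placeholder URL.

```python
import xml.etree.ElementTree as ET


def overlay(source_html, translated_fragment):
    """Apply a translated fragment onto the source element.

    Text comes from the translation; attributes on child elements come
    from the source, matched by tag name in document order.
    """
    source = ET.fromstring(source_html)
    translated = ET.fromstring("<root>{}</root>".format(translated_fragment))

    # Queue up the source children per tag so each is matched at most once.
    pending = {}
    for child in source:
        pending.setdefault(child.tag, []).append(child)

    for child in translated:
        matches = pending.get(child.tag)
        if matches:
            child.attrib.update(matches.pop(0).attrib)

    # Keep the source element and its attributes; swap in translated content.
    for child in list(source):
        source.remove(child)
    source.text = translated.text
    source.extend(list(translated))
    return ET.tostring(source, encoding="unicode")


result = overlay(
    '<p data-l10n-id="mySentence">This is a sentence that has an embedded '
    '<a href="https://example.com/">link</a></p>',
    "This is a translated sentence with a <a>link</a>",
)
```

Matching purely by position like this is also what makes reordering awkward:
with two links, the first `<a>` in the translation always receives the first
link's attributes.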

Finally, on the L20n vs gettext side, L20n forces you to define IDs for your
translations instead of reusing the source (generally English) text as your
ID. This means that people can be free to tweak the English text in ways that
do not alter the semantics of the statement without having that affect the
other translations. As long as the ID stays the same, all existing translations
will continue to be used.
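
A small Python sketch of the difference, in the simplest case and with made-up
strings and IDs (the catalogs, the `login-prompt` ID, and the French text are
all illustrative, not taken from Warehouse):

```python
# gettext-style catalog: the English source string is the lookup key.
by_source_text = {"Log in to your account": "Connectez-vous à votre compte"}

# L20n-style catalog: a stable ID is the lookup key.
by_id = {"login-prompt": "Connectez-vous à votre compte"}


def lookup(catalog, key, fallback):
    """Return the translation for key, or the fallback text if missing."""
    return catalog.get(key, fallback)


# Someone rewords the English text without changing what it means.
new_english = "Sign in to your account"

# Keyed by source text: the reworded string no longer matches the catalog.
source_keyed = lookup(by_source_text, new_english, new_english)

# Keyed by ID: the existing translation is still found.
id_keyed = lookup(by_id, "login-prompt", new_english)
```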

Now, all of the above could be written serverside in Python and not require
anything of the end user's browser. We're currently using L20n.js instead of
something serverside for a few reasons.

The biggest and most obvious reason is because L20n.js is currently the only
implementation of L20n that exists, so moving to L20n at the server side would
require us to devote time to writing that instead of working on Warehouse.

Another reason is that L20n.js also allows you to do what they call "responsive
translations". Essentially it allows you to have variants of your translation
based on properties of the end user's machine such as operating system, window
size, time of day, etc. This makes it easy to have a translation switch
between, say, a longer form when running full screen in a large browser and a
shorter form when running on a small smart phone screen. That avoids a fairly
common (I think?) problem where the source text and the translated text are
vastly different in length.
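
The selection logic behind that can be sketched in a few lines. This is Python
rather than anything that would run in the browser, and the `pick_by_width`
helper, the pixel threshold, and the strings are all assumptions made for
illustration:

```python
def pick_by_width(variants, width_px):
    """Choose the variant with the largest minimum width that still fits.

    variants is a list of (min_width_px, text) pairs and should include
    an entry with a minimum width of 0 to act as the fallback.
    """
    eligible = [v for v in variants if v[0] <= width_px]
    return max(eligible, key=lambda v: v[0])[1]


# Hypothetical translation with a short and a long form.
docs_link = [
    (0, "Docs"),
    (600, "Documentation for this project"),
]

on_phone = pick_by_width(docs_link, 320)
on_desktop = pick_by_width(docs_link, 1280)
```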

The final reason is that by moving translation into the client side we can
increase the chance that any particular page a user visits will be served
directly out of a Fastly POP located close to them, instead of needing to
round trip from that POP, to the Fastly POP located in Ashburn, Virginia, USA,
to the PyPI servers located in another DC in Virginia. Instead of needing to
cache and serve a different variant of the page for every single language, we
can have a single variant of the page for all languages, and a single language
file for each language that is used for all pages (much like CSS), which will
increase the cache hit ratio. This will also make it more likely that if the
PyPI origin servers go down, the user won't get an error response (since
serving out of cache never hits the backend servers, and Fastly is configured
to serve stale responses from the cache in the case of an error). Even users
of lesser used languages, whose language file might not be cached, will still
get the English variant of the page served from cache if the PyPI servers are
down. If their language file isn't cached and PyPI is down, the page just
won't get translated, falling back to the English version in that case.
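
To put rough numbers on the caching argument, here is a back-of-the-envelope
Python sketch. The page and language counts are invented for illustration and
are not real PyPI figures.

```python
def cached_objects_server_side(pages, languages):
    """Server-side translation: every page is cached once per language."""
    return pages * languages


def cached_objects_client_side(pages, languages):
    """Client-side translation: one copy per page plus one file per language."""
    return pages + languages


# Hypothetical figures, only to show how the two approaches scale.
pages = 100_000
languages = 20

server_side = cached_objects_server_side(pages, languages)
client_side = cached_objects_client_side(pages, languages)
```

With these made-up numbers the server-side approach asks the CDN to hold
2,000,000 distinct objects versus 100,020 for the client-side approach, so any
given request is far more likely to find its object already cached.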

Anyways, I don't know if all of these reasons are good enough reasons to impose
the browser requirement on our users. On paper it certainly appears to me like
they are; giving people better, higher quality translations seems to be a good
thing to me. However, I don't know how bad the limitations of gettext are in
practice. I will say that under 1% of our views (not users, views) on PyPI
currently come from browsers set to something other than English AND which
l20n.js does not support, but it's possible that this is a chicken/egg
situation where we're not getting a lot of traffic from those users because
PyPI isn't translated in a way that they can use. I am also unsure how the fact
that the majority of PyPI's content will still be in English (since this is
just for the UI, not the content) affects all of this, but there are projects
out there where the content is definitely not English (though I am unsure what
languages they are, the ones I've seen use some sort of Asian looking
lettering).

Hopefully this was useful information!

Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
