PEP 501 - i18n with marked strings

Hi, I haven't done i18n recently, so bear with me. I'm not sure about bolting this on to "format strings", in that it feels like an orthogonal concept. However, what if we had i18n strings as well, not instead of: i'Hello there, $name {age}.' and that they were complementary to f'', each handling their different duties: fi'Hello there, $name {age}.' Different syntax would probably be needed for each, is that correct? Since each has different requirements, e.g. Barry's concerns about format strings being too powerful for non-developers, while also making a project vulnerable to arbitrary code. Perhaps PEP 498 with non-arbitrary strings (but attribute/keys support) would allow the syntax to be unified. -Mike

I actually thought this was about a two-step process using lazy evaluation. This way {name:i18n} or {name:later} basically marks lazy evaluation. But as it seems, i'...' is more supposed to do all (translation + formatting) of this at once. My fault, sorry. On 11.08.2015 10:35, Petr Viktorin wrote:

Not exactly. Take this string for instance:
    f'hello {name}'
And our FString implementation, very simple:
    class FString(str):
        def __init__(self, value, **kwargs):
            super().__init__(value.format(**self.kwargs))
            self.value = value
            self.kwargs = kwargs
What the above f-string should do is create an instance of that class. This is just a compiler detail. A preprocessor step. Like this:
    FString('hello {name}', name=str(name))
FString is just an str instance, it has the actual interpolated value, but it still contains the original uninterpolated string and all parameters (as strings as well.) Now, what gettext can do, if we would wrap this string in the underscore function, is take the "value" attribute from this string FString, translate that, and apply the interpolation again. This way, we are completely compatible with the format() call. There is no need at all for using globals/locals or _getframe(). The name bindings are static, this is lintable. Please tell me if I'm missing something.
2015-08-11 19:12 GMT+02:00 Sven R. Kunze <srkunze@mail.de>:

On Aug 11, 2015, at 16:22, Jonathan Slenders <jonathan@slenders.be> wrote:
So you want every i18n string to interpolate the string, ignore that, look up the raw string, re-interpolate that, and somehow modify the string to hold the new l10n+interpolated value instead? Besides the performance cost of interpolating every string twice for no reason, and the possibility of irrelevant errors popping up while doing so, it's also impossible given that strings are immutable. (That also means you have to use a __new__ rather than __init__, by the way, but that's just a minor quibble.) Also, what happens if the translated string uses a variable that the original string didn't? For example, maybe your English string uses {Salutation} and {Last Name}, but your Chinese string has no need for a salutation, and your Icelandic string only uses the first name. People don't do that very often, but that's partly because many i18n systems are too inflexible to handle it. Python's str.format makes it relatively easy to build something that is flexible enough. Nick's proposal and Barry's both are. This one isn't.
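
The point about `__new__` can be made concrete. The following is only a sketch of a str subclass that keeps its template, assuming the hypothetical FString class from the previous message; the attribute names `template` and `kwargs` are illustrative, and it does not address the double-interpolation or extra-variable objections raised above.

```python
class FString(str):
    """Hypothetical str subclass that remembers the uninterpolated template."""

    def __new__(cls, template, **kwargs):
        # str is immutable, so the interpolated value has to be chosen in
        # __new__; by the time __init__ runs it is too late to change it.
        self = super().__new__(cls, template.format(**kwargs))
        self.template = template
        self.kwargs = kwargs
        return self


greeting = FString('hello {name}', name='Ned')
print(greeting)           # the interpolated value
print(greeting.template)  # the original template, still available
```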

I think I understood that. How can I differentiate between {variables} to be translated and variables not to be translated? I thought this was the intention of Mike's idea: unifying both i and f as they are orthogonal to each other. As I don't like the $ so much, I proposed using {...} as well with a special marker i18n or something. That could be completely useless, I am unsure. On 12.08.2015 01:22, Jonathan Slenders wrote:

On 8/10/15 4:31 PM, Mike Miller wrote:
You haven't said what this would *mean*, precisely. I18n tends to get very involved, and is often specific to the larger framework that you are using. Most people have adopted conventions that mean the syntax is already quite simple for strings to be localized: _("Hello there, {name}").format(name=name) What would i"" bring to the table? --Ned.

On Tue, Aug 11, 2015 at 1:14 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
Well, if you want to get the equivalent of: _("Hello there, {name}").format(name=name) you can't use: _(f"Hello there, {name}") because then the `_` function would get the substituted string. The translation database only contains "Hello there, {name}", not "Hello there, Ned"; you need to pass the former to `_`. In other words, if f was a function instead of a prefix, you want to call f(_("string")), not _(f("string")). The i"" would allow specifying a translation function, which is typically custom but project- (or at least module-) global.
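
The ordering problem can be shown with a toy catalog. This is a sketch only: `CATALOG` and the `_` function here are stand-ins invented for illustration, not gettext itself.

```python
# Toy message catalog; a real project would load this from .mo files.
CATALOG = {"Hello there, {name}": "Bonjour, {name}"}


def _(message):
    """Stand-in translation function: fall back to the source string."""
    return CATALOG.get(message, message)


name = "Ned"

# Translate first, then interpolate: the lookup key is the template.
translated = _("Hello there, {name}").format(name=name)

# Interpolate first, then translate: the lookup key is "Hello there, Ned",
# which is not in the catalog, so the string comes back untranslated.
missed = _("Hello there, {name}".format(name=name))
```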

On Aug 11, 2015, at 12:50 PM, Petr Viktorin wrote:
Not having to repeat the variable name three times.
To me, this is really the crux of both the f-string and i-string proposals. It's also a more general issue to me because it's almost exactly the raison d'être for flufl.i18n. The complicated examples of f-strings I've seen really give me the shudders. Maybe in practice it won't be so bad, but it's definitely true that if it can be done, someone will do it. So I expect to see "abuses" of them in the wild. But the DRY argument is much more compelling to me, and currently I think the best way to reduce repetition in function arguments is through sys._getframe() and other such nasty tricks. I'd really much prefer to see this small annoyance fixed in a targeted way than add a hugely complicated new feature that reduces readability (IMHO). Which is why I like the scope() and similar ideas. Something like a built-in that provides you with a ChainMap of the current namespaces in effect. The tricky bit is that you still need something like _getframe()'s depth argument, or perhaps the object returned by scope() -or whatever it's called- would have links back to the namespaces of earlier call frames. I also don't know whether all of this makes sense for all the alternative implementations, but there's certainly a *logical* call stack for any particular point in a Python program. What's the simplest thing we can do to make this pain go away? A few extraneous locals really aren't that bad. They'll be rarely needed, and besides I already use such things when the alternative is a hideously long line of code. In any case, they're a small price to pay for keeping things simple. Cheers, -Barry
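
For readers who have not seen the sys._getframe() trick, here is a rough sketch of what a scope() helper might look like today. The name `scope`, its `depth` parameter, and the choice to leave out builtins are all assumptions made for illustration; no such built-in exists.

```python
import sys
from collections import ChainMap


def scope(depth=0):
    """Hypothetical scope(): the caller's locals chained over its globals."""
    frame = sys._getframe(depth + 1)  # +1 skips the frame of scope() itself
    # Copy the locals so the result is detached from the live frame.
    return ChainMap(dict(frame.f_locals), frame.f_globals)


def greet():
    name = "Barry"
    # No need to repeat name=name: the mapping supplies it.
    return "Hello, {name}".format_map(scope())
```

Calling greet() interpolates the local variable without it being passed explicitly, which is the repetition the DRY argument is about. It still relies on frame inspection, so it inherits the portability and optimization concerns discussed later in the thread.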

From: Barry Warsaw [mailto:barry@python.org]
This +100. I've been doing my own variants on the f-string for over a decade and every time I type "sys._getframe()" I think to myself, "There's got to be a better way..." Here's a stripped-down example from Real Life (gah!):

I like the i-string idea because it enables IDE support (syntax, click-to-definition etc.). Everything else is just a custom workaround (not saying they are bad, but I like a standardized syntax). Furthermore, scope() might have some merit of its own. :) On 11.08.2015 22:32, Eric Fahlgren wrote:

On Aug 11 2015, Barry Warsaw <barry-+ZN9ApsXKcEdnm+yROfE0A@public.gmane.org> wrote:
You mean instead of allowing expressions inside strings, you want to make it easier for functions to mess with their caller's scope?
    def test():
        x = 3
        print(x)  # --> 3
        increase_my_x()
        print(x)  # --> 4
    def increase_my_x():
        scope(depth=-1)['x'] += 1
Somehow I think the risk of abuse here is much higher than with expression strings. At least their effects are local. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«

On Aug 11, 2015, at 19:33, Nikolaus Rath <Nikolaus@rath.org> wrote:
I think he was proposing an immutable mapping (or at worst one that is mutable, but is or at least may be a detached copy, a la locals()). And if he wasn't, it's trivial to change his proposal into one using immutable mappings. Which still retains all the benefits for string formatting, and doesn't have the problem you raised.
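
A small sketch of why a read-only, detached mapping removes the abuse scenario. The `scope` helper and its `depth` convention (0 means the direct caller) are hypothetical, chosen only for this illustration.

```python
import sys
from types import MappingProxyType


def scope(depth=0):
    """Hypothetical scope(): a read-only snapshot of a caller's locals."""
    frame = sys._getframe(depth + 1)  # +1 skips scope() itself
    return MappingProxyType(dict(frame.f_locals))


def increase_my_x():
    try:
        # depth=1 reaches past this function into its caller.
        scope(depth=1)['x'] += 1
    except TypeError:
        return False  # mappingproxy rejects item assignment
    return True


def test():
    x = 3
    changed = increase_my_x()
    return x, changed
```

Reading `x` through the proxy works, so interpolation is unaffected; only the write is refused, and the caller's `x` stays at 3.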

On Wed, Aug 12, 2015 at 6:02 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
If I understand the proposal for scope() correctly, it's just a cleverer way to spell locals() etc.[1] and that means I don't want it to play any role in the string formatting proposal. It also has the same problems as locals(), sys._getframe(), etc., which is that their presence makes certain optimizations harder (in IronPython IIRC the creation of frame objects is normally skipped to speed up function calls, but the optimizer must detect the presence of those functions in order to disable that optimization). That doesn't mean I'm opposed to it (I don't have a problem with locals()), but it does mean that I think their use should probably not be encouraged. TBH I'm sorry Barry, but whenever someone uses DRY as a rallying cry I get a bad taste in my mouth. The solutions that are then proposed are too often uglier than the problem. (So I'm glad PEP 498 doesn't mention DRY. :-) [1] I know it's not just locals(), but it's too much of a mouthful to give the full definition. -- --Guido van Rossum (python.org/~guido)

On Aug 12, 2015, at 08:50 AM, Guido van Rossum wrote:
I think he was proposing an immutable mapping
FTR, yes. Immutable reads are all that's required for i18n.
I'm much less concerned about the performance impact that the loss of optimization causes because I think i18n is already generally slower... and that's okay! I mean _() has to at least do a dictionary lookup (assuming the catalog is warmed in memory) and then a piece-wise interpolation into the resulting translated string. So you're already paying a runtime penalty to do i18n.
I can appreciate that. It reminds me of the days of Python before keyword arguments. Remember the fun we had with tkinter back then? :) i18n is one of those places where DRY really is a limiting factor. You just can't force coders to pass in all the arguments to their translated strings, say into the _() function. The code looked horrible, it's way too much typing, and people (well, *I* ;) just won't do it. After implementing the sys._getframe() hack, it made i18n just so much more pleasant and easy to write, you almost couldn't not do it. One of the things that intrigues me about this whole idea of syntactic and compiler support is the ability to narrow down the set of substitution values available for interpolation, by parsing the source string and passing them into the interpolation call. Currently, _() is forced to expose all of locals and globals to interpolation, although I guess it could also parse out the $-placeholders in the source string too[1]. Not doing this does open an information leak vector via maliciously translated strings. If the source string were parsed and *only* those names were available for interpolation, a maliciously translated string couldn't be used to expose additional information because the keys in the interpolation dictionary would be limited. This mythical scope() could take arguments which would name the variables in the enclosing scopes it should export. It would still be a PITA if used explicitly, but could work nicely if i-strings essentially boiled down to:
    placeholders = source_string.extract_placeholders()
    substitutions = scope(*placeholders)
    translated_string = i18n.lookup(source_string)
    return translated_string.safe_substitute(substitutions)
That would actually be quite useful. Cheers, -Barry [1] https://gitlab.com/warsaw/flufl.i18n/issues/1
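
The four-step outline above can be approximated with string.Template from the standard library. This is a sketch under assumptions: `extract_placeholders` and `translate` are hypothetical helpers, the catalog is a plain dict, and an explicit `namespace` argument stands in for the mythical scope().

```python
from string import Template


def extract_placeholders(source):
    """Return the $-placeholder names that appear in the source string."""
    names = set()
    for match in Template.pattern.finditer(source):
        name = match.group("named") or match.group("braced")
        if name:
            names.add(name)
    return names


def translate(source, catalog, namespace):
    # Only names present in the *source* string are made available.
    allowed = extract_placeholders(source)
    substitutions = {key: namespace[key] for key in allowed if key in namespace}
    translated = catalog.get(source, source)
    # safe_substitute leaves unknown placeholders in place instead of raising.
    return Template(translated).safe_substitute(substitutions)


# A malicious translation that tries to pull in an extra variable.
catalog = {"Hello $name": "Bonjour $name, $secret"}
namespace = {"name": "Ann", "secret": "hunter2"}
result = translate("Hello $name", catalog, namespace)
```

Because `secret` never appears in the source string, it is not in the substitution dictionary, and the translated string cannot expose it; the placeholder is simply left uninterpolated.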

On Wed, Aug 12, 2015 at 6:06 PM, Barry Warsaw <barry@python.org> wrote:
Fair enough. (Though IMO the real cost of i18n is that it introduces a feeling of programming in molasses.)
Agreed. At Dropbox we use %(name)s in our i18n strings and the code always ends up looking ugly.
Yes, this is a real advantage of pursuing the current set of ideas further.
Agreed. But whereas you are quite happy having only simple variable names in i18n templates, the feature required for the non-i18n use case really needs arbitrary expressions. If we marry the two, your i18n code will just have to yell at the programmer if they use something too complex for the translators as a substitution. So possibly PEP 501 can be rescued. But I think we need separate prefixes for the PEP 498 and PEP 501 use cases; perhaps f'{...}' and _'{...}'. (But it would not be up to the compiler to limit the substitution syntax in _'{...}') -- --Guido van Rossum (python.org/~guido)

On 08/13/2015 12:37 AM, Guido van Rossum wrote:
On Wed, Aug 12, 2015 at 6:06 PM, Barry Warsaw <barry@python.org> wrote:
For the sake of the following argument, let's agree to disagree on:
- arbitrary expressions: we'll say yes
- string prefix character: we'll say 'f'
- how to identify expressions in a string: we'll say {...}
I promise we can bikeshed about these later. I'm just using the PEP 498 version because I'm more familiar with it. And let's say that PEP 498 will take this:
    name = 'Eric'
    dog_name = 'Fluffy'
    f"My name is {name}, my dog's name is {dog_name}"
And convert it to this (inspired by Victor):
    "My name is {0}, my dog's name is {1}".format('Eric', 'Fluffy')
Resulting in:
    "My name is Eric, my dog's name is Fluffy"
It seems to me that all you need for i18n is to instead make it produce:
    __i18n__("My name is {0}, my dog's name is {1}").format('Eric', 'Fluffy')
The __i18n__ function would do whatever lookup is needed to produce the translated string. So, in some English dialect where pet names had to come first, it could return:
    'The owner of the dog {1} is named {0}'
So the result would be:
    'The owner of the dog Fluffy is named Eric'
I promise we can bikeshed about the name __i18n__. So the translator has no say in how the expressions are evaluated. This removes any concern about information leakage. If the source code said:
    f"My name is {name}, my dog's name is {dog_name.upper()}"
then the string being passed to __i18n__ would remain unchanged. If by convention you wanted to not use arbitrary expressions and just use identifiers, then just make it a coding standard thing. It doesn't affect the implementation one way or the other. The default implementation for my proposed __i18n__ function (probably a builtin) would be just to return its string argument. Then you get the PEP 498 behavior. But in your module, you could say:
    __i18n__ = gettext.gettext
and now you'd be using that machinery. The one downside of this is that the strings that the translator is translating from do not appear in the source code. The translator would have to know that the string being translated is:
    "My name is {0}, my dog's name is {1}"
But since this only operates on f-string literals, you could mechanically extract them from the source. For example, given the example f-string above, my current PEP 498 implementation returns this:
    'Module(body=[Expr(value=FormattedStr(value=Call(func=Attribute(value=Str(s="My name is {0}, my dog\'s name is {1}"), attr=\'format\', ctx=Load()), args=[Name(id=\'name\', ctx=Load()), Name(id=\'dog_name\', ctx=Load())], keywords=[])))])'
So the translatable string can easily be extracted from the ast. I could modify the FormattedStr node to make that string easier to find. Eric.
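
The positional scheme can be tried out without any compiler support. In this sketch `__i18n__` is the hypothetical hook from the message above, implemented as a lookup in a toy dict named `CATALOG`; the desugared call is written out by hand.

```python
# Toy catalog keyed by the numbered template the compiler would generate.
CATALOG = {
    "My name is {0}, my dog's name is {1}":
        "The owner of the dog {1} is named {0}",
}


def __i18n__(message):
    """Hypothetical hook: return the translated template, or the original."""
    return CATALOG.get(message, message)


name = 'Eric'
dog_name = 'Fluffy'

# What f"My name is {name}, my dog's name is {dog_name}" would turn into:
result = __i18n__("My name is {0}, my dog's name is {1}").format(name, dog_name)

# With no catalog entry, the hook degrades to plain interpolation.
plain = __i18n__("Hello {0}").format(name)
```

The translator can reorder `{0}` and `{1}` but cannot introduce new expressions, which is the information-leak property claimed above.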

On 2015-08-13 12:58, Eric V. Smith wrote:
I think that looking up only the translation string and then inserting the values isn't good enough. For example, what if the string was "Found {0} matches"? If the number of matches was 1, you'd get "Found 1 matches". Ideally, you'd want to pass the values too, so that the lookup could pick the correct translation. [snip]

On 08/13/2015 08:23 AM, MRAB wrote:
That's certainly doable. You could pass in the values as a tuple, and either have __i18n__ call .format itself, or still just return the translated string and then call .format on the result.
    def __i18n__(message, values):
        return message
But I'm not sure how much of this to build in to the f-string machinery. gettext.gettext doesn't solve this problem by itself, either. Eric.

On Aug 13, 2015, at 09:40 AM, Eric V. Smith wrote:
But I'm not sure how much of this to build in to the f-string machinery. gettext.gettext doesn't solve this problem by itself, either.
Our gettext module does have some support for plural forms, but it's probably not great. https://docs.python.org/2/library/gettext.html#gettext.GNUTranslations.ngett... See also for reference: https://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html For any built-in machinery such as f-strings we'd want to at least make sure it's possible to support plural forms. Cheers, -Barry
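
For reference, this is roughly how plural selection looks with the existing gettext API. The sketch uses gettext.NullTranslations, which has no catalog and applies the English-style rule (singular when n == 1), so it only illustrates the calling convention; `found` is a hypothetical helper.

```python
import gettext

# NullTranslations performs no lookup; a real application would load a
# GNUTranslations object whose catalog carries the language's plural rule.
translations = gettext.NullTranslations()


def found(count):
    # The count is passed to the lookup, so it can pick the right form,
    # and then again to .format for interpolation.
    template = translations.ngettext(
        "Found {0} match", "Found {0} matches", count)
    return template.format(count)
```

The relevant point for f-string machinery is that the lookup needs the value, not just the template, which a single-argument translation hook cannot provide.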

On 8/13/2015 08:23, MRAB wrote:
Why would we solve this on new-formatting, but not in old-formatting when doing i18n? You have identified an existing problem (pluralization), the solutions to which would also work to solve the problem under consideration.

On Aug 13, 2015, at 07:58 AM, Eric V. Smith wrote:
I think unfortunately, this is a non-starter for the i18n use case. The message catalog must include the source string as it appears in the code because otherwise, translators will not be able to reliably map the intended meaning to their native language. They'll have to keep a mental map between source string placeholders and numeric placeholders, and I am fairly confident that this will be a source of broken translations. Is there a problem with keeping the named placeholders throughout the entire stack? Cheers, -Barry

On 08/13/2015 10:00 AM, Barry Warsaw wrote:
I guess not. It complicates things in the non-translated case, but I think it's probably all workable. I'll give it some thought. But is that enough? I'm not exactly sure what goal we're trying to achieve here. If it's to entirely replace the gettext module, including things like ngettext, then I think it's not an achievable goal, and we should just give up. If it's only to replace gettext.gettext (commonly used as "_"), then I think there's hope. Eric.

On Aug 14, 2015, at 09:05 AM, Eric V. Smith wrote:
Thanks.
From my perspective, exactly this. I don't expect or want to replace gettext. Part of the focus of PEP 501 is to enable gettext as a use case, binding it opportunistically to the __interpolate__ built-in. Without that binding, you get "normal" interpolation. Cheers, -Barry

On Aug 14, 2015, at 11:00 AM, Barry Warsaw <barry@python.org> wrote:
One thing that concerns me about gettext integration is the tooling support. For example, could pygettext be taught about f-strings, and could it be made to handle cases such as the 3rd example in: https://docs.python.org/3/library/gettext.html#deferred-translations ? That is: some f-strings in a module that are i18n-aware, and some that aren't. If the "built in" nature of f-strings means that the tooling can't detect all of the desired use cases, should we move forward with an i18n-friendly version of f-strings? I'm concerned about designing a lot of plumbing for i18n that no one will end up using because it can't do quite enough. Eric.
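
As a data point on tooling, f-string templates can be recovered mechanically from the AST in Python as it later shipped. The sketch below assumes Python 3.9 or newer (for ast.unparse); `fstring_templates` is a hypothetical helper, not part of pygettext. It ignores conversions and format specs, and a nested format spec would show up as an extra entry, so it is a starting point rather than an extractor.

```python
import ast


def fstring_templates(source):
    """Rebuild the '{expr}' template of each f-string literal in source."""
    templates = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.JoinedStr):
            continue
        parts = []
        for piece in node.values:
            if isinstance(piece, ast.Constant):
                parts.append(piece.value)  # literal text
            elif isinstance(piece, ast.FormattedValue):
                # Put the expression source back between braces.
                parts.append("{" + ast.unparse(piece.value) + "}")
        templates.append("".join(parts))
    return templates


code = 'greeting = f"My name is {name}, my dog\'s name is {dog_name}"'
templates = fstring_templates(code)
```

This does not answer the question of telling i18n-aware f-strings from the rest; with a single prefix there is nothing in the AST to distinguish them.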

On Aug 14, 2015, at 10:32 PM, Eric V. Smith wrote:
One thing that concerns me about gettext integration is the tooling support.
That worries me about PEP 498 too, but for different reasons. See my other follow up (in python-dev I think).
That's a great question. It could be solved by having a prefix explicitly for i18n extraction, e.g. PEP 501's i-strings. I agree that mixing translatable strings with strings-not-to-be-translated is an issue worth figuring out because you don't want to overload translators with a bunch of strings they don't have to translate. As for deferred translations, they are rare enough that some alternative spelling is IMHO acceptable. Cheers, -Barry

On 08/13/2015 07:58 AM, Eric V. Smith wrote:
Okay, here's a new proposal that handles Barry's concern about the format strings passed to __i18n__ not having the same contents as the source code. Instead of translating:
    name = 'Eric'
    dog_name = 'Fluffy'
    f"My name is {name}, my dog's name is {dog_name}"
to:
    __i18n__("My name is {0}, my dog's name is {1}").format('Eric', 'Fluffy')
We instead translate it to:
    __i18n__("My name is {name}, my dog's name is {dog_name}").format_map({'name':'Eric', 'dog_name':'Fluffy'})
The string would be unchanged from the value of the f-string. The keys in the dict would be exactly the expressions inside the braces in the f-string. The values in the dict would be the value of the expressions in the f-string. This solution works for cases where the expressions inside braces are either simple identifiers, or are more complicated expressions. For i18n work, I'd expect them to all be simple identifiers, but that need not be the case. I consider this a code review item. We could add something like PEP 501's iu-strings, that would be interpolated but not translated, so we could mix translated and non-translated strings in the same module. Probably not spelled fu-strings, though! We'd probably want to add a str.safe_format_map to match the behavior of string.Template.safe_substitute, or add a parameter to str.format_map. I'm not sure how this parameter would get set from an f-string, or if it would always default to "safe" for the __i18n__ case. Maybe instead of __i18n__ just doing the string lookup, it would also be responsible for calling .format_map or .safe_format_map, so it could choose the behavior it wanted on a per-module basis. Eric.

On Sat, Aug 15, 2015 at 10:27 PM, Eric V. Smith <eric@trueblade.com> wrote:
I know it's a ridiculous corner case, but what if an expression occurs more than once? Will it be evaluated more than once, or will the exact text of the expression be used as, in effect, a lookup key? With simple expressions it won't make any difference, but anywhere else in Python, if you use the same expression twice, it'll be evaluated twice. user = "rosuav" f"You can log in with user name {user} and your provided password, and your web site is now online at http://{user}.amazinghosting.example/ for all to see. Thank you for using Amazing Hosting!" This kind of example should definitely be supported, but what about a function call? f"... user name {user()} ... http://{user()}.amazinghosting.example/" Do that in any other form of expression, and people will expect two calls. With i18n it'd be impossible to distinguish the two, but I'd still normally expect user() to get called twice. ChrisA
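
For comparison, f-strings as they shipped in Python 3.6 and later evaluate each replacement field separately. The sketch records calls in a list to make that visible; `user` and `calls` are illustrative names.

```python
calls = []


def user():
    """Record each call so the number of evaluations can be inspected."""
    calls.append("user")
    return "rosuav"


message = f"user name {user()} at http://{user()}.amazinghosting.example/"
print(len(calls))  # one call per occurrence of the expression
```

A scheme that keys a dict on the expression text, as in the format_map proposal, would naturally evaluate a repeated expression once, so the two designs differ for expressions with side effects.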

On Aug 15, 2015, at 08:27 AM, Eric V. Smith wrote:
+1
One of the things I've mentioned to Nick about PEP 501 is the difference between i"foo" and iu"foo". The former gets mapped to __interpolate__() while the latter gets mapped to __interpolateu__(). Nick makes the case for this distinction based on the ability to override __interpolate__() in the local namespace to implement i18n, whereas __interpolateu__() - while technically still able to override - would generally just be left to the "normal" non-i18n interpolation. I countered with a proposal that a context manager could be used, but Nick points out that you can't really *unbind* __interpolate__() when the context manager exits. This still seems weird to me. There's no distinction in Python 3 between "foo" and u"foo" with the latter having been re-added to aid in migrations between Python 2 and 3. But with PEP 501, this introduces a functional distinction between i"foo" and iu"foo" (and ui"foo"?). It's handy, but seems to be a fairly significant difference from the current use of u-prefixes. I'm sympathetic but still skeptical. ;)
You always want safe-substitution for i18n because you can't let broken translations break your application (i.e. by causing exceptions to be thrown). It's the lesser of two evils to just include the original, un-interpolated placeholder in the final string. Cheers, -Barry
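
There is no str.safe_format_map, but the behavior can be approximated today with a dict subclass. This is a sketch: `SafeDict` and `safe_format_map` are hypothetical names, and the approach only covers plain `{name}` fields. A missing field that uses attribute access or indexing is looked up on the placeholder string and can still raise.

```python
class SafeDict(dict):
    """Mapping that echoes unknown placeholders instead of raising KeyError."""

    def __missing__(self, key):
        return "{" + key + "}"


def safe_format_map(template, values):
    # str.format_map consults __missing__ on dict subclasses.
    return template.format_map(SafeDict(values))


# A translation with a typo in one placeholder name.
broken_translation = "The owner of the dog {dog_nmae} is named {name}"
result = safe_format_map(
    broken_translation, {'name': 'Eric', 'dog_name': 'Fluffy'})
```

The broken placeholder survives into the output rather than crashing the application, which matches the "lesser of two evils" described above.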

On 8/17/15 11:51 AM, Barry Warsaw wrote:
I agree that this "one weird trick" of distinguishing between i"" and iu"" is really unfortunate. As you say, in Python 3, "foo" and u"foo" are the same, so why should i"" and iu"" be different? I understand the appeal of interpolated strings, but can we retain some measure of "explicit is better than implicit"? If i18n considerations are this important (and I agree that they are), let's take them seriously enough to give them real syntax. --Ned.

But how would you determine the current active language? Is that a thread local? This would probably not work to make asyncio or Twisted applications translatable. For an asyncio web application for instance, the translate function needs to know the request object. 2015-08-17 23:37 GMT+02:00 Ned Batchelder <ned@nedbatchelder.com>:

On Tue, Aug 18, 2015 at 10:08 AM, Jonathan Slenders <jonathan@slenders.be> wrote:
Or it would need to return a lazy translation, which would get translated when the request is available (e.g. when inserted into a template).

On Tue, Aug 18, 2015 at 8:32 PM, Petr Viktorin <encukou@gmail.com> wrote:
How hard would this be to implement? Something that isn't a string, retains all the necessary information, and then collapses to a string when someone looks at it? This "quantum string interpolation" model (string theory??) would work well for logging too, as it'd be lazy enough to be efficient - it needn't do the actual interpolation or translation work until later on. ChrisA
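
One way such a lazy object could look, as a sketch only. `LazyString`, `CATALOGS`, and `current_language` are hypothetical; the language is held in a contextvars.ContextVar (added to Python after this thread), which is one possible answer to the asyncio concern because each task sees its own value.

```python
import contextvars

# Per-context "active language"; a web framework would set this per request.
current_language = contextvars.ContextVar("current_language", default="en")

# Toy catalogs keyed by language code.
CATALOGS = {"de": {"Hello {name}": "Hallo {name}"}}


class LazyString:
    """Keeps the template and values; translates only when rendered."""

    def __init__(self, template, **values):
        self.template = template
        self.values = values

    def __str__(self):
        catalog = CATALOGS.get(current_language.get(), {})
        translated = catalog.get(self.template, self.template)
        return translated.format(**self.values)


message = LazyString("Hello {name}", name="Petr")
```

Nothing is looked up or interpolated until str(message) is called, so the same object renders differently depending on the language active at that moment, and a logging call that is filtered out never pays for the work.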

On 8/18/2015 4:08 AM, Jonathan Slenders wrote:
I assume it would call gettext.gettext (often aliased as '_'). It would inherit the gettext behavior. That's the only use case I've heard suggested. Eric.

On 18 August 2015 at 07:37, Ned Batchelder <ned@nedbatchelder.com> wrote:
I was hoping to avoid a proliferation of new string prefixes and get away with only one :) However, I like Guido's(?) suggestion of using "_" as the prefix to distinguish the i18n runtime translation case from the plain string interpolation case, so having both "i" and "iu" invoke str.format, "ib" invoke bytes.__mod__, and "_" invoke a __translate__ builtin (and/or thread local translation context) seems reasonable. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Time to write up a slimmed-down proposal that merges the best ideas from both PEPs? The first version doesn't have to have a PEP-level document, but it should clarify the actual proposal by showing the various usage patterns. On Thu, Aug 20, 2015 at 1:34 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On Thursday, August 20, 2015 at 9:38:40 PM UTC+5:30, Guido van Rossum wrote:
Reading these new string-PEP discussions I am reminded that Unicode has a load of quote-ish characters waiting in the aisle… Nice listing here http://xahlee.info/comp/unicode_matching_brackets.html

On 8/13/2015 12:37 AM, Guido van Rossum wrote:
Fair enough. (Though IMO the real cost of i18n is that it introduces a feeling of programming in molasses.)
For some structured situations, such as gui menus, the molasses is not needed. _(...) does two things: mark a string for the translator collector, and actually do the translation. Idle defines 'menudefs' structures, which are lists of menu tuples. The first item of each tuple is the string to be displayed on the menu, the second is the binding for that item, either a pseudoevent or a list of menu tuples for a submenu. A function walks the structure to extract the names to pass to tk menu calls. For internationalization, the gettext.gettext translation call could be added in one place, where the string is passed to tk, rather than 80 places in the structure definition. An altered version of the menudefs walker could be used to collect the menu strings for translation. If we want to encourage multi-language tkinter apps, i18n code should be added somewhere public in the tkinter package (and gettext module), rather than hidden away in idlelib. -- Terry Jan Reedy
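
The table-driven approach can be sketched as a pair of walkers over a structure shaped like the description above. The sample `menudefs` data and both function names are invented for illustration and are not idlelib's actual code.

```python
# Each entry is (label, binding); a list binding is a submenu.
menudefs = [
    ("File", [
        ("Open", "<<open-window-from-file>>"),
        ("Open Module", "<<open-module>>"),
    ]),
    ("Help", [("About", "<<about>>")]),
]


def collect_labels(defs):
    """Yield every menu label, for handing to translators."""
    for label, binding in defs:
        yield label
        if isinstance(binding, list):
            yield from collect_labels(binding)


def translate_menu(defs, translate):
    """Return a copy of the structure with every label translated."""
    return [
        (translate(label),
         translate_menu(binding, translate)
         if isinstance(binding, list) else binding)
        for label, binding in defs
    ]


labels = list(collect_labels(menudefs))
shouted = translate_menu(menudefs, str.upper)  # str.upper stands in for gettext
```

The translation call appears once, inside translate_menu, and it receives a variable rather than a literal, which is the point raised in the replies that follow.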

On Aug 13, 2015, at 10:04 AM, Terry Reedy wrote:
That would require being able to translate non-literals. I'd need the same, and it would be okay if the translation call were spelled less conveniently, as long as it's possible to both extract and translate the source strings. Cheers, -Barry

On 8/13/2015 11:39 AM, Barry Warsaw wrote:
I don't understand. Idle's menus are built from string literals -- no variables, no interpolation -- like 'File', 'Open', 'Open Module', etc. I think this is fairly typical.
With table-driven ui creation, extraction for human translators and replacement of the original by the translation can be done with a pair of related functions. With code-driven ui creation (as currently with Idle dialogs), an extraction function may be possible (if the string literals are tagged with keywords such as 'title=' or 'text=') but translation still requires addition of a _() call for each argument that needs translation. -- Terry Jan Reedy

On 08/13/2015 09:58 PM, Terry Reedy wrote:
It's the "could be added in one place" part that would require working on non-literals. In that one place, you'd be operating on a variable, not a literal. Eric.

On 10.08.2015 22:31, Mike Miller wrote:
IMO, having just one string literal interpolation standard is better than having two and since i"" fits both needs, I'm +1 on i"" and -0 on f"". The only problem I see with i"" is that you may want to use formatting only in some cases, without triggering the translation machinery which may be active in a module. I guess it's fine to fall back to the standard .format() or %-approach for those few situations, though. In all other use cases, having the literal strings already prepared for translation in a Python module is a huge win: just drop a translation hook into the module and you're good to go :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 11 2015)
::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

On 11.08.2015 15:10, Jonathan Slenders wrote:
-1 on any approach that uses a translation hook. Many frameworks have their own way of translating things. So that should definitely not be a global.
The module global approach is only one way to define a __interpolate__ function. As I understand the PEP, the compiler would simply translate the literal into a regular function call, which then is subject to the usual scoping rules in Python. It would therefore be possible to override the builtin in a local scope to e.g. address things like context or per-session based i18n. You could e.g. pass in a ${context} variable to the string, so that your __interpolate__ function can then directly access the required translation context. Alternatively, the __interpolate__ function could inspect the call stack to automatically find the needed context variable. I guess this particular use case could be made more elegant :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 11 2015)

I actually thought this was about a two-step process using lazy evaluation. This way {name:i18n} or {name:later} basically marks lazy evaluation. But as it seems, i'...' is more supposed to do all (translation + formatting) of this at once. My fault, sorry. On 11.08.2015 10:35, Petr Viktorin wrote:

Not exactly. Take this string for instance:

    f'hello {name}'

And our FString implementation, very simple:

    class FString(str):
        def __init__(self, value, **kwargs):
            super().__init__(value.format(**self.kwargs))
            self.value = value
            self.kwargs = kwargs

What the above f-string should do is create an instance of that class. This is just a compiler detail. A preprocessor step. Like this:

    FString('hello {name}', name=str(name))

FString is just an str instance, it has the actual interpolated value, but it still contains the original uninterpolated string and all parameters (as strings as well.) Now, what gettext can do, if we would wrap this string in the underscore function, is take the "value" attribute from this string FString, translate that, and apply the interpolation again. This way, we are completely compatible with the format() call. There is no need at all for using globals/locals or _getframe(). The name bindings are static, this is lintable. Please tell me if I'm missing something. 2015-08-11 19:12 GMT+02:00 Sven R. Kunze <srkunze@mail.de>:

On Aug 11, 2015, at 16:22, Jonathan Slenders <jonathan@slenders.be> wrote:
So you want every i18n string to interpolate the string, ignore that, look up the raw string, re-interpolate that, and somehow modify the string to hold the new l10n+interpolated value instead? Besides the performance cost of interpolating every string twice for no reason, and the possibility of irrelevant errors popping up while doing so, it's also impossible given that strings are immutable. (That also means you have to use a __new__ rather than __init__, by the way, but that's just a minor quibble.) Also, what happens if the translated string uses a variable that the original string didn't? For example, maybe your English string uses {Salutation} and {Last Name}, but your Chinese string has no need for a salutation, and your Icelandic string only uses the first name. People don't do that very often, but that's partly because many i18n systems are too inflexible to handle it. Python's str.format makes it relatively easy to build something that is flexible enough. Nick's proposal and Barry's both are. This one isn't.
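The __new__ point can be made concrete. The following is only a sketch of what a working variant of the FString idea might look like; the names FString, translate and the toy catalog dict are illustrative assumptions, not part of either PEP. It also shows that "translating" necessarily builds a second object and interpolates a second time.

```python
class FString(str):
    """str subclass that remembers its template and values (hypothetical)."""

    def __new__(cls, template, **values):
        # str is immutable, so the interpolated text has to be fixed in __new__.
        self = super().__new__(cls, template.format(**values))
        self.template = template
        self.values = values
        return self


def translate(fstring, catalog):
    """Look up the raw template, then interpolate again into a *new* object."""
    translated = catalog.get(fstring.template, fstring.template)
    return FString(translated, **fstring.values)


catalog = {"hello {name}": "hallo {name}"}  # toy catalog, not gettext
greeting = FString("hello {name}", name="Ned")
german = translate(greeting, catalog)
```

Note that a translated template naming a field the original did not pass would raise KeyError here, which is the flexibility problem raised above.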

I think I understood that. How can I differentiate between {variables} to be translated and variables not to be translated? I thought this was the intention of Mike's idea: unifying both i and f as they are orthogonal to each other. As I don't like the $ so much, I proposed using {...} as well with a special marker i18n or something. That could be completely useless, I am unsure. On 12.08.2015 01:22, Jonathan Slenders wrote:

On 8/10/15 4:31 PM, Mike Miller wrote:
You haven't said what this would *mean*, precisely. I18n tends to get very involved, and is often specific to the larger framework that you are using. Most people have adopted conventions that mean the syntax is already quite simple for strings to be localized: _("Hello there, {name}").format(name=name) What would i"" bring to the table? --Ned.

On Tue, Aug 11, 2015 at 1:14 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
Well, if you want to get the equivalent of: _("Hello there, {name}").format(name=name) you can't use: _(f"Hello there, {name}") because then the `_` function would get the substituted string. The translation database only contains "Hello there, {name}", not "Hello there, Ned"; you need to pass the former to `_`. In other words, if f was a function instead of a prefix, you want to call f(_("string")), not _(f("string")). The i"" would allow specifying a translation function, which is typically custom but project- (or at least module-) global.
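A small sketch of why the order matters, using a plain dict as a stand-in for a real message catalog (the catalog contents and the German text are made up for illustration):

```python
catalog = {"Hello there, {name}": "Hallo, {name}"}  # toy stand-in for a message catalog


def _(message):
    # Return the translation if the catalog has one, else the message itself.
    return catalog.get(message, message)


name = "Ned"

# Translate first, then substitute: the lookup key is the template.
right = _("Hello there, {name}").format(name=name)

# Substitute first, then translate: the lookup key is "Hello there, Ned",
# which is not in the catalog, so the string comes back untranslated.
wrong = _("Hello there, {name}".format(name=name))
```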

On Aug 11, 2015, at 12:50 PM, Petr Viktorin wrote:
Not having to repeat the variable name three times.
To me, this is really the crux of both the f-string and i-string proposals. It's also a more general issue to me because it's almost exactly the raison d'être for flufl.i18n. The complicated examples of f-strings I've seen really give me the shudders. Maybe in practice it won't be so bad, but it's definitely true that if it can be done, someone will do it. So I expect to see "abuses" of them in the wild. But the DRY argument is much more compelling to me, and currently I think the best way to reduce repetition in function arguments is through sys._getframe() and other such nasty tricks. I'd really much prefer to see this small annoyance fixed in a targeted way than add a hugely complicated new feature that reduces readability (IMHO). Which is why I like the scope() and similar ideas. Something like a built-in that provides you with a ChainMap of the current namespaces in effect. The tricky bit is that you still need something like _getframe()'s depth argument, or perhaps the object returned by scope() -or whatever it's called- would have links back to the namespaces of earlier call frames. I also don't know whether all of this makes sense for all the alternative implementations, but there's certainly a *logical* call stack for any particular point in a Python program. What's the simplest thing we can do to make this pain go away? A few extraneous locals really aren't that bad. They'll be rarely needed, and besides I already use such things when the alternative is a hideously long line of code. In any case, they're a small price to pay for keeping things simple. Cheers, -Barry

From: Barry Warsaw [mailto:barry@python.org]
This +100. I've been doing my own variants on the f-string for over a decade and every time I type "sys._getframe()" I think to myself, "There's got to be a better way..." Here's a stripped-down example from Real Life (gah!):

I like the i-string idea because it enables IDE support (syntax, click-to-definition etc.). Everything else is just a custom workaround (not saying they are bad, but I like a standardized syntax). Furthermore, scope() might have some merit of its own. :) On 11.08.2015 22:32, Eric Fahlgren wrote:

On Aug 11 2015, Barry Warsaw <barry-+ZN9ApsXKcEdnm+yROfE0A@public.gmane.org> wrote:
You mean instead of allowing expressions inside strings, you want to make it easier for functions to mess with their caller's scope?

    def test():
        x = 3
        print(x)  # --> 3
        increase_my_x()
        print(x)  # --> 4

    def increase_my_x():
        scope(depth=-1)['x'] += 1

Somehow I think the risk of abuse here is much higher than with expression strings. At least their effects are local. Best, -Nikolaus

On Aug 11, 2015, at 19:33, Nikolaus Rath <Nikolaus@rath.org> wrote:
I think he was proposing an immutable mapping (or at worst one that is mutable, but is, or at least may be, a detached copy, a la locals()). And if he wasn't, it's trivial to change his proposal into one using immutable mappings. That still retains all the benefits for string formatting, and does not have the problem you raised.
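For illustration, a read-only scope() could be approximated today along these lines. This is a sketch under assumptions: scope is a hypothetical helper, it relies on the CPython-specific sys._getframe(), and it uses a positive depth rather than the depth=-1 spelling from the example above.

```python
import sys
from collections import ChainMap
from types import MappingProxyType


def scope(depth=1):
    """Read-only view of the caller's locals, globals and builtins (hypothetical)."""
    frame = sys._getframe(depth)  # CPython-specific; depth=1 is the direct caller
    names = ChainMap(frame.f_locals, frame.f_globals, frame.f_builtins)
    return MappingProxyType(names)  # reads work, item assignment does not


def test():
    x = 3
    view = scope()
    try:
        view["x"] = 4  # a mappingproxy rejects writes
    except TypeError:
        pass
    # Reads are all that string formatting needs.
    return "x is {x}".format_map(view), x
```

Because the view cannot be written through, the caller's x stays at 3 while formatting still sees it.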

On Wed, Aug 12, 2015 at 6:02 AM, Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
If I understand the proposal for scope() correctly, it's just a cleverer way to spell locals() etc.[1] and that means I don't want it to play any role in the string formatting proposal. It also has the same problems as locals(), sys._getframe(), etc., which is that their presence makes certain optimizations harder (in IronPython IIRC the creation of frame objects is normally skipped to speed up function calls, but the optimizer must detect the presence of those functions in order to disable that optimization). That doesn't mean I'm opposed to it (I don't have a problem with locals()), but it does mean that I think their use should probably not be encouraged. TBH I'm sorry Barry, but whenever someone uses DRY as a rallying cry I get a bad taste in my mouth. The solutions that are then proposed are too often uglier than the problem. (So I'm glad PEP 498 doesn't mention DRY. :-) [1] I know it's not just locals(), but it's too much of a mouthful to give the full definition. -- --Guido van Rossum (python.org/~guido)

On Aug 12, 2015, at 08:50 AM, Guido van Rossum wrote:
I think he was proposing an immutable mapping
FTR, yes. Immutable reads are all that's required for i18n.
I'm much less concerned about the performance impact of losing that optimization because I think i18n is already generally slower... and that's okay! I mean _() has to at least do a dictionary lookup (assuming the catalog is warmed in memory) and then a piece-wise interpolation into the resulting translated string. So you're already paying a runtime penalty to do i18n.
I can appreciate that. It reminds me of the days of Python before keyword arguments. Remember the fun we had with tkinter back then? :) i18n is one of those places where DRY really is a limiting factor. You just can't force coders to pass in all the arguments to their translated strings, say into the _() function. The code looked horrible, it's way too much typing, and people (well, *I* ;) just won't do it. After implementing the sys._getframe() hack, it made i18n just so much more pleasant and easy to write, you almost couldn't not do it.

One of the things that intrigues me about this whole idea of syntactic and compiler support is the ability to narrow down the set of substitution values available for interpolation, by parsing the source string and passing them into the interpolation call. Currently, _() is forced to expose all of locals and globals to interpolation, although I guess it could also parse out the $-placeholders in the source string too[1]. Not doing this does open an information leak vector via maliciously translated strings. If the source string were parsed and *only* those names were available for interpolation, a maliciously translated string couldn't be used to expose additional information because the keys in the interpolation dictionary would be limited.

This mythical scope() could take arguments which would name the variables in the enclosing scopes it should export. It would still be a PITA if used explicitly, but could work nicely if i-strings essentially boiled down to:

    placeholders = source_string.extract_placeholders()
    substitutions = scope(*placeholders)
    translated_string = i18n.lookup(source_string)
    return translated_string.safe_substitute(substitutions)

That would actually be quite useful. Cheers, -Barry

[1] https://gitlab.com/warsaw/flufl.i18n/issues/1
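The four-step outline above can be sketched with string.Template. Everything except Template and safe_substitute is an assumption made for illustration: the CATALOG dict, the extract_placeholders and i18n helpers, and the use of sys._getframe() in place of the mythical scope().

```python
import sys
from string import Template

# Toy catalog whose "translation" tries to read a name the source never used.
CATALOG = {"Hello $name": "Hallo $name, $secret"}


def extract_placeholders(source):
    """Return the $-placeholder names that appear in the source string."""
    names = set()
    for match in Template.pattern.finditer(source):
        name = match.group("named") or match.group("braced")
        if name:
            names.add(name)
    return names


def i18n(source):
    caller = sys._getframe(1)  # stand-in for scope(*placeholders)
    substitutions = {}
    for key in extract_placeholders(source):
        # Only names mentioned in the *source* string are ever exposed.
        if key in caller.f_locals:
            substitutions[key] = caller.f_locals[key]
        elif key in caller.f_globals:
            substitutions[key] = caller.f_globals[key]
    translated = CATALOG.get(source, source)
    return Template(translated).safe_substitute(substitutions)


def greet():
    name = "Barry"
    secret = "hunter2"  # in scope, but not named by the source string
    return i18n("Hello $name")
```

The malicious $secret placeholder is left as literal text rather than being filled in, because secret never enters the substitution dictionary.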

On Wed, Aug 12, 2015 at 6:06 PM, Barry Warsaw <barry@python.org> wrote:
Fair enough. (Though IMO the real cost of i18n is that it introduces a feeling of programming in molasses.)
Agreed. At Dropbox we use %(name)s in our i18n strings and the code always ends up looking ugly.
Yes, this is a real advantage of pursuing the current set of ideas further.
Agreed. But whereas you are quite happy having only simple variable names in i18n templates, the feature required for the non-i18n use case really needs arbitrary expressions. If we marry the two, your i18n code will just have to yell at the programmer if they use something too complex for the translators as a substitution. So possibly PEP 501 can be rescued. But I think we need separate prefixes for the PEP 498 and PEP 501 use cases; perhaps f'{...}' and _'{...}'. (But it would not be up to the compiler to limit the substitution syntax in _'{...}') -- --Guido van Rossum (python.org/~guido)
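One way the i18n layer could "yell at the programmer" is to reject any substitution field that is not a plain identifier. This is a sketch using string.Formatter.parse from the standard library; the check_simple_fields name and the error wording are assumptions.

```python
from string import Formatter


def check_simple_fields(template):
    """Raise ValueError if any {...} field is not a plain identifier."""
    for _literal, field, _spec, _conversion in Formatter().parse(template):
        # field is None for trailing literal text with no replacement field.
        if field is not None and not field.isidentifier():
            raise ValueError(
                "field {!r} is too complex for translators".format(field)
            )
    return template
```

The compiler would accept the expression either way; only the translation machinery applies the stricter rule.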

On 08/13/2015 12:37 AM, Guido van Rossum wrote:
On Wed, Aug 12, 2015 at 6:06 PM, Barry Warsaw <barry@python.org <mailto:barry@python.org>> wrote:
For the sake of the following argument, let's agree to disagree on:

- arbitrary expressions: we'll say yes
- string prefix character: we'll say 'f'
- how to identify expressions in a string: we'll say {...}

I promise we can bikeshed about these later. I'm just using the PEP 498 version because I'm more familiar with it. And let's say that PEP 498 will take this:

    name = 'Eric'
    dog_name = 'Fluffy'
    f"My name is {name}, my dog's name is {dog_name}"

And convert it to this (inspired by Victor):

    "My name is {0}, my dog's name is {1}".format('Eric', 'Fluffy')

Resulting in:

    "My name is Eric, my dog's name is Fluffy"

It seems to me that all you need for i18n is to instead make it produce:

    __i18n__("My name is {0}, my dog's name is {1}").format('Eric', 'Fluffy')

The __i18n__ function would do whatever lookup is needed to produce the translated string. So, in some English dialect where pet names had to come first, it could return:

    'The owner of the dog {1} is named {0}'

So the result would be:

    'The owner of the dog Fluffy is named Eric'

I promise we can bikeshed about the name __i18n__. So the translator has no say in how the expressions are evaluated. This removes any concern about information leakage. If the source code said:

    f"My name is {name}, my dog's name is {dog_name.upper()}"

then the string being passed to __i18n__ would remain unchanged. If by convention you wanted to not use arbitrary expressions and just use identifiers, then just make it a coding standard thing. It doesn't affect the implementation one way or the other. The default implementation for my proposed __i18n__ function (probably a builtin) would be just to return its string argument. Then you get the PEP 498 behavior. But in your module, you could say:

    __i18n__ = gettext.gettext

and now you'd be using that machinery. The one downside of this is that the strings that the translator is translating from do not appear in the source code. The translator would have to know that the string being translated is:

    "My name is {0}, my dog's name is {1}"

But since this only operates on f-string literals, you could mechanically extract them from the source. For example, given the example f-string above, my current PEP 498 implementation returns this:

    'Module(body=[Expr(value=FormattedStr(value=Call(func=Attribute(value=Str(s="My name is {0}, my dog\'s name is {1}"), attr=\'format\', ctx=Load()), args=[Name(id=\'name\', ctx=Load()), Name(id=\'dog_name\', ctx=Load())], keywords=[])))])'

So the translatable string can easily be extracted from the ast. I could modify the FormattedStr node to make that string easier to find. Eric.

On 2015-08-13 12:58, Eric V. Smith wrote:
I think that looking up only the translation string and then inserting the values isn't good enough. For example, what if the string was "Found {0} matches"? If the number of matches was 1, you'd get "Found 1 matches". Ideally, you'd want to pass the values too, so that the lookup could pick the correct translation. [snip]

On 08/13/2015 08:23 AM, MRAB wrote:
That's certainly doable. You could pass in the values as a tuple, and either have __i18n__ call .format itself, or still just return the translated string and then call .format on the result. def __i18n__(message, values): return message But I'm not sure how much of this to build in to the f-string machinery. gettext.gettext doesn't solve this problem by itself, either. Eric.

On Aug 13, 2015, at 09:40 AM, Eric V. Smith wrote:
But I'm not sure how much of this to build in to the f-string machinery. gettext.gettext doesn't solve this problem by itself, either.
Our gettext module does have some support for plural forms, but it's probably not great. https://docs.python.org/2/library/gettext.html#gettext.GNUTranslations.ngett... See also for reference: https://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html For any built-in machinery such as f-strings we'd want to at least make sure it's possible to support plural forms. Cheers, -Barry
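A minimal sketch of the existing plural-form support, using gettext.NullTranslations so that it runs without any installed catalog (in which case ngettext simply picks between the two English forms). The found helper is illustrative.

```python
import gettext

# No catalog loaded: ngettext falls back to choosing between the two msgids.
translations = gettext.NullTranslations()


def found(count):
    # The count is passed to the lookup so the right plural form is chosen,
    # then substituted into whichever template came back.
    template = translations.ngettext("Found {0} match", "Found {0} matches", count)
    return template.format(count)
```

This is the shape any interpolation hook would need to accommodate: the lookup has to see at least one of the values, not just the template.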

On 8/13/2015 08:23, MRAB wrote:
Why would we solve this on new-formatting, but not in old-formatting when doing i18n? You have identified an existing problem (pluralization), the solutions to which would also work to solve the problem under consideration.

On Aug 13, 2015, at 07:58 AM, Eric V. Smith wrote:
I think unfortunately, this is a non-starter for the i18n use case. The message catalog must include the source string as it appears in the code because otherwise, translators will not be able to reliably map the intended meaning to their native language. They'll have to keep a mental map between source string placeholders and numeric placeholders, and I am fairly confident that this will be a source of broken translations. Is there a problem with keeping the named placeholders throughout the entire stack? Cheers, -Barry

On 08/13/2015 10:00 AM, Barry Warsaw wrote:
I guess not. It complicates things in the non-translated case, but I think it's probably all workable. I'll give it some thought. But is that enough? I'm not exactly sure what goal we're trying to achieve here. If it's to entirely replace the gettext module, including things like ngettext, then I think it's not an achievable goal, and we should just give up. If it's only to replace gettext.gettext (commonly used as "_"), then I think there's hope. Eric.

On Aug 14, 2015, at 09:05 AM, Eric V. Smith wrote:
Thanks.
From my perspective, exactly this. I don't expect or want to replace gettext. Part of the focus of PEP 501 is to enable gettext as a use case, binding it opportunistically to the __interpolate__ built-in. Without that binding, you get "normal" interpolation. Cheers, -Barry

On Aug 14, 2015, at 11:00 AM, Barry Warsaw <barry@python.org> wrote:
One thing that concerns me about gettext integration is the tooling support. For example, could pygettext be taught about f-strings, and could it be made to handle cases such as the 3rd example in: https://docs.python.org/3/library/gettext.html#deferred-translations ? That is: some f-strings in a module that are i18n-aware, and some that aren't. If the "built in" nature of f-strings means that the tooling can't detect all of the desired use cases, should we move forward with an i18n-friendly version of f-strings? I'm concerned about designing a lot of plumbing for i18n that no one will end up using because it can't do quite enough. Eric.

On Aug 14, 2015, at 10:32 PM, Eric V. Smith wrote:
One thing that concerns me about gettext integration is the tooling support.
That worries me about PEP 498 too, but for different reasons. See my other follow up (in python-dev I think).
That's a great question. It could be solved by having a prefix explicitly for i18n extraction, e.g. PEP 501's i-strings. I agree that mixing translatable strings with strings-not-to-be-translated is an issue worth figuring out because you don't want to overload translators with a bunch of strings they don't have to translate. As for deferred translations, they are rare enough that some alternative spelling is IMHO acceptable. Cheers, -Barry

On 08/13/2015 07:58 AM, Eric V. Smith wrote:
Okay, here's a new proposal that handles Barry's concern about the format strings passed to __i18n__ not having the same contents as the source code. Instead of translating:

    name = 'Eric'
    dog_name = 'Fluffy'
    f"My name is {name}, my dog's name is {dog_name}"

to:

    __i18n__("My name is {0}, my dog's name is {1}").format('Eric', 'Fluffy')

We instead translate it to:

    __i18n__("My name is {name}, my dog's name is {dog_name}").format_map({'name':'Eric', 'dog_name':'Fluffy'})

The string would be unchanged from the value of the f-string. The keys in the dict would be exactly the expressions inside the braces in the f-string. The values in the dict would be the value of the expressions in the f-string. This solution works for cases where the expressions inside braces are either simple identifiers, or are more complicated expressions. For i18n work, I'd expect them to all be simple identifiers, but that need not be the case. I consider this a code review item.

We could add something like PEP 501's iu-strings, that would be interpolated but not translated, so we could mix translated and non-translated strings in the same module. Probably not spelled fu-strings, though!

We'd probably want to add a str.safe_format_map to match the behavior of string.Template.safe_substitute, or add a parameter to str.format_map. I'm not sure how this parameter would get set from an f-string, or if it would always default to "safe" for the __i18n__ case. Maybe instead of __i18n__ just doing the string lookup, it would also be responsible for calling .format_map or .safe_format_map, so it could choose the behavior it wanted on a per-module basis. Eric.
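str.safe_format_map does not exist, but its intended effect can be approximated with a dict subclass that defines __missing__, which str.format_map honours. A sketch, with SafeDict, the catalog and the deliberately broken translation all being illustrative assumptions:

```python
class SafeDict(dict):
    """Leave unknown placeholders in the output instead of raising KeyError."""

    def __missing__(self, key):
        return "{" + key + "}"


SOURCE = "My name is {name}, my dog's name is {dog_name}"

# Toy catalog; the translation refers to a name the source never supplied.
catalog = {SOURCE: "{dog_name} belongs to {name} (id {owner_id})"}


def __i18n__(message):
    return catalog.get(message, message)


values = {"name": "Eric", "dog_name": "Fluffy"}
result = __i18n__(SOURCE).format_map(SafeDict(values))
```

The broken placeholder survives as literal text, so a bad translation degrades the message rather than raising in the application.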

On Sat, Aug 15, 2015 at 10:27 PM, Eric V. Smith <eric@trueblade.com> wrote:
I know it's a ridiculous corner case, but what if an expression occurs more than once? Will it be evaluated more than once, or will the exact text of the expression be used as, in effect, a lookup key? With simple expressions it won't make any difference, but anywhere else in Python, if you use the same expression twice, it'll be evaluated twice. user = "rosuav" f"You can log in with user name {user} and your provided password, and your web site is now online at http://{user}.amazinghosting.example/ for all to see. Thank you for using Amazing Hosting!" This kind of example should definitely be supported, but what about a function call? f"... user name {user()} ... http://{user()}.amazinghosting.example/" Do that in any other form of expression, and people will expect two calls. With i18n it'd be impossible to distinguish the two, but I'd still normally expect user() to get called twice. ChrisA
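The two possible answers can be compared directly. For reference, in the f-strings that eventually shipped in Python 3.6 each occurrence of an expression is evaluated separately, whereas a dict keyed by the expression text (as in the format_map rewrite under discussion) holds one entry per distinct expression. The calls list is just an instrumentation device for this sketch.

```python
calls = []


def user():
    calls.append("call")
    return "rosuav"


# f-string as shipped in Python 3.6+: every occurrence is evaluated.
text = f"user name {user()}, site http://{user()}.amazinghosting.example/"
fstring_calls = len(calls)

# A dict keyed by expression text can only hold "user()" once,
# so building it evaluates the call a single time.
calls.clear()
values = {"user()": user()}
dict_calls = len(calls)
```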

On Aug 15, 2015, at 08:27 AM, Eric V. Smith wrote:
+1
One of the things I've mentioned to Nick about PEP 501 is the difference between i"foo" and iu"foo". The former gets mapped to __interpolate__() while the latter gets mapped to __interpolateu__(). Nick makes the case for this distinction based on the ability to override __interpolate__() in the local namespace to implement i18n, whereas __interpolateu__() - while technically still able to override - would generally just be left to the "normal" non-i18n interpolation. I countered with a proposal that a context manager could be used, but Nick points out that you can't really *unbind* __interpolate__() when the context manager exits. This still seems weird to me. There's no distinction in Python 3 between "foo" and u"foo" with the latter having been re-added to aid in migrations between Python 2 and 3. But with PEP 501, this introduces a functional distinction between i"foo" and iu"foo" (and ui"foo"?). It's handy, but seems to be a fairly significant difference from the current use of u-prefixes. I'm sympathetic but still skeptical. ;)
You always want safe-substitution for i18n because you can't let broken translations break your application (i.e. by causing exceptions to be thrown). It's the lesser of two evils to just include the original, un-interpolated placeholder in the final string. Cheers, -Barry

On 8/17/15 11:51 AM, Barry Warsaw wrote:
I agree that this "one weird trick" of distinguishing between i"" and iu"" is really unfortunate. As you say, in Python 3, "foo" and u"foo" are the same, so why should i"" and iu"" be different? I understand the appeal of interpolated strings, but can we retain some measure of "explicit is better than implicit"? If i18n considerations are this important (and I agree that they are), let's take them seriously enough to give them real syntax. --Ned.

But how would you determine the current active language? Is that a thread local? This would probably not work to make an asyncio or Twisted application translatable. For an asyncio web application for instance, the translate function needs to know the request object. 2015-08-17 23:37 GMT+02:00 Ned Batchelder <ned@nedbatchelder.com>:

On Tue, Aug 18, 2015 at 10:08 AM, Jonathan Slenders <jonathan@slenders.be> wrote:
Or it would need to return a lazy translation, which would get translated when the request is available (e.g. when inserted into a template).

On Tue, Aug 18, 2015 at 8:32 PM, Petr Viktorin <encukou@gmail.com> wrote:
How hard would this be to implement? Something that isn't a string, retains all the necessary information, and then collapses to a string when someone looks at it? This "quantum string interpolation" model (string theory??) would work well for logging too, as it'd be lazy enough to be efficient - it needn't do the actual interpolation or translation work until later on. ChrisA
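A rough sketch of such a lazy object: it captures the template and values immediately and defers translation and interpolation until something asks for the string. The LazyString class, the catalogs dict and the active dict (standing in for per-request state) are all assumptions for illustration.

```python
class LazyString:
    """Holds a template and its values; does the work only when rendered."""

    def __init__(self, template, values, translate):
        self.template = template
        self.values = dict(values)  # values are captured eagerly
        self.translate = translate  # but translation is deferred

    def __str__(self):
        # Translation and interpolation both happen at render time.
        return self.translate(self.template).format_map(self.values)


catalogs = {"de": {"Hello {name}": "Hallo {name}"}}  # toy catalogs
active = {"language": "en"}  # stand-in for per-request state


def translate(message):
    return catalogs.get(active["language"], {}).get(message, message)


greeting = LazyString("Hello {name}", {"name": "Petr"}, translate)
english = str(greeting)
active["language"] = "de"  # the "request" only becomes known later
german = str(greeting)
```

A logging call that never emits its record would never call str() on the object, so the formatting cost would not be paid.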

On 8/18/2015 4:08 AM, Jonathan Slenders wrote:
I assume it would call gettext.gettext (often aliased as '_'). It would inherit the gettext behavior. That's the only use case I've heard suggested. Eric.

On 18 August 2015 at 07:37, Ned Batchelder <ned@nedbatchelder.com> wrote:
I was hoping to avoid a proliferation of new string prefixes and get away with only one :) However, I like Guido's(?) suggestion of using "_" as the prefix to distinguish the i18n runtime translation case from the plain string interpolation case, so having both "i" and "iu" invoke str.format, "ib" invoke bytes.__mod__, and "_" invoke a __translate__ builtin (and/or thread local translation context) seems reasonable. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Time to write up a slimmed-down proposal that merges the best ideas from both PEPs? The first version doesn't have to have a PEP-level document, but it should clarify the actual proposal by showing the various usage patterns. On Thu, Aug 20, 2015 at 1:34 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)

On Thursday, August 20, 2015 at 9:38:40 PM UTC+5:30, Guido van Rossum wrote:
Reading these new string-PEP discussions I am reminded that Unicode has a load of quote-ish characters waiting in the aisle… Nice listing here http://xahlee.info/comp/unicode_matching_brackets.html

On 8/13/2015 12:37 AM, Guido van Rossum wrote:
Fair enough. (Though IMO the real cost of i18n is that it introduces a feeling of programming in molasses.)
For some structured situations, such as gui menus, the molasses is not needed. _(...) does two things: mark a string for the translator collector, and actually do the translation. Idle defines 'menudefs' structures, which are lists of menu tuples. The first item of each tuple is the string to be displayed on the menu, the second is the binding for that item, either a pseudoevent or a list of menu tuples for a submenu. A function walks the structure to extract the names to pass to tk menu calls. For internationalization, the gettext.gettext translation call could be added in one place, where the string is passed to tk, rather than 80 places in the structure definition. An altered version of the menudefs walker could be used to collect the menu strings for translation. If we want to encourage multi-language tkinter apps, i18n code should be added somewhere public in the tkinter package (and gettext module), rather than hidden away in idlelib. -- Terry Jan Reedy
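The table-driven approach might look roughly like this. The structure follows the description above (label first, then either a pseudoevent or a list of submenu tuples), but the concrete entries, the pseudoevent names and both helper functions are made up for illustration and are not IDLE's actual code.

```python
# Illustrative menu table: (label, pseudoevent) or (label, [submenu tuples]).
menudefs = [
    ("File", [("Open", "<<open>>"), ("Open Module", "<<open-module>>")]),
    ("Edit", [("Undo", "<<undo>>")]),
]


def collect_labels(menus):
    """Extraction pass: gather every label for the translators."""
    labels = []
    for label, binding in menus:
        labels.append(label)
        if isinstance(binding, list):
            labels.extend(collect_labels(binding))
    return labels


def translated(menus, translate):
    """Translation pass: the translate call appears in exactly one place."""
    result = []
    for label, binding in menus:
        if isinstance(binding, list):
            binding = translated(binding, translate)
        result.append((translate(label), binding))
    return result


toy_catalog = {"File": "Datei", "Edit": "Bearbeiten"}
german = translated(menudefs, lambda s: toy_catalog.get(s, s))
```

Note that translate receives a variable here, not a literal, which is the point raised in the replies that follow.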

On Aug 13, 2015, at 10:04 AM, Terry Reedy wrote:
That would require being able to translate non-literals. I'd need the same, and it would be okay if the translation call were spelled less conveniently, as long as it's possible to both extract and translate the source strings. Cheers, -Barry

On 8/13/2015 11:39 AM, Barry Warsaw wrote:
I don't understand. Idle's menus are built from string literals -- no variables, no interpolation -- like 'File', 'Open', 'Open Module', etc. I think this is fairly typical.
With table-driven ui creation, extraction for human translators and replacement of the original by the translation can be done with a pair of related functions. With code-driven ui creation (as currently with Idle dialogs), an extraction function may be possible (if the string literals are tagged with keywords such as 'title=' or 'text=') but translation still requires addition of a _() call for each argument that needs translation. -- Terry Jan Reedy

On 08/13/2015 09:58 PM, Terry Reedy wrote:
It's the "could be added in one place" part that would require working on non-literals. In that one place, you'd be operating on a variable, not a literal. Eric.

participants (19)
- Alexander Walters
- Andrew Barnert
- Barry Warsaw
- Chris Angelico
- Eric Fahlgren
- Eric V. Smith
- Guido van Rossum
- Jonathan Slenders
- M.-A. Lemburg
- Mike Miller
- MRAB
- Ned Batchelder
- Nick Coghlan
- Nikolaus Rath
- Petr Viktorin
- Rustom Mody
- Stephen J. Turnbull
- Sven R. Kunze
- Terry Reedy