[Python-Dev] PEP 460: allowing %d and %f and mojibake
v+python at g.nevcal.com
Mon Jan 13 21:44:06 CET 2014
On 1/13/2014 6:43 AM, Stephen J. Turnbull wrote:
> Glenn Linderman writes:
> > On 1/12/2014 4:08 PM, Stephen J. Turnbull wrote:
> >> Glenn Linderman writes:
> >>> the proposals to embed binary in Unicode by abusing Latin-1
> >>> encoding.
> >> Those aren't "proposals", they are currently feasible
> >> techniques in Python 3 for *some* use cases. The question is why
> >> infecting Python 3 with the byte/character confoundance virus is
> >> preferable to such techniques, especially if their (serious!)
> >> deficiencies are removed by creating a new type such as
> >> asciistr.
> > "smuggled binary" (great term borrowed from a different
> > subthread) muddies the waters of what you are dealing with.
> Not really. The "mud" is one or more of the serious deficiencies. It
> can be removed, I believe (and Nick apparently does, too). "asciistr"
> is one way to try that.
Yes, really. Use of smuggled binary means the str containing it can no
longer be treated completely as a str. That is "muddier" than having a
str that is only a str.
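To make this concrete, here is a minimal sketch of the Latin-1 smuggling technique in Python 3. Decoding arbitrary bytes as Latin-1 always round-trips, but an ordinary str method applied to the whole value silently changes the smuggled bytes. The payload is invented for illustration.

```python
# Latin-1 maps each byte 0x00-0xFF to the code point with the same value,
# so any bytes object survives a decode/encode round trip.
payload = b"GET /index.html\r\n" + bytes([0x00, 0xE9, 0xFF])
smuggled = payload.decode("latin-1")
round_trips = smuggled.encode("latin-1") == payload

# Treating the whole str as text breaks that guarantee:
# str.upper() turns U+00E9 into U+00C9, so byte 0xE9 becomes 0xC9 ...
shouted = smuggled.upper()
changed_byte = shouted[-2].encode("latin-1")

# ... and turns U+00FF into U+0178, which Latin-1 cannot encode at all.
try:
    shouted.encode("latin-1")
    reencodable = True
except UnicodeEncodeError:
    reencodable = False
```

The text part ("GET /INDEX.HTML") is uppercased as intended; the damage is confined to the bytes that were never text in the first place.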
> > When the mixture of text and binary is done as encoded text in
> > binary, then it is obvious that only limited text processing can be
> > performed,
> Hardly. After all, that's how all text processing was done for
> decades. Still is, in some programs, especially C programs.
I disagree, and so do you: text processing must be limited to the text
subsets of a str that includes smuggled binary, and that is a limitation.
You can't just apply text searches, scans, and transformations over the
complete str when it contains smuggled binary. You know that, but must
not have considered it a limitation, because you know you can do any
text processing on the text parts. But having to keep track of which
parts are text, and apply the text processing only to those parts, is a
limitation. Yes, it has been done that way, and the limitations of doing
it that way led to the plethora of encodings, each of which was intended
to be sufficient for some problem domain, but most of which turned out to
be sufficient only for a smaller domain than intended, especially as
communications became more global in nature.
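The bookkeeping described above might look like the following sketch. It assumes a made-up layout: one ASCII header line ending in CRLF, followed by an opaque binary body. The function name `lowercase_header` is hypothetical. The program itself has to remember where the text ends and restrict the transformation to that slice.

```python
def lowercase_header(smuggled: str) -> str:
    """Lowercase only the header of a Latin-1-smuggled message.

    Hypothetical layout, for illustration only: an ASCII header line
    ending in CRLF, followed by binary data that must not be touched.
    """
    end = smuggled.index("\r\n") + 2      # text/binary boundary tracked by hand
    header, body = smuggled[:end], smuggled[end:]
    return header.lower() + body          # body passes through unchanged


message = (b"CONTENT-TYPE: X\r\n" + bytes([0xC9, 0xD6])).decode("latin-1")
result = lowercase_header(message)

# Applying the same text operation to the whole str corrupts the body:
# U+00C9 -> U+00E9 and U+00D6 -> U+00F6, i.e. bytes 0xC9/0xD6 change.
naive = message.lower()
```

Nothing in the str type records where the boundary is; forget it once and `naive` is what you get.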
> > And there are no extra, confusing Latin-1 encode/decode operations
> > required.
> The "extra" encode/decode operations are mostly (perhaps all) due to
> examples that started from bytes and end with bytes. Of course if you
> assume that API and propose to do the operations using Unicode, you'll
> get "extra" decode/encode operations.
No, the "extra" encode/decode operations come from the requirement that
smuggled binary use Latin-1, while the text embedded in binary formats is
not always Latin-1.
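A small sketch of where those extra operations come from. It assumes a hypothetical record made of a NUL-terminated UTF-8 name followed by binary data. Working on bytes needs one decode with the real encoding; with a Latin-1-smuggled str, the field first arrives as mojibake and must be encoded back to bytes before the real decode.

```python
# Hypothetical record: UTF-8 name, NUL terminator, then binary data.
record = "Zo\u00eb".encode("utf-8") + b"\x00" + bytes([0x7F, 0x80])

# Working directly on bytes: split, then one decode with the real encoding.
from_bytes = record.split(b"\x00", 1)[0].decode("utf-8")

# Working on a Latin-1-smuggled str: the UTF-8 field shows up as mojibake
# (b"\xc3\xab" decodes to the two characters U+00C3 U+00AB) ...
smuggled = record.decode("latin-1")
field = smuggled.split("\x00", 1)[0]
# ... so it needs an extra encode back to bytes before the real decode.
from_smuggled = field.encode("latin-1").decode("utf-8")
```

Both paths end with the same text, but the smuggled one pays for an extra Latin-1 encode (and carried an extra Latin-1 decode up front).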
> > From a higher-level perspective, I think it would be great to have
> > a module, perhaps called "boundary" (let's call it that for now),
> > that allows some definition syntax (augmented BNF? augmented ABNF?)
> > to explain the format of a binary blob.
> We have struct, for one. I'm not sure why you want more than that. I
> suppose you could go all the way to ASN.1.
struct is insufficient to describe a whole file format with optional
parts, although it suffices for fixed-layout fragments.
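As an illustration of that distinction, here is a sketch with an invented record format: a fixed 3-byte header (a version byte plus a big-endian 16-bit flags field), then a 4-byte timestamp that is present only when bit 0 of the flags is set, then the rest of the data. The names `parse_record`, `HEADER`, `TIMESTAMP` and `HAS_TIMESTAMP` are hypothetical. struct describes each fixed fragment, but the optional part has to be handled by hand-written control flow.

```python
import struct
from typing import Optional, Tuple

HEADER = struct.Struct(">BH")     # version (uint8), flags (uint16), no padding
TIMESTAMP = struct.Struct(">I")   # optional uint32, present if flags & 1
HAS_TIMESTAMP = 0x0001


def parse_record(data: bytes) -> Tuple[int, int, Optional[int], bytes]:
    """Parse the hypothetical record layout described above."""
    version, flags = HEADER.unpack_from(data, 0)
    offset = HEADER.size
    timestamp = None
    if flags & HAS_TIMESTAMP:     # the optional part is plain Python logic
        (timestamp,) = TIMESTAMP.unpack_from(data, offset)
        offset += TIMESTAMP.size
    return version, flags, timestamp, data[offset:]
```

A declarative description, along the lines of the "boundary" idea, would aim to express that conditional in the format definition itself rather than in parser code.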