Totally confused by the str/bytes/unicode differences introduced in Python 3.x

John Machin sjmachin at lexicon.net
Sun Jan 18 02:37:40 CET 2009


On Jan 18, 9:10 am, Terry Reedy <tjre... at udel.edu> wrote:
> Martin v. Löwis wrote:
> >>> Does he intend to maintain two separate codebases, one 2.x and the
> >>> other 3.x?
> >> I think I have no other choice.
> >> Why? Is it theoretically possible to maintain a unique code base for
> >> both 2.x and 3.x?
>
> > That is certainly possible! One might have to make tradeoffs wrt.
> > readability sometimes, but I found that this approach works quite
> > well for Django. I think Mark Hammond is also working on maintaining
> > a single code base for both 2.x and 3.x, for PythonWin.
>
> Where 'single codebase' means that the code runs as is in 2.x and as
> autoconverted by 2to3 (or possibly a custom converter) in 3.x.
>
> One barrier to doing this is when the 2.x code has a mix of string
> literals with some being character strings that should not have 'b'
> prepended and some being true byte strings that should have 'b'
> prepended.  (Many programs do not have such a mix.)
>
> One approach to dealing with string constants I have not yet seen
> discussed here is to put them all in separate file(s) to be imported.
> Group the text and bytes separately.  Then marking the bytes with a 'b',
> either by hand or by program, would be easy.

(1) How would this work for somebody who wanted/needed to support 2.5
and earlier?
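
To make the problem concrete: the b"..." prefix is a syntax error on 2.5
and earlier, so the constants file cannot use it there. One possible
workaround, sketched below, is a small helper that produces a byte string
on either major version. The helper name b() is hypothetical and is not
something proposed in this thread; the sketch assumes every literal
passed to it contains only code points 0-255.

```python
import sys

if sys.version_info[0] >= 3:
    def b(s):
        # On 3.x a plain literal is text. latin-1 maps code points
        # 0-255 to the identical byte values, so no data is altered.
        return s.encode("latin-1")
else:
    def b(s):
        # On 2.x a plain literal is already a byte string.
        return s

# No b"" prefix, so this line also parses on 2.5 and earlier.
ZIPFILE_SIG = b("PK\x03\x04")
```

The cost is a function call wherever a byte constant is defined, which is
one more readability tradeoff of the kind mentioned above.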

(2) Assuming supporting only 2.6 and 3.x:

Suppose you have this line:
if binary_data[:4] == "PK\x03\x04": # signature of ZIP file

Plan A:
Change original to:
if binary_data[:4] == ZIPFILE_SIG: # "PK\x03\x04"
Add this to the bytes section of the separate file:
ZIPFILE_SIG = "PK\x03\x04"
[somewhat later]
Change the above to:
ZIPFILE_SIG = b"PK\x03\x04"
[once per original file]
Add near the top:
from separatefile import *

Plan B:
Change original to:
if binary_data[:4] == ZIPFILE_SIG: # "PK\x03\x04"
Add this to the separate file:
ZIPFILE_SIG = b"PK\x03\x04"
[once per original file]
Add near the top:
from separatefile import *

Plan C:
Change original to:
if binary_data[:4] == b"PK\3\4": # signature of ZIP file

Unless I'm gravely mistaken, you seem to be suggesting Plan A or some
variety thereof -- what advantages do you see in this over Plan C?
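
A minimal sketch of Plan C on 3.x, assuming the data comes from a small
archive built in memory with the standard zipfile module (the archive
contents are made up for illustration). It also shows that "\3\4" and
"\x03\x04" spell the same two bytes, and that on 3.x the unmarked
literal quietly stops matching rather than raising an error.

```python
import io
import zipfile

# Build a tiny ZIP archive in memory so there is real data to inspect.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hi")
binary_data = buf.getvalue()

# Plan C: mark the literal in place with a b prefix.
is_zip = binary_data[:4] == b"PK\3\4"  # signature of ZIP file

# Octal and hex escapes denote the same byte values.
same_literal = b"PK\3\4" == b"PK\x03\x04"

# On 3.x, bytes never compare equal to str, so the original
# unmarked literal would make the check fail silently.
str_matches = binary_data[:4] == "PK\x03\x04"
```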


