Problem with -3 switch

Mon Jan 12 03:29:17 EST 2009

On Jan 12, 12:32 am, John Machin <sjmac... at lexicon.net> wrote:
> On Jan 12, 12:23 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
>
> > On Jan 9, 6:11 pm, John Machin <sjmac... at lexicon.net> wrote:
>
> > > On Jan 10, 6:58 am, Carl Banks <pavlovevide... at gmail.com> wrote:
>
> > > > On Jan 9, 12:36 pm, "J. Cliff Dyer" <j... at sdf.lonestar.org> wrote:
>
> > > > > On Fri, 2009-01-09 at 13:13 -0500, Steve Holden wrote:
> > > > > > Aivar Annamaa wrote:
> > > > > > >> As was recently pointed out in a nearly identical thread, the -3
> > > > > > >> switch only points out problems that the 2to3 converter tool can't
> > > > > > >> automatically fix. Changing print to print() on the other hand is
> > > > > > >> easily fixed by 2to3.
>
> > > > > > >> Cheers,
> > > > > > >> Chris
>
> > > > > > > I see.
> > > > > > > So i gotta keep my own discipline with print() then :)
>
> > > > > > Only if you don't want to run your 2.x code through 2to3 before you use
> > > > > > it as Python 3.x code.
>
> > > > > > regards
> > > > > >  Steve
>
> > > > > And mind you, if you follow that route, you are programming in a
> > > > > mightily crippled language.
>
> > > > How do you figure?
>
> > > > I expect that it'd be a PITA in some cases to use the transitional
> > > > dialect (like getting all your Us in place), but that doesn't mean the
> > > > language is crippled.
>
> > > What is this "transitional dialect"? What does "getting all your Us in
> > > place" mean?
>
> > Transitional dialect is the subset of Python 2.6 that can be
> > translated to Python3 with 2to3 tool.
>
> I'd never seen it called "transitional dialect" before.

I had hoped the context would make it clear what I was talking about.

> >  Getting all your Us in place
> > refers to prepending a u to strings to make them unicode objects,
> > which is something 2to3 users are highly advised to do to keep hassles
> > to a minimum.  (Getting Bs in place would be a good idea too.)
>
> Ummm ... I'm not understanding something. 2to3 changes u"foo" to
> "foo", doesn't it? What's the point of going through the code and
> changing all non-binary "foo" to u"foo" only so that 2to3 can rip the
> u off again?

It does a bit more than that.

> What hassles? Who's doing the highly-advising where and
> with what supporting argument?

You add the u so that the constant will be the same data type in 2.6
as it becomes in 3.0 after applying 2to3.  str and unicode objects
don't always mix smoothly, and you have a much better chance of
getting the same behavior in 2.6 and 3.0 if you use an actual unicode
string in both.
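
To see what the u prefix is rehearsing, here is a minimal sketch run
under Python 3 (not 2.6).  In 3.x, text (str) and binary (bytes) data
never coerce into each other the way 2.x str and unicode do.  Note
that the u"..." prefix is only accepted again from Python 3.3 onward
(PEP 414); in 3.0 through 3.2 it is a syntax error.

```python
# Run under Python 3.3+: str (text) and bytes (binary) are distinct
# types that never compare equal and refuse to mix implicitly.
text = u"foo"    # the u prefix is accepted but does nothing in 3.3+
data = b"foo"

assert text == "foo"          # u"foo" and "foo" are the same str
assert isinstance(text, str)
assert text != data           # no implicit coercion: str != bytes

try:
    text + data               # 2.x would coerce u"foo" + "foo"; 3.x raises
except TypeError as exc:
    print("TypeError:", exc)

# Crossing the text/binary boundary must be explicit, with an encoding.
assert data.decode("ascii") == text
assert text.encode("ascii") == data
```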

An example of this, though not with string constants, was posted here
recently.  Someone found that reading from urllib.request.urlopen()
returns bytes in Python 3.0, which broke his code, since in 2.x he had
been running regexp searches on the output.  If he had been careful to
use only unicode objects in 2.x (in this case, by explicitly decoding
the output), it wouldn't have been an issue.
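
A minimal sketch of that fix, assuming UTF-8 content (real code would
rather use the charset declared in the response headers).  An
io.BytesIO stands in for the response object so the example runs
offline, and search_decoded is a hypothetical helper name.

```python
import io
import re

def search_decoded(stream, pattern, encoding="utf-8"):
    """Hypothetical helper: read bytes from a file-like object, decode
    them to text, then run a str regex on the text."""
    raw = stream.read()              # bytes in 3.x, like urlopen().read()
    text = raw.decode(encoding)      # explicit decode: now a str
    return re.search(pattern, text)

# io.BytesIO stands in for the object returned by urllib.request.urlopen().
response = io.BytesIO(b"<title>Python 3.0 released</title>")
match = search_decoded(response, r"<title>(.*?)</title>")
print(match.group(1))

# Without the decode step, a str pattern against bytes fails in 3.x.
try:
    re.search(r"<title>", b"<title>x</title>")
except TypeError as exc:
    print("TypeError:", exc)
```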

> "Getting Bs into place" is necessary eventually. Whether it is
> worthwhile trying to find these in advance, or waiting for them to be
> picked up at testing time is a bit of a toss-up.
>
> Let's look at this hypothetical but fairly realistic piece of 2.x
> code:
> OLE2_SIGNATURE = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
> def is_ole2_file(filepath):
>      return open(filepath, "rb").read(8) == OLE2_SIGNATURE
>
> This is already syntactically valid 3.x code, and won't be changed by
> 2to3, but it won't work in 3.x because b"x" != "x" for all x. In this
> case, the cause of test failures should be readily apparent; in other
> cases the unexpected exception or test failure may happen at some
> distance.
>
> The 3.x version needs to have the effect of:
> OLE2_SIGNATURE = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
> def is_ole2_file(filepath):
>      return open(filepath, "rb").read(8) == OLE2_SIGNATURE
>
> So in my regional variation of the transitional dialect, this becomes:
> from timemachine import *
> OLE2_SIGNATURE = BYTES_LITERAL("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1")
> def is_ole2_file(filepath):
>      return open(filepath, "rb").read(8) == OLE2_SIGNATURE
> # NOTE: don't change "rb"
> ...
> and timemachine.py contains (amongst other things):
> import sys
> python_version = sys.version_info[:2] # e.g. version 2.4 -> (2, 4)
> if python_version >= (3, 0):
>     BYTES_LITERAL = lambda x: x.encode('latin1')
> else:
>     BYTES_LITERAL = lambda x: x
>
> It is probably worthwhile taking an up-front inventory of all file open()
> calls and [c]StringIO.StringIO() calls -- is the file being used as
> a text file or a binary file?
> If a text file, check that any default encoding is appropriate.
> If a binary file, ensure there's a "b" in the mode (real file) or you
> supply (in 3.x) an io.BytesIO() instance, not an io.StringIO()
> instance.
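
To make the binary/text split concrete, here is a minimal Python 3
sketch of the OLE2 check above, refactored so the comparison works on
any file-like object and a test can pass an io.BytesIO instead of a
file opened with "rb".  is_ole2_stream is a hypothetical name.  An
io.StringIO yields str, which never equals the bytes signature.

```python
import io

OLE2_SIGNATURE = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"

def is_ole2_stream(stream):
    """Return True if the stream starts with the OLE2 magic number."""
    return stream.read(8) == OLE2_SIGNATURE

def is_ole2_file(filepath):
    # "rb" is essential: in text mode read() returns str, so the
    # comparison is always False (or decoding raises UnicodeDecodeError).
    with open(filepath, "rb") as f:
        return is_ole2_stream(f)

# Binary in-memory stand-ins for real files.
ole = io.BytesIO(OLE2_SIGNATURE + b"\x00" * 8)
not_ole = io.BytesIO(b"PK\x03\x04 a zip file")
print(is_ole2_stream(ole), is_ole2_stream(not_ole))

# A text stream yields str, which never equals bytes.
text_stream = io.StringIO("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1")
print(is_ole2_stream(text_stream))
```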

Right.  "Getting all your Us in place" referred specifically to the act
of prepending Us to string constants, but figuratively it means making
explicit your intentions with all string data.  2to3 can only do so
much; it can't always guess whether your string usage is supposed to
be character or binary.

It's definitely going to be the hardest part of the transition since
it's the most drastic change.
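
One common way to make those intentions explicit is to convert at the
edges of a program: decode bytes to text as soon as they arrive, work
with text internally, and encode only on the way out.  A minimal
Python 3 sketch; ensure_text and ensure_bytes are hypothetical helper
names, and UTF-8 is an assumed default encoding.

```python
def ensure_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as str, decoding bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)

def ensure_bytes(value, encoding="utf-8"):
    """Hypothetical helper: return value as bytes, encoding str."""
    if isinstance(value, str):
        return value.encode(encoding)
    if isinstance(value, bytes):
        return value
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)

# Decode on input, work with text, encode on output.
incoming = b"caf\xc3\xa9"           # UTF-8 bytes, e.g. read from a socket
name = ensure_text(incoming)        # a 4-character str
greeting = "Hello, " + name + "!"   # all text operations happen on str
outgoing = ensure_bytes(greeting)   # back to bytes for writing out
print(len(name), len(incoming))
```

Keeping the conversions in two small functions means every place that
crosses the text/binary boundary is easy to find, which is exactly the
decision 2to3 cannot make for you.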

Carl Banks