Windows XP - Environment variable - Unicode

John Roth newsgroups at jhrothjr.com
Sun Jul 13 04:58:11 EDT 2003


"Martin v. Löwis" <martin at v.loewis.de> wrote in message
news:3F10795B.9000501 at v.loewis.de...
> John Roth wrote:
>
> > I don't think encoding is an issue. Windows XP stores all character data as
> > unicode internally, so whatever you get back from os.environ() is either
> > going to be unicode, or it's going to be translated back to some single byte
> > code by Python.
>
> Read the source, Luke.

I haven't gotten into the Python source, and my name is not Luke.
Also, don't respond to my e-mail address. Unfortunately, I had a problem
where I had to reload my system, and it's gotten out to usenet. It used
to go to an ISP I no longer have an account with.

> Python uses environ, which is a C library
> variable pointing to byte strings, so no Unicode here.

The OP's question revolved around ***which*** code page was
being used internally. Windows uses Unicode. That's not the same
question as what code set Python uses to attempt to translate Unicode
into a single byte character set.
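
To make the distinction concrete, here is a minimal sketch of asking
Windows for the Unicode value directly instead of going through the C
library's byte-string environ. It assumes a Python that ships the ctypes
module; the helper name get_env_unicode is hypothetical, and on
non-Windows systems it simply falls back to os.environ.

```python
import os
import sys


def get_env_unicode(name):
    """Return the value of an environment variable as Unicode text, or None."""
    if sys.platform != "win32":
        # Assumption: off Windows there is no wide-character API to call,
        # so fall back to whatever os.environ provides.
        return os.environ.get(name)

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    get_var = kernel32.GetEnvironmentVariableW
    get_var.argtypes = [wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.DWORD]
    get_var.restype = wintypes.DWORD

    # First call with no buffer: returns the required size in characters,
    # including the terminating null, or 0 if the variable is not set.
    size = get_var(name, None, 0)
    if size == 0:
        return None

    buf = ctypes.create_unicode_buffer(size)
    get_var(name, buf, size)
    return buf.value
```

The point of the sketch is only that the value never passes through
CP_ACP on its way back to the caller.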

> > In the latter case, you may not be able to recover non-ascii
> > values, so Rob Willscroft's workaround to get the unicode version may be
> > your only hope.
>
> You are certainly able to recover non-ascii values, as long as they
> only use CP_ACP.

I said "may not," not "cannot in any and all circumstances."

> > If you're getting a standard string though, I'd try using Latin-1, or the
> > Windows equivalent first (it's got an additional 32 characters that aren't in
> > Latin-1.)
>
> That, in general, is wrong. It is only true for the Western European and
> American editions of Windows. In all other installations, CP_ACP differs
> significantly from Latin-1.

The OP's problem was a character that's in the Western European range.

> > Note that Release 2.3 fixes the unicode problems for files under XP.
> > It's currently in late beta, though. I don't know if it fixes the
> > os.environ()
>
> It doesn't. "Fixing" something here is less urgent and more difficult,
> as environment variables rarely exceed CP_ACP.

Less urgent I can see, unless you're concerned about whether Python
survives against systems that do it right. Now that the Windows 9x
series is dying off, the vast majority of systems on the desktop are
going to have Unicode support internally. Granted, Python is not
targeted at "the vast majority of systems," but if you can't easily get
Unicode from the environment and the registry, then it's not very
useful for system administration tasks or automation tasks on
Windows.

Many, if not most, environment variables are file names. If file
names need Unicode support, then so do environment variables.

As to more difficult, as I said above, I haven't perused the source,
so I can't comment on that. If I had to do it myself, I'd probably
start out by always using the Unicode variant of the Windows API
call, and then check the type of the argument to environ() to determine
which to pass back. I'm not sure whether or not I'd throw an exception
if the actual value couldn't be translated to the current SBCS code.
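
Roughly what I have in mind, as a sketch only. It is written against a
Python where text strings are Unicode and byte strings are a separate
type; the function name getenv_by_type, the dictionary standing in for
the Unicode API call, and the strict flag are all hypothetical.

```python
def getenv_by_type(name, unicode_env, codepage="cp1252", strict=True):
    """Look up name in unicode_env, returning the same string type as name.

    unicode_env is a dict of Unicode strings standing in for the result
    of the wide-character Windows API call.
    """
    if isinstance(name, bytes):
        # Byte-string caller: translate the key in, and the value back out.
        value = unicode_env.get(name.decode(codepage))
        if value is None:
            return None
        # strict=True raises UnicodeEncodeError when the value does not
        # fit the code page; strict=False substitutes '?' instead.
        return value.encode(codepage, errors="strict" if strict else "replace")
    # Unicode caller: hand the value back untouched.
    return unicode_env.get(name)


# Made-up values for illustration.
env = {"HOME_DIR": "C:\\Users\\Zo\u00eb", "GREEK": "\u03b1\u03b2\u03b3"}

text_value = getenv_by_type("HOME_DIR", env)
byte_value = getenv_by_type(b"HOME_DIR", env)
lossy_value = getenv_by_type(b"GREEK", env, strict=False)
```

The strict flag is exactly the open question above: whether a value that
won't fit the code page should raise or be quietly degraded.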

> If people get support for Unicode environment variables, they want
> Unicode command line arguments next.

Why not? I can enter a command with Unicode at the Windows
command prompt, and that command is likely to contain file names.
Same problem raising its head in a different spot.

John Roth

On reading this over, it does sound a bit more strident than my
responses usually do, but I will admit to being irritated at the
assumption that you need to read the source to find out the
answer to various questions.

> Regards,
> Martin
>






More information about the Python-list mailing list