[Python-Dev] Unicode strings as filenames
Martin v. Loewis
martin@v.loewis.de
Tue, 8 Jan 2002 00:17:14 +0100
> posixmodule is just a library with calls and no state. IIRC there used to
> be multiple modules, one per OS, and the correct one was chosen and called
> os. I think it is perfectly reasonable for there to be an extra 'ntos'
> module that just works on NT that treats all arguments as Unicode (coercing
> up using the current locale when given narrow strings) and always calling
> the wide APIs. It would contain the same methods (when available) as os.
I'd be all in favour of bringing ntmodule back into life, especially
if that is to become a module that does not need to work on
Win9x. Perhaps it can be compiled twice, once into w9x.pyd and once
into nt.pyd, or the common code can be shared by means of #include.
I'd also be in favour of killing all 16-bit Windows support in Python
for 2.3; not sure whether 16-bit DOS needs to stay.
> If this is done then the unicode name should also be available as a field
> of the object as those mangled "z??.html" strings are totally useless.
It is not totally useless. Most users will never see the problem,
because their file names represent well in mbcs. In cases where you do
get replacement characters, it is still useful, since you may roughly
recognize what file it is in debugging output (e.g. the file extension
will be ASCII-representable in most applications, and perhaps you get a
meaningful path in there also).
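To illustrate the replacement-character case, here is a minimal sketch in
present-day Python. It assumes cp1252 as a stand-in for the mbcs code page
(the mbcs codec itself is only available on Windows), and the file name is
made up for the example.

```python
# Hypothetical file name with two characters that cp1252 cannot represent.
name = "z\u4e2d\u6587.html"

# errors="replace" substitutes "?" for each unencodable character
# instead of raising UnicodeEncodeError.
mangled = name.encode("cp1252", errors="replace")
print(mangled)

# The ASCII extension survives the lossy conversion.
extension = mangled.rsplit(b".", 1)[-1]
print(extension)
```

The base name is lost, but the extension is still readable in debugging
output.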
> I'm feeling more like making f_name be wide now but I'd expect some
> opposition now from backwards compatibility advocates.
I think the major problem is that performing repr on a file should
work. If that turns out to use the repr of the string (can't check
right now), instead of raising UnicodeErrors, my opposition to putting
Unicode objects into file names is not that strong anymore.
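A rough sketch of the behaviour I have in mind, written in present-day
Python. NamedFile is a hypothetical stand-in for the file object, not the
real implementation, and the file name is made up. The point is only that a
repr built from the escaped form of the name never has to encode it.

```python
class NamedFile:
    """Hypothetical stand-in for a file object; only name and mode matter."""

    def __init__(self, name, mode="r"):
        self.name = name
        self.mode = mode

    def __repr__(self):
        # ascii() escapes non-ASCII characters rather than encoding them,
        # so building the repr cannot raise a UnicodeError.
        return "<open file %s, mode %s>" % (ascii(self.name), ascii(self.mode))


f = NamedFile("z\u4e2d\u6587.html")
text = repr(f)
print(text)

# By contrast, a strict conversion of the same name to ASCII fails.
try:
    f.name.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```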
> Yes, I'm thinking ahead of the coding. Seeing where I'm already
> going or about to go wrong.
That looks very good indeed. I was worried about using UTF-8 as file
system default encoding, because I believe that this encoding should
be mandated by the system API, instead of being our choice.
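For comparison, a small sketch of what "mandated by the system" can look
like from the Python side. It uses sys.getfilesystemencoding() and
os.fsencode()/os.fsdecode() from present-day Python, none of which existed
when this was written. The file name is an assumption for the example.

```python
import os
import sys

# Ask the interpreter which encoding it uses for file names on this
# platform, rather than hard-coding a choice such as UTF-8.
fs_encoding = sys.getfilesystemencoding()
print(fs_encoding)

# Convert a (hypothetical) name to the bytes form the OS API expects,
# and back again, using that platform-determined encoding.
name = "report.html"
raw = os.fsencode(name)
round_tripped = os.fsdecode(raw)
```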
Regards,
Martin