[Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames

noreply@sourceforge.net noreply@sourceforge.net
Fri, 27 Apr 2001 05:15:18 -0700


Patches item #410465, was updated on 2001-03-21 21:02
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=410465&group_id=5470

Category: core (C code)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Allow pre-encoded strings as filenames

Initial Comment:
This patch enables most filename parameters to use 
pre-encoded strings.  On Windows, the default of "mbcs" is 
used.  On all other platforms, the default filename 
encoding is the same as the general default encoding, 
which in practice means there is no functional change.  
However, other platforms can simply plug in their own 
encodings.

Rationale: os.listdir() etc. already return pre-encoded 
strings on some platforms (notably Windows).  These 
pre-encoded strings can already be passed to all of these 
functions.  However, if you convert such an encoded string 
to a Unicode string, it cannot be used to open the file.  
This patch lets either a pre-encoded string work (as now) 
or a Unicode representation of that same string (unlike 
now).
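
The goal stated in the rationale, that a filename should work 
whether it is passed pre-encoded or as Unicode, can be 
illustrated with a minimal sketch.  It assumes a modern 
Python 3, where bytes plays the role of the pre-encoded string 
and os.fsencode() applies the filesystem encoding; the helper 
name read_both_ways is hypothetical and not part of the patch.

```python
import os
import tempfile


def read_both_ways(text_name, payload):
    """Create a file, then open it by its text name and by its encoded name."""
    with tempfile.TemporaryDirectory() as tmp:
        text_path = os.path.join(tmp, text_name)
        with open(text_path, "wb") as fh:
            fh.write(payload)
        # os.fsencode() yields the "pre-encoded" form of the same path.
        encoded_path = os.fsencode(text_path)
        with open(text_path, "rb") as fh:
            via_text = fh.read()
        with open(encoded_path, "rb") as fh:
            via_bytes = fh.read()
    return via_text, via_bytes
```

Both reads should return the same payload, which is the 
behaviour the patch aims for on every filename parameter.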

Things of note:
* I invented a new "Es" PyArg_ParseTuple marker.  This 
is very similar to "es", except it leaves string 
objects alone, assuming they are already encoded 
correctly.  "es" assumes a string in the default 
encoding, which it will then encode in the new 
character set, i.e. a pre-encoded string fails here.

* This means that all affected functions have an extra 
string copy.  This copy still happens even when 
strings are passed, and even on platforms where no 
Unicode filesystem support exists.  The only other 
alternative was to make a much uglier patch, somehow 
using string objects in-place, but converting and 
freeing the buffer when Unicode.  This could be done 
if desired, but I'm not sure the added code complexity 
is worth it.

* New method on win32: nt._getpathname().  This is 
almost identical to win32api.GetPathName(), except it 
handles encoded strings.  ntpath.py has also been 
changed to work with this.  A hidden bonus of this 
patch is that it will make os.path.abspath() work 
identically regardless of whether the Win32 extensions 
are installed.

* Tested on Linux, Windows 98 and Windows 2k.  Still 
working out how to build Python on my BeOS box :)

* New test for these semantics added.
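
The difference between "es" and "Es" described in the first 
note can be modelled roughly as follows.  This is only a 
sketch of the semantics, not the C implementation: it assumes 
Python 3, with bytes standing in for the string objects of 
the time and str for Unicode objects, and it assumes a 
default encoding of "ascii".  The names convert_es and 
convert_Es are hypothetical.

```python
DEFAULT_ENCODING = "ascii"  # assumption: the general default encoding


def convert_es(obj, target_encoding):
    """Model of "es": byte strings are assumed to be in the default encoding."""
    if isinstance(obj, bytes):
        # Fails when the bytes were already encoded in another character set.
        obj = obj.decode(DEFAULT_ENCODING)
    return obj.encode(target_encoding)


def convert_Es(obj, target_encoding):
    """Model of "Es": byte strings are trusted as already encoded."""
    if isinstance(obj, bytes):
        return obj  # passed through without recoding
    return obj.encode(target_encoding)
```

With a name containing a non-ASCII character, already encoded 
as Latin-1, convert_es raises UnicodeDecodeError while 
convert_Es returns the bytes unchanged; both accept the 
Unicode form.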


----------------------------------------------------------------------

>Comment By: Mark Hammond (mhammond)
Date: 2001-04-27 05:15

Message:
Logged In: YES 
user_id=14198

MAL - please do!  I generally look for the least-intrusive 
patch when dealing with potentially contentious issues, but 
I agree it makes more sense to rationalize.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-04-27 00:54

Message:
Logged In: YES 
user_id=38388

I like the idea of telling the arg parser to accept strings
as-is, but I think that copying all the code just to
implement the new "E" parser is unnecessary. Switching on
the second marker (behind the "e") would be much easier,
e.g. using "et" and "et#".

Do you want me to look into this?

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2001-03-22 14:10

Message:
Logged In: YES 
user_id=14198

I appreciate it is too late for 2.1 for a change of this 
size.

I don't think posixmodule is wrong - at least not how you 
think :)

posix_rename calls:
	return posix_2str(args, "EsEs:rename", rename);

however, it is posix_2str that passes the encoding, not 
posix_rename itself.  Ditto for posix_1str and 
posix_do_stat.
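
The layering described here can be sketched in Python.  This 
is a simplified model rather than the posixmodule.c code: the 
format string is omitted, FILENAME_ENCODING and 
parse_filenames are hypothetical stand-ins, and only the 
point about who supplies the encoding is illustrated.

```python
FILENAME_ENCODING = "utf-8"  # hypothetical stand-in for the filename encoding


def parse_filenames(args, encoding):
    """Stand-in for parsing "EsEs": the caller must supply an encoding."""
    return tuple(a if isinstance(a, bytes) else a.encode(encoding) for a in args)


def call_with_two_names(args, func):
    """Model of posix_2str: the helper, not its callers, supplies the encoding."""
    src, dst = parse_filenames(args, FILENAME_ENCODING)
    return func(src, dst)


def rename_model(args, func):
    """Model of posix_rename: forwards its arguments, never names an encoding."""
    return call_with_two_names(args, func)
```

Because the encoding is passed in one shared helper, the 
individual wrappers do not need to mention it.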

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-03-22 13:45

Message:
Logged In: YES 
user_id=6380

Mark, I don't think you expected to get this into 2.1, did
you?  It's way too big.

Also, I think your patch to posixmodule.c has some bugs --
if I understand correctly, the format string "Es" requires
two arguments, the encoding and the address of the C string
pointer; but several functions (posix_rename and onwards)
don't pass the encoding name.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2001-03-21 21:04

Message:
Logged In: YES 
user_id=14198

doh - forgot to click the checkbox

----------------------------------------------------------------------
