Unicode File Names

Jordan jordan.taylor2 at gmail.com
Fri Oct 17 04:47:37 CEST 2008


On Oct 16, 10:18 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Oct 17, 12:52 pm, Jordan <jordan.tayl... at gmail.com> wrote:
>
>
>
> > On Oct 16, 9:20 pm, John Machin <sjmac... at lexicon.net> wrote:
>
> > > On Oct 17, 11:43 am, Jordan <jordan.tayl... at gmail.com> wrote:
>
> > > > I've got a bunch of files with Japanese characters in their names and
> > > > os.listdir() replaces those characters with ?'s. I'm trying to open
> > > > the files several steps later, and obviously Python isn't going to
> > > > find '01-????.jpg' (formerly '01-ひらがな.jpg') because it doesn't exist.
> > > > I'm not sure where in the process I'm able to stop that from
> > > > happening. Thanks.
>
> > > The Fine Manual says:
> > > """
> > > listdir( path)
>
> > > Return a list containing the names of the entries in the directory.
> > > The list is in arbitrary order. It does not include the special
> > > entries '.' and '..' even if they are present in the directory.
> > > Availability: Macintosh, Unix, Windows.
> > > Changed in version 2.3: On Windows NT/2k/XP and Unix, if path is a
> > > Unicode object, the result will be a list of Unicode objects.
> > > """
>
> > > Are you unsure whether your version of Python is 2.3 or later?
>
> > *** Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32
> > bit (Intel)] on win32. *** says my interpreter
>
> > when it says "if path is a Unicode object...", does that mean the path
> > name must have a Unicode char?
>
> If path is a Unicode [should read unicode] object of length > 0, then
> *all* characters in path are by definition unicode characters.
>
> Where are you getting your path from? If you are doing
> os.listdir(r'c:\test') then do os.listdir(ur'c:\test'). If you are
> getting it from the command line or somehow else as a variable, instead
> of os.listdir(path), try os.listdir(unicode(path)). If that fails with a
> message like "UnicodeDecodeError: 'ascii' codec can't decode .....",
> then you'll need something like
> os.listdir(unicode(path, encoding='cp1252')) # cp1252 being the most likely suspect :)
>
> I strongly suggest that you read this:
>    http://www.amk.ca/python/howto/unicode
> which contains lots of useful information, including an answer to your
> original question.

The problem's been solved (thanks Chris and John). I was getting the
path from the command line, and didn't realize that using unicode(path)
would make the returned list Unicode as well. Thanks for the help.
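
For anyone finding this later, here is a minimal sketch of the fix as I
understand it: convert the path to a Unicode object before handing it to
os.listdir(), so the names come back as Unicode instead of being mangled
into ?'s. The helper names to_text and list_text_names are made up for
illustration, and cp1252 is only the default John guessed at; the right
encoding depends on where the path comes from. On Python 3, str is already
Unicode, so a str path gives str names without any conversion.

```python
import os

# Text type: `unicode` on Python 2 (the version in this thread), `str` on Python 3.
text_type = type(u"")


def to_text(path, encoding="cp1252"):
    """Return path as a text (Unicode) object, decoding byte strings.

    Hypothetical helper; cp1252 is an assumed default, not a universal one.
    """
    if isinstance(path, text_type):
        return path
    return path.decode(encoding)


def list_text_names(path, encoding="cp1252"):
    """List a directory so that the returned names are text objects."""
    return os.listdir(to_text(path, encoding))


# Usage: list the current directory; every name comes back as text.
for name in list_text_names(os.curdir):
    print(repr(name))
```

The names returned this way can be passed straight to open() or joined
with os.path.join() later on, which is where the ?'s were causing trouble.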


