Problem in accessing files with unicode fonts.

Mark Tolonen metolone+gmane at gmail.com
Tue Feb 24 02:56:54 EST 2009


Try os.walk for recursively walking directories.  Also, if you pass a unicode path to os.walk or os.listdir, you get unicode strings in the result.  To run this successfully when you have non-ASCII characters in your filenames, you will need an environment that supports the characters you want to print.  The Windows console, for example, typically supports only a legacy encoding such as cp437 or cp1252 (on U.S. systems).  If all else fails, write the output to a file in your favorite encoding:

# coding: gbk
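import os
import codecs

# codecs.open encodes unicode strings to UTF-8 as they are written.
output = codecs.open(u'出口.txt','w',encoding='utf-8')
# A unicode start path makes os.walk return unicode path and file names.
for path,dirs,files in os.walk(u'.'):
    for fname in files:
        output.write(os.path.join(path,fname)+u'\n')
output.close()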
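
The same idea can be applied to the recursive function from the original question. What follows is only a sketch, written for Python 3 (where every str is already Unicode; on the Python 2 of this thread the equivalent is passing a u'...' path). The names find_files, file_sizes and build_demo_tree are made up for illustration, and os.stat is used as a portable stand-in for win32api.GetFileAttributes so the example runs without pywin32:

```python
import os
import tempfile


def find_files(dir_path):
    """Recursively yield the full path of every file under dir_path.

    dir_path is a text string, so os.listdir returns text names and
    no lossy conversion to '?' characters takes place.
    """
    for name in os.listdir(dir_path):
        full_path = os.path.join(dir_path, name)
        if os.path.isdir(full_path):
            for sub_path in find_files(full_path):
                yield sub_path
        else:
            yield full_path


def file_sizes(dir_path):
    """Map each file path to its size in bytes.

    os.stat is a stand-in here; on Windows the text path could be
    handed to win32api.GetFileAttributes instead.
    """
    return {path: os.stat(path).st_size for path in find_files(dir_path)}


def build_demo_tree():
    """Create a throwaway directory holding one Arabic-named file
    and one ASCII-named file in a subdirectory (hypothetical data)."""
    root = tempfile.mkdtemp()
    sub_dir = os.path.join(root, u"sub")
    os.mkdir(sub_dir)
    arabic_name = u"\u0645\u0644\u0641.txt"  # Arabic word for "file"
    for path in (os.path.join(root, arabic_name),
                 os.path.join(sub_dir, u"plain.txt")):
        with open(path, "wb") as handle:
            handle.write(b"demo")
    return root


if __name__ == "__main__":
    for path, size in sorted(file_sizes(build_demo_tree()).items()):
        # repr() keeps the output printable on consoles with a
        # limited encoding.
        print(repr(path), size)
```

Note that no call to unicode() or decode() is needed anywhere: the names never pass through a byte-string form, so there is nothing to lose.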


-Mark

  "venu madhav" <venutaurus539 at gmail.com> wrote in message news:daf1e02e0902232315j7e66593ewecfdd739972ad676 at mail.gmail.com...



  On Tue, Feb 24, 2009 at 12:16 PM, Chris Rebert <clp2 at rebertia.com> wrote:

    A. Your reason for emailing us off-list makes no sense. The list would
    get you more responses, about as quickly, not to mention the value
    it adds through public archiving. CC-ing us /might/ have made slight
    sense.
    B. This is your problem:

       v = unicode(full_path,errors='skip')

    I'd advise you to read the docs for `unicode`, particularly what using
    'skip' as the value of `errors` does.

    Good day.

    - Chris

    --
    Follow the path of the Iguana...
    http://rebertia.com



    On Mon, Feb 23, 2009 at 10:31 PM, venu madhav <venutaurus539 at gmail.com> wrote:
    > Hello,
    >            Sorry for mailing to your personal addresses instead of mailing to
    > the group. The reason is the urgency of the problem and the time factor.
    >
    > Prob: I have a folder which contains files with unicode names (Arabic,
    > German etc). I am trying to obtain the attributes of those files recursively
    > using the win32api.GetFileAttributes() function. Here is the code which I have
    > for the same:
    > ----------------------------------------------------------------------
    > #!/usr/bin/env python
    > import os
    > import os.path
    > import sys
    > import urllib
    > import win32api,string
    > def findFile(dir_path):
    >     for name in os.listdir(dir_path):
    >         full_path = os.path.join(dir_path, name)
    >         print full_path
    >         if os.path.isdir(full_path):
    >             findFile(full_path)
    >         else:
    >             v = unicode(full_path,errors='skip')
    >             i = win32api.GetFileAttributes(v)
    >
    > findFile("F:\\DataSet\\Unicode")
    > ---------------------------------------------------------------
    > Now when I run this script, the full_path variable which should contain the
    > name of the file has "????" for non-English (I tried Arabic) characters. As
    > a result the GetFileAttributes function fails to recognise that file
    > and raises an exception.
    > full_path: F:\DataSet\Unicode\Arabic files\Arabic??????????????????????????
    > {type is str}
    > v: F:\DataSet\Unicode\Arabic files\Arabic?????????????????????????? {type is
    > unicode}
    > TraceBack:
    > Traceback (most recent call last):
    >   File "E:\venu\Testing Team\unitest.py", line 19, in <module>
    >     findFile("F:\\DataSet\\Unicode")
    >   File "E:\venu\Testing Team\unitest.py", line 13, in findFile
    >     findFile(full_path)
    >   File "E:\venu\Testing Team\unitest.py", line 16, in findFile
    >     i = win32api.GetFileAttributes(v)
    > error: (123, 'GetFileAttributes', 'The filename, directory name, or volume
    > label syntax is incorrect.')
    >
    > Waiting for your reply.
    > Thank you in advance.
    > Venu.



  Hello,
          As you suggested, I went through the documentation for the unicode function. It takes the file name as the first argument and then, based on the second (whether you skip, ignore or strict), does the action. None of those ways solves my problem. Since using skip or ignore removes the unicode characters from the file name, the win32 function fails to find that file. I tried to change the default encoding to unicode in the site.py script, but that didn't give any results.
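
  For reference, the effect of the errors argument can be sketched as below. This is an illustration in Python 3 syntax using bytes.decode, not the Python 2 unicode() call from the thread; the helper name try_decode and the sample text are made up. The standard handlers include 'strict', 'ignore' and 'replace'; 'skip' is not one of them, and the lookup only fails at the moment a decoding error actually occurs. If the name already consists of plain '?' characters by the time it is decoded, no error occurs at all, which may be why the original call ran without complaint.

```python
# Sample text: the ASCII word "Arabic" followed by three Arabic letters.
original = u"Arabic \u0645\u0644\u0641"
raw = original.encode("utf-8")  # bytes as they might arrive from elsewhere


def try_decode(data, encoding, errors):
    """Decode data, returning the exception name instead of raising."""
    try:
        return data.decode(encoding, errors)
    except (UnicodeDecodeError, LookupError) as exc:
        return type(exc).__name__


# Decoding with the wrong codec (ASCII) under each error handler.
results = {
    handler: try_decode(raw, "ascii", handler)
    for handler in ("strict", "ignore", "replace", "skip")
}

for handler in ("strict", "ignore", "replace", "skip"):
    # ascii() keeps the output printable on any console.
    print(handler, "->", ascii(results[handler]))

# Decoding with the codec the bytes were really written in loses nothing.
print("utf-8 ->", ascii(raw.decode("utf-8")))
```

  In this sketch 'ignore' silently drops the Arabic letters and 'replace' swaps them for placeholder characters, so neither can produce a name that matches the file on disk. Only decoding with the correct encoding, or never leaving Unicode in the first place, keeps the name intact.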




  Thank you,
  Venu

