[python-win32] Try to got short path for files - but got error...

DurumDara durumdara at gmail.com
Mon May 29 13:01:31 CEST 2006


Mark Hammond wrote:
> I'm afraid your message isn't very clear.  You should try and copy the
> smallest possible code that demonstrates your problem, exactly as Bobby did
> in his reply.
>   
Hi everybody!

The short path project continues...
In this email I answer all of the questions.

First of all: I needed a utility that can create sha1/md5 checksums for
a disk (for every file, plus totals).
I wrote this program in Python and compiled it with py2exe.
It is based on os.listdir and normal file operations.
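
For readers who want to picture that kind of tool, here is a minimal
sketch. It is not the program from this post: it assumes a modern
Python 3 with hashlib (the post uses Python 2.4), the name checksum_tree
is hypothetical, and "totals" is taken here to mean a total byte count.

```python
import hashlib
import os


def checksum_tree(root, chunk_size=1 << 20):
    """Yield (path, sha1_hex, md5_hex, size) for every file under root."""
    for folder, _subdirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            sha1, md5, size = hashlib.sha1(), hashlib.md5(), 0
            with open(path, "rb") as handle:
                # Read each file once and feed both digests from the same chunk.
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    sha1.update(chunk)
                    md5.update(chunk)
                    size += len(chunk)
            yield path, sha1.hexdigest(), md5.hexdigest(), size
```

Summing the last field over all yielded rows gives the total; reading in
chunks keeps memory use flat even for very large files.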

But the customer (the main user) reported many bugs in this application.
All of them were caused by non-English characters (Unicode issues, names
that depend on the OS and the code page).
We use this app in Central Europe, so we get disks with many languages
(English, Hungarian, Czech, Slovenian, Russian, etc.).
Each of them uses a different encoding.
Interesting: on the HDD of my Hungarian OS I found a Russian folder with
Russian characters, because I had installed a Russian program earlier.

The os/os.path modules do not work well with these filenames, so at this
point I replaced them with win32file/win32api.
FindFilesW works well and it eats everything: special characters, the
Unicode range, etc.
My code now works stably!

Wow! I thought: this is victory!

But I must reorganize my code, because the customer needs better speed
from this app.
OK, I started thinking about profiling and optimizing (and I still am).
What I see: most of the time is spent in the sha/md5 calculation and in
disk operations.
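
A rough way to see how that time splits between reading and hashing is
sketched below. This is only an illustration under assumptions: it uses
Python 3 (time.perf_counter, hashlib), the name timed_sha1 is
hypothetical, and it is not the profiling that was actually done for
this app.

```python
import hashlib
import time


def timed_sha1(path, chunk_size=1 << 20):
    """Return (sha1_hex, seconds_reading, seconds_hashing) for one file."""
    sha1 = hashlib.sha1()
    read_time = 0.0
    hash_time = 0.0
    with open(path, "rb") as handle:
        while True:
            start = time.perf_counter()
            chunk = handle.read(chunk_size)
            after_read = time.perf_counter()
            read_time += after_read - start
            if not chunk:
                break
            sha1.update(chunk)
            hash_time += time.perf_counter() - after_read
    return sha1.hexdigest(), read_time, hash_time
```

Note that the read time on a second run is usually much lower, because
the OS file cache already holds the data.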

With this experience I downloaded the FSUM program. Everything is good
with it: I see that its performance is better than native Python code.
But when I tested it with Unicode directories, I got errors.

I need to start this program with input parameters. But this is "DOS
territory": special characters do not work, so I need to convert them.
One option for this is the MS-specific short paths.
So I tried this with several methods.

See this code. It demonstrates my problem. I added comments for better
understanding.

################################################################################
import os,sys,win32file,win32api,subprocess

# Making of special directory with unicode chars
dirname=u'LongDirxA\xff'
if not os.path.exists(dirname):
    os.makedirs(dirname)

# This is the special filename
UFN=u'%s\\%s\\LongFilexA\xff.txt'%(os.getcwd(),'%s')
UFN=UFN%(dirname)

# Write some content, so the file can be read back later as a check
content=str(range(40))
f=open(UFN,'w')
f.write(content)
f.close()

# Show info
print "Original filename"
print [UFN]

# Simple way - but this fails with the unicode name. See the output below
def W32APIShortPathName(UFileName):
    return win32api.GetShortPathName(UFileName)

# Use ctypes. Better, but...
def WinCTypesSPNW(UFileName):
    from ctypes import windll, create_unicode_buffer
    buf=create_unicode_buffer(512)
    # The third argument is the buffer size in characters, not in bytes
    if windll.kernel32.GetShortPathNameW(UFileName,buf,len(buf)):
        return buf.value
    else:
        raise Exception,'SPW convert error !'

# Special way. Use FindFilesW to convert to short path
def SplitAndFind(UFileName):
    Tags=[]
    drive,path=os.path.splitdrive(UFileName)
    drives=drive+'\\'
    while 1:
        if UFileName==drives:
            break
        filedatas=win32file.FindFilesW(UFileName)
        fd=filedatas[0]
        shorttag=fd[9] or fd[8]
        Tags.append(shorttag)
        split=os.path.split(UFileName)
        if len(split)<2:
            break
        UPath,UFile=split
        UFileName=UPath
    Tags.append(drive)
    Tags.reverse()
    return "\\".join(Tags)

# Test functions
_funcs=[W32APIShortPathName, WinCTypesSPNW, SplitAndFind]

# Converter
def ConvertFileNameToShortName(UFileName):
    r=[]
    # With every of these funcs.
    for func in _funcs:
        # try to convert, and get result/error
        try:
            shortfn=func(UFileName)
            exc=None
        except Exception,v:
            shortfn=None
            exc=v
        r.append([func,shortfn,exc])
    return r

# The tester
def TestShortsAndLongs(UFileName):
    # Convert all of them
    shorts=ConvertFileNameToShortName(UFileName)
    print "Try to use FSUM without shortfilename"
    try:
        # Try to forward params to FSUM
        spath,sfn=os.path.split(UFileName)
        cmd=['%s\\fsum.exe'%os.getcwd(),'-sha1','-D"%s"'%spath,sfn]
        #print cmd
        po=subprocess.Popen(cmd,stdout=subprocess.PIPE)
        output=po.communicate()[0]
        print "Output",output
    except Exception,v:
        print v
    print
    # Same thing, but with converted filenames
    print "Try to use FSUM with shortfilenames"
    for func,shortfn,exc in shorts:
        print "\n",func
        if shortfn:
            print shortfn
            spath,sfn=os.path.split(shortfn)
            cmd=['%s\\fsum.exe'%os.getcwd(),'-sha1','-D%s'%spath,sfn]
            #print cmd
            po=subprocess.Popen(cmd,stdout=subprocess.PIPE)
            output=po.communicate()[0]
            print "Output",output
            con=open(shortfn,'r').read()
            print con
        else:
            print "Error:",exc


TestShortsAndLongs(UFN)
################################################################################

That was the code. Simple. The result (sorry for the Hungarian Windows
messages):
================================================================================

Commandline: C:\Python24\python.exe C:\SPEEDT~1\fsumunic\UNITOS~1.PY
Workingdirectory: C:\speedtest\fsumunic
Timeout: 0 ms


SlavaSoft Optimizing Checksum Utility - fsum 2.51
Implemented using SlavaSoft QuickHash Library <www.slavasoft.com>
Copyright (C) SlavaSoft Inc. 1999-2004. All rights reserved.


SlavaSoft Optimizing Checksum Utility - fsum 2.51
Implemented using SlavaSoft QuickHash Library <www.slavasoft.com>
Copyright (C) SlavaSoft Inc. 1999-2004. All rights reserved.

Original filename
[u'C:\\speedtest\\fsumunic\\LongDirxA\xff\\LongFilexA\xff.txt']
Try to use FSUM without shortfilename
'ascii' codec can't encode character u'\xff' in position 72: ordinal not
in range(128)

Try to use FSUM with shortfilenames

<function W32APIShortPathName at 0x00A321B0>
Error: 'ascii' codec can't encode character u'\xff' in position 31:
ordinal not in range(128)

<function WinCTypesSPNW at 0x00A32030>
C:\SPEEDT~1\fsumunic\LONGDI~1\LONGFI~1.TXT
Output ; SlavaSoft Optimizing Checksum Utility - fsum 2.51
<www.slavasoft.com>
;
; Generated on 05/26/06 at 19:00:12
;
NOT FOUND    *****        LongFilexAy.txt

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
38, 39]

<function SplitAndFind at 0x00A260F0>
C:\SPEEDT~1\fsumunic\LONGDI~1\LONGFI~1.TXT
Output ; SlavaSoft Optimizing Checksum Utility - fsum 2.51
<www.slavasoft.com>
;
; Generated on 05/26/06 at 19:00:12
;
NOT FOUND    *****        LongFilexAy.txt

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
38, 39]

Process "Pyhton Interpeter" terminated, ExitCode: 00000000
================================================================================

Then I looked at the dir command's output:

================================================================================
c:\speedtest\fsumunic>dir LONGDI~1
A meghajtóban (C) lévő kötet Christ.
A kötet sorozatszáma: 2CC1-B5AE

c:\speedtest\fsumunic\LONGDI~1 tartalma:

2006.05.26.  18:30    <DIR>          .
2006.05.26.  18:30    <DIR>          ..
2006.05.26.  19:00               150 LongFilexAy.txt
               1 fájl               150 bájt
               2 könyvtár   1 013 125 120 bájt szabad

c:\speedtest\fsumunic>dir /x LONGDI~1\
A meghajtóban (C) lévő kötet Christ.
A kötet sorozatszáma: 2CC1-B5AE

c:\speedtest\fsumunic\LONGDI~1 tartalma:

2006.05.26.  18:30    <DIR>                       .
2006.05.26.  18:30    <DIR>                       ..
2006.05.26.  19:00               150 LONGFI~1.TXT LongFilexAy.txt
               1 fájl               150 bájt
               2 könyvtár   1 013 125 120 bájt szabad

c:\speedtest\fsumunic>
================================================================================

Ok, files are good. :-)

Trying it by hand:
================================================================================
c:\speedtest\fsumunic>C:\\speedtest\\fsumunic\\fsum.exe -sha1 -DC:\SPEEDT~1\fsumunic\LONGDI~1 LONGFI~1.TXT

SlavaSoft Optimizing Checksum Utility - fsum 2.51
Implemented using SlavaSoft QuickHash Library <www.slavasoft.com>
Copyright (C) SlavaSoft Inc. 1999-2004. All rights reserved.

; SlavaSoft Optimizing Checksum Utility - fsum 2.51 <www.slavasoft.com>
;
; Generated on 05/26/06 at 19:08:41
;
NOT FOUND    *****        LongFilexAy.txt

c:\speedtest\fsumunic>
================================================================================

Trying it by hand without a file spec -> all files:
================================================================================
c:\speedtest\fsumunic>fsum -sha1 -Dlongdi~1 *

SlavaSoft Optimizing Checksum Utility - fsum 2.51
Implemented using SlavaSoft QuickHash Library <www.slavasoft.com>
Copyright (C) SlavaSoft Inc. 1999-2004. All rights reserved.

; SlavaSoft Optimizing Checksum Utility - fsum 2.51 <www.slavasoft.com>
;
; Generated on 05/28/06 at 14:55:31
;
NOT FOUND    *****        LongFilexAy.txt
================================================================================

Test with a second file in the directory, to check that FSUM itself works:
================================================================================

c:\speedtest\fsumunic>fsum -sha1 -Dlongdi~1 *

SlavaSoft Optimizing Checksum Utility - fsum 2.51
Implemented using SlavaSoft QuickHash Library <www.slavasoft.com>
Copyright (C) SlavaSoft Inc. 1999-2004. All rights reserved.

; SlavaSoft Optimizing Checksum Utility - fsum 2.51 <www.slavasoft.com>
;
; Generated on 05/28/06 at 14:56:32
;
NOT FOUND    *****        LongFilexAy.txt
4b964d5dd7113e0683c4c0c252f4dd26cea26506 ?SHA1*test.xxx

c:\speedtest\fsumunic>
================================================================================

So as you see, FSUM cannot work with files that have Unicode names, not
even through their short paths.

That was the first story.

Here are my answers to your questions:

1.)
>One thing to bear in mind is that short path names
>won't always be available.  You can disable them
>on NTFS systems.

Thank you for this info. I don't want to use a function based on a
method that is unreliable (or can be disabled)...

2.)
>Another workaround to try if GetShortPathNameW() is truly not 
>available would be to use "dir".  This is kludgy but will work if all
>your machines are using WinXP or 2000 I believe.  For example, if I
>create a test file with contents "Hello World"...
> import os, re
> dirname = "c:\\speedtest"
> longname = "This is Hello World.txt"
> return_vals = os.popen('dir /x "' + dirname + '\\' + longname + '"').read()
> shortname = re.findall("\s+(\S+)\s+" + longname, return_vals)
> shortname

This does not work with special (non-English) characters.
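
One likely reason, judging from the dir listings earlier in this post:
dir printed the name as LongFilexAy.txt, with a plain "y" where the real
name has \xff, so a pattern built from the real name finds nothing. The
sketch below (Python 3, hypothetical helper name) shows this on one line
copied from that listing. It also uses re.escape, since the quoted
snippet puts the filename into the pattern unescaped.

```python
import re

# One line of `dir /x` output, copied from the listing earlier in this post.
DIR_LINE = "2006.05.26.  19:00               150 LONGFI~1.TXT LongFilexAy.txt"


def short_name_from_dir_line(line, longname):
    """Return the token printed just before longname, or None if no match."""
    pattern = r"\s(\S+)\s+" + re.escape(longname) + r"$"
    match = re.search(pattern, line)
    if match:
        return match.group(1)
    return None


# The real (Unicode) name is not what the console printed, so no match:
print(short_name_from_dir_line(DIR_LINE, "LongFilexA\xff.txt"))
# The name as the console printed it does match:
print(short_name_from_dir_line(DIR_LINE, "LongFilexAy.txt"))
```

So even with escaping, parsing dir output depends on how the console
code page renders each name, which is exactly the unstable part.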



I hope this message is clearer than the previous one.
OK, I found that the error is in FSUM, but the question remains open,
because each of the solutions is a little complex, chaotic, or does not
speed up my code.
The main question is how to pass Unicode filenames as parameters to a
subprocess (a program written by another developer), plus the short path
"misery"...
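
For comparison, a minimal sketch of the parameter passing in a modern
Python 3, which is an assumption here (the script in this post runs on
Python 2.4, where building the command line hits the 'ascii' codec
error). The helper name read_in_child is hypothetical, and the child is
just another Python process, not FSUM: whether a given external program
decodes its arguments correctly is a separate problem, as the FSUM
results above show.

```python
import subprocess
import sys


def read_in_child(path):
    """Start a child Python process that opens `path` and echoes its content."""
    child_code = "import sys; sys.stdout.write(open(sys.argv[1]).read())"
    # Passing a list avoids manual quoting. In Python 3 the arguments are
    # str (Unicode), so no implicit ASCII encoding step is involved.
    result = subprocess.run(
        [sys.executable, "-c", child_code, path],
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii")
```

Called with the full path of a file such as LongFilexA\xff.txt, this
returns the file's (ASCII) content as read by the child process.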

Thanks for the help, thanks for the answers!

    dd




