walking a directory with very many files

Nick Craig-Wood nick at craig-wood.com
Mon Jun 15 17:29:33 EDT 2009


Jean-Paul Calderone <exarkun at divmod.com> wrote:
>  On Mon, 15 Jun 2009 09:29:33 -0500, Nick Craig-Wood <nick at craig-wood.com> wrote:
> >Hrvoje Niksic <hniksic at xemacs.org> wrote:
> >>  Nick Craig-Wood <nick at craig-wood.com> writes:
> >>
> >> > Here is a ctypes generator listdir for unix-like OSes.
> >>
> >>  ctypes code scares me with its duplication of the contents of system
> >>  headers.  I understand its use as a proof of concept, or for hacks one
> >>  needs right now, but can anyone seriously propose using this kind of
> >>  code in a Python program?  For example, this seems much more
> >>  "Linux-only", or possibly even "32-bit-Linux-only", than
> >>  "unix-like":
> >
> >It was a proof of concept certainly..
> >
> >It can be done properly with gccxml though which converts structures
> >into ctypes definitions.
> >
> >That said the dirent struct is specified by POSIX so if you get the
> >correct types for all the individual members then it should be correct
> >everywhere.  Maybe ;-)
> 
>  The problem is that POSIX specifies the fields with types like off_t and
>  ino_t.  Since ctypes doesn't know anything about these types, application
>  code has to specify their size and other attributes.  As these vary from
>  platform to platform, you can't get it correct without asking a real C
>  compiler.

These types could be part of ctypes.  After all, ctypes knows how big a
long is on every platform, and it knows that a uint32_t is the same size
everywhere, so it could conceivably know how big an off_t or an ino_t
is too.
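
To illustrate what ctypes already knows, here is a minimal sketch,
assuming a present-day Python 3 interpreter.  It only asks ctypes for
the sizes of types it ships; as far as I know there is no c_off_t or
c_ino_t to ask about, which is the gap being discussed.

```python
import ctypes

# Sizes of C types that ctypes already knows for the running platform.
# c_long varies between platforms; the fixed-width types do not.
sizes = {
    name: ctypes.sizeof(getattr(ctypes, name))
    for name in ("c_short", "c_int", "c_long", "c_longlong",
                 "c_uint32", "c_uint64", "c_size_t")
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```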

>  In other words, POSIX talks about APIs and ctypes deals with ABIs.
> 
>  http://pypi.python.org/pypi/ctypes_configure/0.1 helps with the problem,
>  and is a bit more accessible than gccxml.

I haven't seen that before - looks interesting.

>  It is basically correct to say that using ctypes without using something
>  like gccxml or ctypes_configure will give you non-portable code.

Well, it depends on whether the API is specified in types that ctypes
understands, e.g. short, int, long, int32_t, uint64_t etc.  A lot of
interfaces are specified exactly like that and work just fine with
ctypes in a portable way.  I agree with you that struct dirent
probably isn't one of those though!
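
As an example of the portable kind of interface, strlen() takes a
char * and returns a size_t, both of which ctypes has names for.  This
is a sketch under the assumption of a Unix-like system where
CDLL(None) exposes the C library's symbols.

```python
import ctypes

# Assumption: Unix-like OS; CDLL(None) gives access to the C library
# already loaded into the process.
libc = ctypes.CDLL(None)

# size_t strlen(const char *s): every type here is one ctypes knows,
# so no struct layout or platform-specific size has to be guessed.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"dirent")
print(length)
```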

I think it would be relatively easy to implement the code I demonstrated
in a portable way though...  I'd do it by defining dirent as a block
of memory and then, on the first run, finding a known filename in the
block.  That establishes the offset of the name field, which is all we
are interested in for the OP's problem.
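
A rough sketch of that idea, not the code from earlier in the thread.
It assumes a Unix-like system, Python 3, and that the name field
starts within the first 64 bytes of a dirent.  MARKER, PROBE_BYTES,
find_name_offset and iter_names are names made up for the
illustration.

```python
import ctypes
import os
import tempfile

# Assumption: Unix-like OS where CDLL(None) exposes opendir/readdir/closedir.
libc = ctypes.CDLL(None, use_errno=True)
libc.opendir.argtypes = [ctypes.c_char_p]
libc.opendir.restype = ctypes.c_void_p   # DIR * treated as an opaque pointer
libc.readdir.argtypes = [ctypes.c_void_p]
libc.readdir.restype = ctypes.c_void_p   # struct dirent * treated as raw memory
libc.closedir.argtypes = [ctypes.c_void_p]
libc.closedir.restype = ctypes.c_int

MARKER = b"dirent_offset_probe"  # hypothetical known filename
PROBE_BYTES = 64                 # assumption: the name lies inside this window


def _opendir(path):
    dirp = libc.opendir(os.fsencode(path))
    if not dirp:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return dirp


def find_name_offset():
    """Return the byte offset of the name field inside a dirent."""
    hits = []
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, MARKER.decode()), "w").close()
        dirp = _opendir(tmp)
        try:
            while True:
                entry = libc.readdir(dirp)
                if not entry:
                    break
                # Reads past short records, relying on the C library's
                # larger internal buffer (an assumption, not a guarantee).
                raw = ctypes.string_at(entry, PROBE_BYTES)
                pos = raw.find(MARKER + b"\0")
                if pos >= 0:
                    hits.append(pos)
        finally:
            libc.closedir(dirp)
    if not hits:
        raise RuntimeError("marker filename not found in any dirent")
    # A window starting at an earlier record can also see the marker, at a
    # larger position, so the smallest hit belongs to the marker's own record.
    return min(hits)


def iter_names(path, name_offset):
    """Yield entry names (bytes) in path without building a full list."""
    dirp = _opendir(path)
    try:
        while True:
            entry = libc.readdir(dirp)
            if not entry:
                break
            name = ctypes.string_at(entry + name_offset)  # NUL-terminated
            if name not in (b".", b".."):
                yield name
    finally:
        libc.closedir(dirp)
```

Usage would be along the lines of computing offset = find_name_offset()
once at start-up and then looping over iter_names(path, offset).  The
offset is measured against whatever readdir the process actually
calls, so nothing about ino_t or off_t has to be declared.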

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


