[Numpy-discussion] loadtxt stop

Benjamin Root ben.root at ou.edu
Fri Sep 17 15:59:11 EDT 2010


On Fri, Sep 17, 2010 at 2:50 PM, Zachary Pincus <zachary.pincus at yale.edu> wrote:

> > Though, really, it's annoying that numpy.loadtxt needs both the
> > readline function *and* the iterator protocol. If it just used
> > iterators, you could do:
> >
> > def truncator(fh, delimiter='END'):
> >   for line in fh:
> >     if line.strip() == delimiter:
> >       break
> >     yield line
> >
> > numpy.loadtxt(truncator(c))
> >
> > Maybe I'll try to work up a patch for this.
>
>
> That seemed easy... worth applying? Won't break compatibility, because
> the previous loadtxt required both fname.readline and fname.__iter__,
> while this requires only the latter.
>
>
> Index: numpy/lib/npyio.py
> ===================================================================
> --- numpy/lib/npyio.py  (revision 8716)
> +++ numpy/lib/npyio.py  (working copy)
> @@ -597,10 +597,11 @@
>              fh = bz2.BZ2File(fname)
>          else:
>              fh = open(fname, 'U')
> -    elif hasattr(fname, 'readline'):
> -        fh = fname
>      else:
> -        raise ValueError('fname must be a string or file handle')
> +        try:
> +            fh = iter(fname)
> +        except TypeError:
> +            raise ValueError('fname must be a string or file handle')
>      X = []
>
>      def flatten_dtype(dt):
> @@ -633,14 +634,18 @@
>
>          # Skip the first `skiprows` lines
>          for i in xrange(skiprows):
> -            fh.readline()
> +            try:
> +                fh.next()
> +            except StopIteration:
> +                raise IOError('End-of-file reached before encountering data.')
>
>          # Read until we find a line with some values, and use
>          # it to estimate the number of columns, N.
>          first_vals = None
>          while not first_vals:
> -            first_line = fh.readline()
> -            if not first_line: # EOF reached
> +            try:
> +                first_line = fh.next()
> +            except StopIteration:
>                  raise IOError('End-of-file reached before encountering data.')
>              first_vals = split_line(first_line)
>          N = len(usecols or first_vals)
>
>
So, this code will still raise an error for an empty file.  Personally, I
consider that a bug, because I would expect to receive an empty array.  I
could understand raising an error for a non-empty file that contains no
usable data.  For comparison, MATLAB returns an empty matrix when loading
an empty text file.

This has been a long-standing annoyance for me, along with the behavior with
a single-line data file.
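
To make the empty-file behavior I would expect concrete, here is a rough
sketch in plain Python. It is not numpy's code: `read_rows` is a made-up
helper name, and the whitespace-separated float parsing is an assumption
for illustration only. Like the patch above, it needs nothing but the
iterator protocol, and it returns an empty result instead of raising when
the input runs out before any data is found.

```python
def read_rows(lines, skiprows=0, comments="#"):
    """Parse whitespace-separated floats from any iterable of lines.

    Hypothetical helper, not part of numpy. It relies only on the
    iterator protocol and returns [] when no data is found.
    """
    it = iter(lines)
    # Skip header lines; running out of input here just means "no data".
    for _ in range(skiprows):
        if next(it, None) is None:
            return []
    rows = []
    for line in it:
        # Drop anything after the comment marker, then split on whitespace.
        fields = line.split(comments, 1)[0].split()
        if fields:  # ignore blank and comment-only lines
            rows.append([float(f) for f in fields])
    return rows


def truncator(lines, delimiter="END"):
    """Yield lines until one equals `delimiter` (as in the quoted message)."""
    for line in lines:
        if line.strip() == delimiter:
            break
        yield line
```

With this, read_rows([]) gives an empty list, and
read_rows(truncator(fh)) stops at the END marker without needing
fh.readline at all.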

Ben Root