generic text read function

Fri Mar 18 01:11:34 EST 2005

John Hunter wrote:
>>>>>>"les" == les ander <les_ander at yahoo.com> writes:
> 
> 
>     les> Hi, matlab has a useful function called "textread" which I am
>     les> trying to reproduce in python.
> 
>     les> two inputs: filename, format (%s for string, %d for integers,
>     les> etc and arbitrary delimiters)
> 
Building on John's solution, this is still not quite what you're looking for
(the delimiter is set once for the whole line, as a separate argument, rather
than inside the format string), but it's one step closer and may give you some
ideas:

import re

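# Map each format specifier to the function that converts a column
dispatcher = {'%s' : str,
              '%d' : int,
              '%f' : float,
              }
# Match any "%<letter>" specifier, so that unknown ones reach the
# dispatcher lookup and are reported instead of being silently skipped
parser = re.compile(r"%[A-Za-z]")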

def textread(iterable, formats, delimiter = None):

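     # Split on any run of one or more characters from delimiter,
     # or on whitespace by default. re.escape keeps characters such
     # as "^", "-" or "]" from being read as character-class syntax.
     if delimiter:
         splitter = re.compile("[%s]+" % re.escape(delimiter))
     else:
         splitter = re.compile(r"\s+")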

     # Parse the format string into a list of converters.
     # Note that whitespace in the format string is ignored, unlike
     # the spec, which calls for significant delimiters.
     try:
         converters = [dispatcher[fmt] for fmt in parser.findall(formats)]
     except KeyError as err:
         raise KeyError("Unrecognized format: %s" % err)

     format_length = len(converters)

     iterator = iter(iterable)

     # Works with any line-based iterable, such as an open file
     for line in iterator:
         # Drop the line terminator so it cannot become an empty last column
         cols = splitter.split(line.rstrip("\r\n"))
         if len(cols) != format_length:
             raise ValueError("Illegal line: %s" % cols)
         yield [func(val) for func, val in zip(converters, cols)]
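
One consequence of the splitter pattern is worth spelling out: because the
character class is followed by "+", a run of consecutive delimiters counts as a
single separator, so an empty field cannot be represented. A trailing newline
also turns into an empty last column when splitting on whitespace, which is why
lines read from a file need their terminator removed before splitting. A quick
check using only the standard re module (the sample strings are made up for
illustration):

```python
import re

# Same pattern shape as in textread: one or more characters from the class
comma_or_semicolon = re.compile("[,;]+")
whitespace = re.compile(r"\s+")

# A run of delimiters collapses into one separator, so the empty field is lost
fields = comma_or_semicolon.split("a,,b;c")
print(fields)

# A trailing newline leaves an empty last column unless it is stripped first
raw = whitespace.split("Item 5 8.0\n")
clean = whitespace.split("Item 5 8.0\n".rstrip("\r\n"))
print(raw)
print(clean)
```

If empty fields matter for your data, splitting on a single delimiter character
(no "+") is probably the better choice.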

# Example Usage:

source1 = """Item  5  8.0
Item2 6 9.0"""

source2 = """Item 1 \t42
Item 2\t43"""

  >>> for i in textread(source1.splitlines(),"%s %d %f"): print(i)
  ...
  ['Item', 5, 8.0]
  ['Item2', 6, 9.0]
  >>> for i in textread(source2.splitlines(),"%s %f", "\t"): print(i)
  ...
  ['Item 1 ', 42.0]
  ['Item 2', 43.0]
  >>> for item, value in textread(source2.splitlines(),"%s %f", "\t"):
  ...     print("%s %s" % (item, value))
  ...
  Item 1  42.0
  Item 2 43.0
  >>>
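
If you do want the delimiters to live inside the format string, one possible
direction is to compile the format into a regular expression, so that the
literal text between specifiers has to appear in each line. The following is
only a rough sketch under my own assumptions, not a port of Matlab's textread:
the names FIELDS, TOKEN, compile_format and textread_strict are made up here,
the field patterns are a guess at reasonable defaults, and it needs Python 3.4
or later for fullmatch.

```python
import re

# Hypothetical names for this sketch: FIELDS, TOKEN, compile_format,
# textread_strict. Each specifier maps to (regex for the field, converter).
FIELDS = {
    "%s": (r"(.+?)", str),
    "%d": (r"([-+]?\d+)", int),
    "%f": (r"([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)", float),
}
TOKEN = re.compile(r"(%[A-Za-z])")

def compile_format(formats):
    """Turn e.g. "%s,%d:%f" into a compiled regex plus a converter list."""
    pattern, converters = [], []
    # re.split with a capturing group keeps the specifiers in the result
    for piece in TOKEN.split(formats):
        if piece in FIELDS:
            regex, func = FIELDS[piece]
            pattern.append(regex)
            converters.append(func)
        elif TOKEN.match(piece):
            raise KeyError("Unrecognized format: %s" % piece)
        else:
            # Literal delimiter text: a whitespace run matches any whitespace,
            # everything else must appear exactly as written
            for part in re.split(r"(\s+)", piece):
                if part:
                    pattern.append(r"\s+" if part.isspace() else re.escape(part))
    return re.compile("".join(pattern)), converters

def textread_strict(iterable, formats):
    matcher, converters = compile_format(formats)
    for line in iterable:
        # The whole line (minus its terminator) must match the format
        match = matcher.fullmatch(line.rstrip("\r\n"))
        if match is None:
            raise ValueError("Illegal line: %r" % line)
        yield [func(val) for func, val in zip(converters, match.groups())]

rows = list(textread_strict(["Item 1,42:8.5", "Item 2,43:9"], "%s,%d:%f"))
print(rows)
```

Here "%s" is matched lazily, so it stops at the first occurrence of the
delimiter that follows it in the format. That is a design choice of the sketch;
a string field that itself contains the delimiter would need quoting rules on
top of this.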