[Python-3000] Example implementation for string.format

Talin talin at acm.org
Mon Apr 24 04:04:29 CEST 2006


There have been a number of interesting suggestions as to whether
string.format should support pipelined conversion specifiers, nested
conversion specifiers, and so forth.

I'm going to follow Guido's lead at this point and say that perhaps
these kinds of decisions should be made after looking at a sample
implementation. At the same time, I want to make it as easy as possible,
so I'm going to post a sample implementation here to use as a starting
point.

Now, I'm not actually going to post a patch that adds a "format" method
to the built-in string type. Instead, I am going to post a function that
has the behavior that I am looking for.

It's not the greatest Python code in the world, but that's not its purpose. I
hacked this up over the course of about an hour, so it's probably got a bug or
two.

In a real implementation, both the string.format and the MyFormatter.format
functions would call this underlying 'engine' to do the work of parsing the
field names and specifiers.
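
As a rough illustration of how a caller might plug into such an engine: the
listing below accepts a format_hook callable that is invoked as
format_hook(value, cspec, buffer) and returns true when it has handled the
field. The hook name and the 'U' conversion here are made up for the example,
and it is written in Python 3 syntax with a plain list standing in for the
array buffer, so treat it as a sketch rather than part of the proposal.

```python
def upper_hook(value, cspec, buffer):
    """Hypothetical hook: handle a made-up 'U' conversion for strings."""
    if cspec == 'U' and isinstance(value, str):
        # Append the converted characters to the caller's buffer.
        buffer.extend(value.upper())
        return True   # handled; the engine should skip its defaults
    return False      # not handled; fall through to built-in behavior


# A list stands in for the array buffer (an assumption for this sketch).
buf = []
handled = upper_hook("spam", "U", buf)
print(handled, ''.join(buf))
```

A hook that returns false leaves the buffer untouched, so the engine can fall
back to __format__ or str as it does in the listing.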

Note: I decided to scan the string character by character rather than using
regular expressions because (a) braces can nest recursively, and (b)
something like this may go into the interpreter, and we don't want to
add a dependency on re.
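
To make the nesting point concrete, here is a minimal sketch of the scanning
technique on its own: a recursive function that shares one iterator with its
caller, so each nested call consumes exactly its own "{...}" group. The names
read_field and find_fields are invented for this example, it is written in
Python 3 syntax, and it ignores escapes and conversion specs entirely.

```python
def read_field(chars):
    """Collect the text of one field; the opening '{' is already consumed.

    Nested '{...}' groups are read by a recursive call on the same
    iterator, which leaves the iterator just past the inner '}'.
    """
    parts = []
    for ch in chars:
        if ch == '{':
            # Nested field: recurse, then keep its text (braces included).
            parts.append('{' + read_field(chars) + '}')
        elif ch == '}':
            return ''.join(parts)
        else:
            parts.append(ch)
    # The iterator ran out before the closing brace was seen.
    raise ValueError("unmatched open brace")


def find_fields(template):
    """Return the raw text of each top-level field in template."""
    fields = []
    chars = iter(template)
    for ch in chars:
        if ch == '{':
            fields.append(read_field(chars))
    return fields


print(find_fields("a {0:x} b {outer{inner}} c"))
# prints ['0:x', 'outer{inner}']
```

A single regular expression cannot match arbitrarily deep brace nesting, which
is why the recursion does the work here.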

Anyway, if you have an idea as to how things should behave differently - feel
free to hack this, play with it, test out your idea, and then describe what
you did.

--- Talin

----------------------------------------------------------------------------

# Python string formatting

# Exception for errors in the format string.
class FormatError(StandardError):
    pass

def format(template, format_hook, *args, **kwargs):
    # Using array types since we're going to be growing
    # a lot.
    from array import array
    array_type = 'c'
    
    # Use unicode array if the original string is unicode.
    if isinstance(template, unicode): array_type = 'u'
    buffer = array(array_type)
    
    # Track which arguments actually got used
    unused_args = set(kwargs.keys())
    unused_args.update(range(0, len(args)))

    # Inner function to format a field from a value and
    # conversion spec. Most details missing.
    def format_field(value, cspec, buffer):

        # See if there's a hook
        if format_hook and format_hook(value, cspec, buffer):
            return
            
        # See if there's a __format__ method
        elif hasattr(value, '__format__'):
            buffer.extend(value.__format__(cspec))
            
        # Example built-in for ints. Probably should be
        # table driven by type, but oh well.
        elif isinstance(value, int):
            if cspec == 'x':
                buffer.extend(hex(value))
            else:
                buffer.extend(str(value))
                
        # Default to just 'str'
        else:
            buffer.extend(str(value))
    
    # Parse a field specification.
    def parse_field(iterator, buffer):
        
        # A separate array for the field name.
        name = array(array_type)

        # Consume from the same iterator.
        for ch in iterator:
            # A sub-field. We just interpret it
            # like a normal field, and append to
            # the name.
            if ch == '{':
                parse_field(iterator, name)
                
            # End of field. Time to process
            elif ch == '}':
                # Convert the array to string or uni
                if array_type == 'u': name = name.tounicode()
                else: name = name.tostring()
                    
                # Check for conversion spec
                parts = name.split(':', 1)
                conversion = 's'
                if len(parts) > 1:
                    name, conversion = parts
                    
                # Try to retrieve the field value
                try:
                    key = int(name)
                    value = args[key]
                except ValueError:
                    # Keyword args are strings, not uni (so far)
                    key = str(name)
                    value = kwargs[key]

                # If we got no exception, then remove from
                # unused args
                unused_args.remove(key)
                
                # Format it
                format_field(value, conversion, buffer)
                return
            elif ch == '\\':
                # Escape
                try:
                    name.append(iterator.next())
                except StopIteration:
                    # Backslash at end of string is bad
                    raise FormatError("unmatched open brace")
            else:
                name.append(ch)
                
        raise FormatError("unmatched open brace")

    # Construct an iterator from the template
    template_iter = iter(template)
    for ch in template_iter:
        # It's a field! Yay!
        if ch == '{':
            parse_field(template_iter, buffer)
        elif ch == '}':
            # Unmatched brace
            raise FormatError("unmatched close brace")
        elif ch == '\\':
            # More escapism
            try:
                buffer.append(template_iter.next())
            except StopIteration:
                # Backslash at end of string is OK here
                buffer.append(ch)
                break
        else:
            buffer.append(ch)
            
    # Complain about unused args
    if unused_args:
        raise FormatError(
            "Unused arguments: "
            + ",".join(str(x) for x in unused_args))
        
    # Convert the array to its proper type
    if isinstance(template, unicode):
        return buffer.tounicode()
    else:
        return buffer.tostring()
    
print format("This is a test of {0:x} {x} {1}\{",
    None, 1000, 20, x='hex')



