Comments

Alex Martelli aleax at aleax.it
Wed May 8 18:27:10 EDT 2002


Sean 'Shaleh' Perry wrote:
        ...
> thanks for clearing that up Tim.  Not sure I like the idea of random
> strings being used for comments, but at least they are reasonably cheap.

Totally random strings would make very unlikeable comments -- how
can you be unsure about it?  Just imagine the mishmash of letters,
digits, punctuation -- suitable for a Perl script perhaps, but surely not
for a Python comment.

OTOH, random strings generated with some cleverness *might* make
for interesting comments, admittedly.  E.g.:

[alex@lancelot cb]$ python past.py
simple pixels update specific will png_set_compression_*() function into
and or computational data. combine & when used
either the bkgd (i.e. changes 8 were -
png_structp = documentation code the following keyword. which

Great when you have to code for a place whose style rules mandate
comments and some minimum ratio of comments to code, no?  Naturally,
you'd pastiche from a sample text appropriate to the program's subject.

Output like this can be generated by a simple program such as:

import random

def findNgram(haystack, needle):
    # return the index where the word list needle starts in the
    # word list haystack, or -1 if it does not occur there
    n0 = needle[0]
    ln = len(needle)
    lh = len(haystack)
    pos = 0
    while pos+ln <= lh:
        try:
            # next occurrence of the needle's first word
            pos = haystack.index(n0, pos)
        except ValueError:
            return -1
        if haystack[pos:pos+ln] == needle:
            return pos
        pos += 1
    return -1

class pastiche:
    # any substantial file of English words will do just as well
    ifn='/home/alex/down/qt-x11-free-3.0.3/src/3rdparty/libpng/libpng.txt'
    data = open(ifn).read().lower().split()
    def renew(self, n, maxmem=3):
        self.result = []
        for i in range(n):
            # randomly 'rotate' self.data
            randspot = random.randrange(len(self.data))
            self.data = self.data[randspot:] + self.data[:randspot]
            where = -1
            # get the N-gram: the last maxmem words generated so far
            # (a list of words, so it can be compared with slices of data)
            locate = self.result[-maxmem:]
            while where<0 and locate:
                # locate the N-gram in the data
                where = findNgram(self.data, locate)
                # back off to a shorter N-gram if need be
                locate = locate[1:]
            # locate is now one word shorter than the N-gram just tried,
            # so this picks the word right after the match (or the first
            # word of the rotated data if nothing matched), wrapping around
            c = self.data[(where+len(locate)+1) % len(self.data)]
            self.result.append(c)
        return ' '.join(self.result)

p = pastiche()
for i in range(4):
    print(p.renew(8))


If you write many programs on similar subjects, you can prepare and save
the N-grams you require once and for all from a suitable corpus;
pasticheing from them can then be much faster than this.
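
One way that precomputation might look, as a rough sketch rather than
anything I have tuned: build a dictionary from each N-gram (a tuple of
1 to maxmem words) to the list of words seen right after it, then
generate by lookup instead of scanning the data each time.  The names
build_table and pastiche_from_table, and the tiny stand-in corpus, are
made up for this illustration.

```python
import random
from collections import defaultdict

def build_table(words, maxmem=3):
    # map every N-gram of 1..maxmem words to the words that follow it
    table = defaultdict(list)
    for i in range(len(words)):
        for n in range(1, maxmem + 1):
            if i + n < len(words):
                table[tuple(words[i:i + n])].append(words[i + n])
    return dict(table)

def pastiche_from_table(table, words, n, maxmem=3):
    # generate n words, backing off to shorter N-grams when needed
    result = []
    for _ in range(n):
        followers = None
        context = tuple(result[-maxmem:])
        while context and not followers:
            followers = table.get(context)
            context = context[1:]
        # no usable context (the start, or a dead end): any corpus word
        result.append(random.choice(followers or words))
    return ' '.join(result)

# hypothetical stand-in corpus; any substantial text would do
corpus = "the quick brown fox jumps over the lazy dog and the quick cat".split()
table = build_table(corpus)
print(pastiche_from_table(table, corpus, 8))
```

The table is a plain dict of tuples and lists, so it should be possible
to save it once (with the standard pickle module, say) and reload it in
each program that needs comments.  The cost is memory: the table holds
up to maxmem entries per word of the corpus.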


Alex



