[Tutor] rounding up to the nearest multiple of 8

Albert-Jan Roskam fomcl at yahoo.com
Fri Oct 5 21:19:58 CEST 2012



<snip>


> This is all fun, but what about the context?  Your original function
> took an integer, not a string, and thus wasn't charged with measuring
> string length, possibly multiple times.  Even so, each of these tests is
> taking around a microsecond.  So are you expecting to do anything with
> the result?  Just calling ljust() method more than doubled the time.  If
> you actually have some code to generate the string, and/or if you're
> going to take the result and write it to a file, then pretty soon this
> function is negligible time.
> 
> If it were my code, I think I'd use something like
>     "      "[-sz%8:]
> and either prepend or append that to my string.  But if I had to do
> something more complex, I'd tune it to the way the string were being used.
> 

Yeah, maybe I got a little carried away. ;-) Knuth's 'premature optimization is the root of all evil' comes to mind. Then
again, it was fun from an educational point of view.
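
For the record, here is a minimal sketch of the slice-based padding idea from the quoted message. The function name and the PAD constant are my own, purely for illustration, and I moved the slice bound to the other side of the colon so that the slice yields exactly the number of missing spaces; treat it as one possible reading of the suggestion, not as the original one-liner.

```python
# At most 7 spaces are ever needed to reach the next multiple of 8.
PAD = " " * 7  # hypothetical helper constant


def pad_to_multiple_of_8(s):
    """Append spaces so that len(s) becomes a multiple of 8."""
    # -len(s) % 8 is the distance to the next multiple of 8
    # (0 when the length is already aligned).
    return s + PAD[:-len(s) % 8]


print(repr(pad_to_multiple_of_8("abcde")))     # 'abcde   '
print(len(pad_to_multiple_of_8("123456789")))  # 16
```

Whether this beats ljust() in practice would have to be timed on real data.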

The context: the code is part of a program that writes SPSS system files (binary, .sav). These may hold anything from
a few hundred to millions of records. SPSS knows two types of data: character and numerical. The code is only relevant
for character data. If a 5M-record dataset has 8 character variables, that means 40M executions of the posted code snippet, or around 40 seconds devoted to padding at roughly a microsecond per call. In general there are fewer values, though. I'll use cProfile later to see if there are more urgent pain spots. This seemed a good candidate as the function is called so often. FWIW, here is the function, plus some of the init:

    def __init__(self, *args, **kwargs):
        self.strRange = range(1, MAXLENGTHS['SPSS_MAX_LONGSTRING'][0] + 1)
        self.pad_8_lookup = dict([(i, -8 * (i // -8)) for i in self.strRange])

    def writerow(self, record):
        """ This function writes one record, which is a Python list."""
        convertedRecord = []
        for value, varName in zip(record, self.varNames):
            charlen = self.varNamesTypes[varName]
            if charlen > 0:
                value = value.ljust( self.pad_8_lookup[charlen] )
            else:
                try:
                    value = float(value)
                except ValueError:
                    value = self.sysmis
            convertedRecord.append(value)
        caseOut = struct.pack(self.structFmt, *convertedRecord)
        retcode = self.spssio.spssWholeCaseOut(self.fh, caseOut)
        if retcodes.get(retcode) != "SPSS_OK":
            raise SPSSIOError("Problem writing row:\n%s" % " ".join(map(str, record)), retcode)
        return retcode
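
In case the -8 * (i // -8) expression in the lookup table looks cryptic: floor division by a negative number rounds toward minus infinity, so negating the result gives a ceiling. Below is a small standalone sketch of that trick. The names round_up_to_8 and MAX_LEN are made up for the example, and MAX_LEN = 64 is just a stand-in for the real limit taken from MAXLENGTHS.

```python
def round_up_to_8(n):
    """Round a positive integer up to the nearest multiple of 8."""
    # n // -8 floors toward minus infinity, e.g. 9 // -8 == -2,
    # so multiplying by -8 yields the ceiling multiple: 16.
    return -8 * (n // -8)


MAX_LEN = 64  # hypothetical limit, only for this example

# Same shape as pad_8_lookup above: declared length -> padded length.
pad_8_lookup = dict((i, round_up_to_8(i)) for i in range(1, MAX_LEN + 1))

print(pad_8_lookup[1], pad_8_lookup[8], pad_8_lookup[9])  # 8 8 16
print(repr("abc".ljust(pad_8_lookup[3])))                 # 'abc     '
```

The more familiar ((n + 7) // 8) * 8 gives the same values; the dict merely trades that arithmetic for a lookup.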

