bdesth.quelquechose at free.quelquepart.fr
Mon Jun 5 14:30:06 CEST 2006
John Machin wrote:
> On 5/06/2006 10:38 AM, Bruno Desthuilliers wrote:
>> SuperHik wrote:
>>> hi all,
>>> I have an old(er) script with the
>>> following task - takes a string I copy-pasted and which always has the
>>> same format:
>> def to_dict(items):
>>     items = items.replace('\t', '\n').split('\n')
> In case there are leading/trailing spaces on the keys:
There aren't. Test passes.
> Fantastic -- at least for the OP's carefully copied-and-pasted input.
That was the spec, and my code passes the test.
> Meanwhile back in the real world,
The "real world" is mostly defined by the customer's test set (is that the
correct translation for "jeu d'essai"?). The code passes the test. Period.
> there might be problems with multiple
> tabs used for 'prettiness' instead of 1 tab, non-integer values, etc etc.
Which means that the spec and the customer's test set are wrong. Not my
responsibility. Anyway, I refuse to change anything in the parsing
algorithm before having another test set.
> In that case a loop approach that validated as it went and was able to
> report the position and contents of any invalid input might be better.
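For reference, the kind of validating loop described above might look
something like the sketch below. It is only an illustration: the name
to_dict_checked is hypothetical, and it assumes the input is a run of
alternating key and integer-value fields separated by tabs or newlines,
which is a guess at the format and not the OP's spec.

```python
import re


def to_dict_checked(text):
    """Build a dict from alternating key/value fields, validating as we go.

    Assumption (not from the original spec): fields are separated by tabs
    or newlines, and every value must be an integer.
    """
    # Split on runs of tabs/newlines so that several tabs used for
    # 'prettiness' count as a single separator, then strip stray spaces.
    fields = [f.strip() for f in re.split(r'[\t\n]+', text)]
    fields = [f for f in fields if f]  # drop empty leading/trailing fields

    if len(fields) % 2:
        raise ValueError(
            "field %d: key %r has no value" % (len(fields), fields[-1]))

    result = {}
    for pos in range(0, len(fields), 2):
        key, raw = fields[pos], fields[pos + 1]
        try:
            result[key] = int(raw)
        except ValueError:
            # Report the 1-based position and the offending contents.
            raise ValueError(
                "field %d: expected an integer for key %r, got %r"
                % (pos + 2, key, raw)) from None
    return result
```

Whether the extra checks are worth having still depends on what the
real data looks like.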
One doesn't know what *will* be better without actual facts. You can be
right (and, from my experience, you probably are !-), *but* you can be
wrong as well. Until you have a correct spec and test data set on which
the code fails, writing any other code is a waste of time. Better to
work on other parts of the system, and come back to this if and when the
Kind of reminds me of a former employer that paid me two full months to
work on a very hairy data migration script (the original data set was so
f... up and incoherent that even a human parser could barely make any
sense of it), before discovering that none of the users of the old system
were interested in migrating that part of the data. Talk about a waste of
time and money...
Now FWIW, there's actually something else bugging me with this code: it
loads the whole data set in memory. That's OK for a few lines, but
obviously wrong if one is to parse huge files. *That* would be the first
thing I would change. It only takes a couple of minutes, so no real
waste of time, but it obviously implies rethinking the API, which is
better done now than after client code has been written.
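A minimal sketch of what that change might look like, assuming the
parser is handed any iterable of lines (an open file, for instance)
instead of one big string. The name iter_items is hypothetical, and the
alternating key/value layout is the same assumption as for the
copy-pasted string.

```python
def iter_items(lines):
    """Lazily yield (key, value) pairs from an iterable of lines.

    Assumption: tab-separated fields alternate key, integer value, and a
    pair may be split across a line break. Only one line is held in
    memory at a time.
    """
    pending_key = None  # a key that is still waiting for its value
    for line in lines:
        for field in line.rstrip('\n').split('\t'):
            if not field:
                continue  # skip empty fields (blank lines, trailing tabs)
            if pending_key is None:
                pending_key = field
            else:
                yield pending_key, int(field)
                pending_key = None
```

Client code that really wants a dict can still write
dict(iter_items(open(path))), but code that only needs to walk the
pairs no longer pays for the whole file. That is the API change: the
function takes an iterable and returns an iterator, not a dict.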
My 2 cents....