Generating valid identifiers
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Thu Jul 26 11:30:09 EDT 2012
On Thu, 26 Jul 2012 14:26:16 +0200, Laszlo Nagy wrote:
> I do not want this program to generate very long identifiers. It would
> increase SQL parsing time,
Will that increase in SQL parsing time be more, or less, than the time it
takes to generate CRC32 or SHA hashsums and append them to a truncated
identifier?
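
One way to answer that is to measure it. The following is only a rough sketch: the sample identifier and the iteration count are made up for illustration, and it measures the hashing cost alone, not the SQL parsing time on your database.

```python
import hashlib
import timeit
import zlib

# Hypothetical long identifier, invented for this measurement.
name = b"a.b.c.d.e.f.g.something" * 4

def time_per_call(func, number=10000):
    """Return the average time in seconds of one call to func."""
    return timeit.timeit(func, number=number) / number

crc_cost = time_per_call(lambda: zlib.crc32(name))
sha_cost = time_per_call(lambda: hashlib.sha1(name).hexdigest())

print("CRC32: %.3g s per identifier" % crc_cost)
print("SHA1:  %.3g s per identifier" % sha_cost)
```

Compare those figures against the parsing time your database actually reports for long versus short identifiers before deciding that either cost matters.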
> * Would it be a problem to use CRC32 instead of SHA? (Since security is
> not a problem, and CRC32 is faster.)
What happens if you get a collision?
That is, you have two different long identifiers:

    a.b.c.d...something
    a.b.c.d...anotherthing

which by bad luck both hash to the same value:

    a.b.c.d.$AABB99
    a.b.c.d.$AABB99

(or whatever).
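
Whichever hash you pick, the generator can at least notice a collision instead of silently emitting the same name twice. A minimal sketch, assuming a 30-character limit and a "$" separator as in the example above; the function names (crc32_hex, shorten, build_mapping) are hypothetical, not anything from your program:

```python
import zlib

def crc32_hex(name):
    """Return the CRC32 of name as 8 uppercase hex digits."""
    return "%08X" % (zlib.crc32(name.encode("utf-8")) & 0xFFFFFFFF)

def shorten(name, maxlen=30, hashfunc=crc32_hex):
    """Truncate an over-long name and append '$' plus its digest."""
    if len(name) <= maxlen:
        return name
    digest = hashfunc(name)
    keep = maxlen - len(digest) - 1  # room left for the original prefix
    return name[:keep] + "$" + digest

def build_mapping(names, maxlen=30, hashfunc=crc32_hex):
    """Map each name to its short form, failing loudly on a collision."""
    owners = {}   # short name -> the long name that claimed it first
    mapping = {}
    for name in names:
        short = shorten(name, maxlen, hashfunc)
        owner = owners.setdefault(short, name)
        if owner != name:
            raise ValueError("collision: %r and %r both map to %r"
                             % (owner, name, short))
        mapping[name] = short
    return mapping
```

Raising an exception is just one policy. You could instead retry with a longer digest or append a counter, but either way the collision has to be detected first.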
> * I'm truncating the digest value to 10 characters. Is it safe enough?
> I don't want to use more than 10 characters, because then it wouldn't be
> possible to recognize the original name.
> * Can somebody think of a better algorithm, that would give a bigger
> chance of recognizing the original identifier from the modified one?
Rather than truncating the most significant part of the identifier, the
field name, you should truncate the least important part, the middle.

    a.b.c.d.e.f.g.something

goes to:

    a.b...g.something

or similar.
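
A rough sketch of that idea, assuming dot-separated identifiers and a "..." marker; the function name elide_middle and the default limit are made up for illustration. It keeps whole components from both ends, alternating, for as long as the result still fits:

```python
def elide_middle(name, maxlen=30, sep=".", marker="..."):
    """Shorten name by replacing its middle components with marker."""
    if len(name) <= maxlen:
        return name
    parts = name.split(sep)
    head = []
    tail = [parts.pop()]  # always keep the final component (the field name)
    take_head = True
    while parts:
        # Tentatively take one more component from the front or the back.
        if take_head:
            new_head, new_tail = head + [parts[0]], tail
        else:
            new_head, new_tail = head, [parts[-1]] + tail
        candidate = sep.join(new_head) + marker + sep.join(new_tail)
        if len(candidate) > maxlen:
            break
        head, tail = new_head, new_tail
        if take_head:
            parts.pop(0)
        else:
            parts.pop()
        take_head = not take_head
    return sep.join(head) + marker + sep.join(tail)

print(elide_middle("a.b.c.d.e.f.g.something", maxlen=17))
# prints: a.b...g.something
```

Two caveats: if the final component is itself longer than the limit, the result will still be too long, and two identifiers that differ only in the middle will shorten to the same name, so this does not remove the need to check for collisions.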
--
Steven