16bit hash

Thomas Jollans thomas at jollans.com
Thu Jun 28 14:53:13 CEST 2007


Robin Becker wrote:
> Martin v. Löwis wrote:
> 
> 0 the ideal hash
> 
> :)
> 
> can't be argued with
> 
>> .......
>> So: what are your input data, and what is the
>> distribution among them?
>>
>> Regards,
>> Martin
>>
> I'm trying to create UniqueIDs for dynamic PostScript fonts. According
> to my resources we don't actually need to use these, but if they are
> required by a particular PostScript program (perhaps to make a print run
> efficient) then the private range of these IDs is 4000000<=UID<=4999999,
> i.e. a range of one million.
> 
> So I probably really need a hash of about 20 bits (one million is just
> under 2**20 = 1048576).
> 
> The data going into the font consists of
> 
> fontBBox '[-415 -431 2014 2033]'
> charmaps ['dup (\000) 0 get /C0 put',......]
> metrics ['/C0 1251 def',.....]
> bboxes ['/C29 [0 0 512 0] def',.......]
> chardefs ['/C0 {newpath 224 418 m 234 336 ......def}',......]
> 
> i.e. a bunch of lists of strings, which are eventually joined together and
> written out with a template to make the PostScript definition.
> 
> The UniqueID is used by PS interpreters to avoid recreating particular
> glyphs, so ideally I would number these fonts sequentially using a global
> count, but in practice several processes, separated by application and
> time, can produce PostScript which eventually gets merged back together.
> 
> If the UIDs clash, the printer produces very strange output.
> 
> I'm fairly sure there's no obvious Python way to let the separated
> processes communicate except via the printer. So either I use a
> Python-based scheme which reduces the risk of clashes, i.e. a random or
> data-based hash scheme, or I attempt to produce a PostScript solution,
> such as looking for a private global sequence number.
> 
> I'm not sure my PostScript is really good enough to do the latter, so I
> hoped to pursue a Python-based approach which has a low probability of
> busting. Originally I thought the range was a 16-bit number, which is why
> I started with 16-bit hashes.


For identifying something, I suggest you use a hash function like SHA-1
and truncate its output to as many bits as you can use, similar to what
Jon Ribbens suggested.
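A minimal sketch of that suggestion in Python 3 (hashlib), assuming the font data is available as the bbox string and lists of strings described above. `font_unique_id` and `clash_probability` are hypothetical helper names, not an existing API. The sketch hashes the pieces with SHA-1, truncates the digest to 64 bits and folds that into the private 4000000..4999999 range; the second function is the standard birthday-bound estimate of how likely any two fonts are to clash.

```python
import hashlib
import math

# Private UniqueID range quoted above: 4000000 <= UID <= 4999999.
UID_MIN = 4000000
UID_MAX = 4999999
UID_SPAN = UID_MAX - UID_MIN + 1  # 1,000,000 possible values


def font_unique_id(font_bbox, charmaps, metrics, bboxes, chardefs):
    """Map a font's PostScript pieces to a UniqueID in the private range.

    Hypothetical helper: the parameters mirror the data listed in the
    quoted message (a bbox string plus four lists of strings).
    """
    h = hashlib.sha1()
    for part in [font_bbox, *charmaps, *metrics, *bboxes, *chardefs]:
        data = part.encode("utf-8")
        # Length-prefix each piece so that different ways of splitting
        # the same text cannot produce the same byte stream.
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    # Truncate the 160-bit digest to its first 64 bits ...
    n = int.from_bytes(h.digest()[:8], "big")
    # ... and fold them into the range; the modulo bias is negligible
    # because 2**64 is vastly larger than 1,000,000.
    return UID_MIN + n % UID_SPAN


def clash_probability(num_fonts, space=UID_SPAN):
    """Birthday-bound estimate of P(at least two of num_fonts IDs clash)."""
    return 1.0 - math.exp(-num_fonts * (num_fonts - 1) / (2.0 * space))


if __name__ == "__main__":
    # Placeholder data shaped like the quoted example (not a real font).
    uid = font_unique_id(
        "[-415 -431 2014 2033]",
        ["dup (\000) 0 get /C0 put"],
        ["/C0 1251 def"],
        ["/C29 [0 0 512 0] def"],
        ["/C0 {newpath 224 418 m 234 336 l closepath} def"],
    )
    print(uid)
    print(round(clash_probability(100), 4), round(clash_probability(1000), 2))
```

With one million IDs, the birthday bound gives roughly a 0.5% chance of at least one clash among 100 fonts and about 39% among 1000, so a content hash keeps the risk low only while the number of fonts that end up merged into one print job stays small.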


