[Q]: Generate Unique IDs

Paul Rubin http
Fri May 23 20:57:59 EDT 2003


Steven Taschuk <staschuk at telusplanet.net> writes:
> I am puzzled about this guarantee of eternal and universal
> uniqueness to which you refer.  I had the impression that Windows'
> GUIDs are always so many bits (128?), which obviously puts a
> (large) upper limit on the number of unique identifiers which can
> be constructed. 


If you generate all 128 bits completely randomly, the birthday bound
says you're unlikely to see a duplicate before you've issued on the
order of 2**64 GUIDs, which should be enough for any practical system.
But that's not how Windows does it.
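
To put a rough number on that 2**64 figure, here is a minimal sketch
using the standard birthday approximation.  The function name
collision_probability is mine, and it assumes all 128 bits are uniformly
random, which is the idealized case rather than what any particular
generator actually does.

```python
import math

def collision_probability(n_ids: int, bits: int = 128) -> float:
    """Approximate chance of at least one duplicate among n_ids random IDs."""
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * space)).
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))

for exponent in (32, 48, 64):
    p = collision_probability(2 ** exponent)
    print(f"2**{exponent} IDs issued: p(duplicate) ~= {p:.3g}")
```

The probability stays negligible until the count gets close to 2**64,
where it climbs to a bit under 40%.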

The trouble is that they're constructed somewhat deterministically and
a lot of bits get re-used.  For example, let's say that of those 128
bits, 48 bits are the MAC address from your network card and another
32 bits are a timestamp with one-second resolution.  Boom, 80 of those
128 bits are immediately duplicated if you generate two GUIDs in one
second.  Even if you use a higher-resolution timestamp and generate
GUIDs slowly, you can still re-use a timestamp if you reboot your
computer and set the clock backwards by accident or because you didn't
understand Daylight Saving Time, etc.  And it's an empirical fact
that despite the "mandatory" uniqueness of Ethernet MAC addresses,
they do get duplicated sometimes.  You can expect the same of Wi-Fi
cards.
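
Here is a toy version of that hypothetical layout, just to make the
bit-counting concrete.  The field order, the 48 random filler bits, the
MAC address, and the names toy_guid and shared_prefix_bits are all made
up for illustration; this is not the actual Windows algorithm.

```python
import secrets

# Hypothetical layout, 128 bits total: MAC | timestamp | random filler.
MAC_BITS, TIME_BITS, RANDOM_BITS = 48, 32, 48

def toy_guid(mac: int, unix_seconds: int) -> int:
    """Pack a MAC, a one-second timestamp, and random bits into 128 bits."""
    mac &= (1 << MAC_BITS) - 1
    stamp = unix_seconds & ((1 << TIME_BITS) - 1)
    tail = secrets.randbits(RANDOM_BITS)
    return (mac << (TIME_BITS + RANDOM_BITS)) | (stamp << RANDOM_BITS) | tail

def shared_prefix_bits(a: int, b: int, width: int = 128) -> int:
    """Count the leading bits that two width-bit integers have in common."""
    return width - (a ^ b).bit_length()

mac = 0x001122334455   # made-up MAC address
now = 1_000_000_000    # an arbitrary fixed second

first = toy_guid(mac, now)
second = toy_guid(mac, now)   # same machine, same second
print(shared_prefix_bits(first, second))   # never less than 80
```

Only the 48 filler bits separate the two IDs, so the birthday bound
within one machine-second drops to around 2**24, and setting the clock
back re-opens the same window.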

Also, since the MAC address identifies the hardware it came from
(though not perfectly uniquely as we saw), exporting such a GUID to
another computer can be a privacy breach.  For example, suppose the
database contains medical records about an experimental drug, and
you're an AIDS patient taking the drug.  Say you're supposed to take
your temperature every day to upload to the database.  If the program
that you upload with generates a GUID for each reading, then each of
those records will point back to your MAC address even though the
database is otherwise anonymized and privacy protected.  Later you
might publish a GUID for some completely unrelated reason, say as an
interface ID for some COM object that you develop.  Now your use of
the AIDS drug can be connected back to you.  (Microsoft Word used to
leave GUIDs in documents indiscriminately, but Microsoft stopped after
a privacy flap over basically this issue.)
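
The linkage is easy to demonstrate with the time-and-node (version 1)
UUIDs in Python's standard uuid module, which I'm using here as a
stand-in for a MAC-based GUID.  The MAC value and the variable names are
invented for the example; passing node= explicitly just keeps the sketch
from depending on real hardware.

```python
import uuid

FAKE_MAC = 0x001122334455  # made-up hardware address for illustration

# Two IDs created for completely unrelated purposes on the same machine.
temperature_record_id = uuid.uuid1(node=FAKE_MAC)
com_interface_id = uuid.uuid1(node=FAKE_MAC)

# Anyone holding either ID can read the node field straight back out.
print(f"{temperature_record_id.node:012x}")
print(temperature_record_id.node == com_interface_id.node)
```

No secret is needed to do the matching: the node field sits in the
clear in every ID the machine issues.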

So, it's best to use truly random GUIDs rather than cooking them
with any kind of [mostly-]deterministic algorithm.
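
In Python, one way to do that (assuming a version with the standard
uuid and secrets modules) is a version 4 UUID, or a plain 128-bit random
token if you don't need the UUID format.  Note that a version 4 UUID
fixes 6 bits for the version and variant fields, so it carries 122
random bits rather than the full 128.

```python
import secrets
import uuid

# Version 4 UUID: random bits drawn from the operating system's generator.
random_guid = uuid.uuid4()
print(random_guid, "version", random_guid.version)

# Alternative: 16 random bytes (128 bits) as hex, with no UUID structure.
random_token = secrets.token_hex(16)
print(random_token)
```

Neither value contains a MAC address or a timestamp, so there is
nothing in it to trace back to the machine that issued it.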



