Is there anyone familiar with pybloom (bloom filter in python)?

Xell Zhang xellzhang at gmail.com
Sun Jul 8 20:18:36 CEST 2007


Hello,

I found the pybloom module at http://www.imperialviolet.org/pybloom.html and
tried to use it in my crawler :)
I want to use it to store the URLs that have already been crawled. But whenever
I insert a URL string I get a warning and a wrong result...

My test code is quite simple:
from pybloom import CountedBloom
cb = CountedBloom(800000, 4)
cb.insert("AAA")
print cb.__contains__("BBB")

Warning:
E:\EclipseWorkspace\demo\src\pybloom.py:74: DeprecationWarning: 'I' format
requires 0 <= number <= 4294967295
  b = [ord(x) for x in struct.pack ('I', val)]

I get this warning every time I run the code above.
The output is "1", which means "BBB" is reported as being in the set, even
though it was never inserted.
When I test with integers instead of strings, the results seem right.
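
To make the warning concrete, here is a minimal sketch of the range limit it describes. It assumes the value reaching struct.pack is derived from something like hash() on the string, which can be negative or wider than 32 bits; I have not confirmed that this is what pybloom.py does. The names to_uint32 and pack_bytes are hypothetical helpers for illustration, not part of pybloom.

```python
import struct


def to_uint32(val):
    # Hypothetical helper, not part of pybloom: keep only the low 32 bits
    # so the value fits the unsigned 'I' format (0 <= n <= 4294967295).
    return val & 0xFFFFFFFF


def pack_bytes(val):
    # Same shape as the line quoted in the warning, but masked first.
    # bytearray() yields integer byte values on both Python 2 and Python 3.
    return list(bytearray(struct.pack('I', to_uint32(val))))


# A negative value, as a string hash can be.
print(pack_bytes(-1))

# A value wider than 32 bits; only the low 32 bits are kept.
print(pack_bytes(2 ** 40 + 5))
```

Masking like this only keeps the value inside the range that 'I' accepts. Whether the out-of-range value is also the cause of the wrong "BBB" result is something I cannot tell from the warning alone.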

I am not very familiar with the arithmetic involved, and I don't know whether
I did something wrong.
Can anyone help me? Thanks!



-- 
Zhang Xiao

Junior engineer, Web development

Ethos Tech.
http://www.ethos.com.cn