Here is a simple and quick solution --<br><br>Generate a random number. From the random module docs:<br>"random.random() -- Return the next random floating point number in the range [0.0, 1.0)."<br>
<a href="http://docs.python.org/library/random.html">http://docs.python.org/library/random.html</a><br><br>Multiply the random value by the maximum value of your table's primary key, i.e. MAX(id):<br><a href="http://www.tizag.com/mysqlTutorial/mysqlmax.php">http://www.tizag.com/mysqlTutorial/mysqlmax.php</a><br>
<br>Then use the result in your query. Round it to an integer first, since random() returns a float:<br><br>randID = round(MAX(id) * random())<br><br>SELECT objectname FROM products WHERE objectID = randID<br><br>(If your id column has gaps, use WHERE objectID &gt;= randID ORDER BY objectID LIMIT 1 instead, so the query always matches a row.)<br><br>Hope this helps.<br>Cheers... Threader<br><br><br>---------- Forwarded message ----------<br>
From: Dennis Lee Bieber <<a href="mailto:wlfraed@ix.netcom.com">wlfraed@ix.netcom.com</a>><br>To: <a href="mailto:python-list@python.org">python-list@python.org</a><br>Date: Mon, 21 Sep 2009 21:40:02 -0700<br>Subject: Re: [SQL] Pick random rows from SELECT?<br>
On Mon, 21 Sep 2009 10:59:38 +0200, Gilles Ganault <<a href="mailto:nospam@nospam.com">nospam@nospam.com</a>><br>
declaimed the following in gmane.comp.python.general:<br>
<br>
> Since this list is quite big and the site is the bottleneck, I'd like<br>
> to run multiple instances of this script, and figured a solution would<br>
> be to pick rows at random from the dataset, check in my local database<br>
> if this item has already been taken care of, and if not, download<br>
> details from the remote web site.<br>
><br>
You really think making MULTIPLE, overlapping requests to a web site<br>
is going to be more efficient than just suffering the single transfer<br>
time of one large query?<br>
<br>
> If someone's done this before, should I perform the randomization in<br>
> the SQL query (SQLite using the APSW wrapper<br>
> <a href="http://code.google.com/p/apsw/" target="_blank">http://code.google.com/p/apsw/</a><div id=":1m" class="ii gt">), or in Python?<br>
><br>
Pardon, I thought you implied the bottleneck is the web-site<br>
database -- I'd worry about any web-site that exposes a file-server<br>
based database to direct user access.<br>
<br>
> Here's some simplified code:<br>
><br>
> sql = 'SELECT id,label FROM companies WHERE activity=1'<br>
> rows = list(cursor.execute(sql))<br>
> for row in rows:<br>
> &nbsp;&nbsp;&nbsp;&nbsp;id = row[0]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;label = row[1]<br>
><br>
> &nbsp;&nbsp;&nbsp;&nbsp;print strftime("%H:%M")<br>
> &nbsp;&nbsp;&nbsp;&nbsp;url = "<a href="http://www.acme.com/details.php?id=%s" target="_blank">http://www.acme.com/details.php?id=%s</a>" % id<br>
> &nbsp;&nbsp;&nbsp;&nbsp;req = urllib2.Request(url, None, headers)<br>
> &nbsp;&nbsp;&nbsp;&nbsp;response = urllib2.urlopen(req).read()<br>
><br>
> &nbsp;&nbsp;&nbsp;&nbsp;name = re_name.search(response)<br>
> &nbsp;&nbsp;&nbsp;&nbsp;if name:<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;name = name.group(1)<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sql = 'UPDATE companies SET name=? WHERE id=?'<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cursor.execute(sql, (name, id))<br>
<br>
Ah... You mean you are retrieving the company rows from a local database,<br>and then requesting web-site details based upon each id.<br>
<br>
No matter how you look at it, you appear to want to process the<br>
entire local list of companies... Multiple randomized local queries will<br>
just add to the final run-time as you start to get duplicates -- and<br>
have to reject that one to query for another random name.<br>
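(As an aside on the original question -- SQLite can also do the randomization inside the query itself with ORDER BY RANDOM(), avoiding the reject-and-retry loop entirely. A minimal, self-contained sketch; the table and data here are illustrative, not from the original post:)

```python
# Letting SQLite pick random rows itself, instead of computing
# random ids in Python and rejecting duplicates.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE companies "
            "(id INTEGER PRIMARY KEY, label TEXT, activity INTEGER)")
cur.executemany("INSERT INTO companies (label, activity) VALUES (?, ?)",
                [("acme-%d" % i, 1) for i in range(100)])
conn.commit()

# ORDER BY RANDOM() shuffles the result set inside SQLite;
# LIMIT caps how many rows come back.
cur.execute("SELECT id, label FROM companies WHERE activity=1 "
            "ORDER BY RANDOM() LIMIT 5")
rows = cur.fetchall()
```

Each execution returns a different 5 rows, with no duplicates within one query.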
<br>
I'd suggest either a pool of threads -- 5-10, each reading company<br>
names from a shared QUEUE, which is populated by the main thread<br>
(remember to commit() so that you don't block on database updates by the<br>
threads). OR... determine how many companies there are, and start<br>
threads feeding them <start> and <length> (length being #names /<br>
#threads, rounded up -- start then being 0*length, 1*length, 2*length, etc., since OFFSET is zero-based)<br>
and use those in thread specific selects using "... limit <length><br>
offset <start>"... This way each thread retrieves its own limited set of<br>
companies (make sure to use the same sorting criteria).<br>
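(The first suggestion above -- a fixed pool of worker threads consuming rows from a shared queue filled by the main thread -- might be sketched like this. Shown in modern Python 3 syntax rather than the Python 2 of the originals; process() is a stand-in for the real urllib2 fetch + UPDATE:)

```python
# Pool-of-threads sketch: main thread fills a Queue with (id, label)
# rows, N workers consume it, one None sentinel per worker ends it.
import queue
import threading

NUM_WORKERS = 5
work = queue.Queue()
results = []
results_lock = threading.Lock()

def process(row):
    # Stand-in for the real work: fetch details for this company
    # from the web site and UPDATE the local database.
    return row[0]

def worker():
    while True:
        row = work.get()
        if row is None:        # sentinel: no more work for this thread
            break
        r = process(row)
        with results_lock:     # results list is shared across threads
            results.append(r)

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

# Main thread feeds the queue (here with fake rows).
for i in range(20):
    work.put((i, "company-%d" % i))

# One sentinel per worker, then wait for all of them to finish.
for _ in range(NUM_WORKERS):
    work.put(None)
for t in threads:
    t.join()
```

The alternative partitioning scheme would instead give each worker its own "... LIMIT ? OFFSET ?" query, with offset = worker_index * length, so no queue is needed at all.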
--<br>
Wulfraed Dennis Lee Bieber KD6MOG<br>
<a href="mailto:wlfraed@ix.netcom.com">wlfraed@ix.netcom.com</a> <a href="http://wlfraed.home.netcom.com/" target="_blank">HTTP://wlfraed.home.netcom.com/</a><br>
<br>
<br>
<br><br>---------- Forwarded message ----------<br>From: greg <<a href="mailto:greg@cosc.canterbury.ac.nz">greg@cosc.canterbury.ac.nz</a>><br>To: <a href="mailto:python-list@python.org">python-list@python.org</a><br>
Date: Tue, 22 Sep 2009 17:07:33 +1200<br>Subject: Re: Comparison of parsers in python?<br>Nobody wrote:<br>
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
What I want: a tokeniser generator which can take a lex-style grammar (not<br>
necessarily lex syntax, but a set of token specifications defined by<br>
REs, BNF, or whatever), generate a DFA, then run the DFA on sequences of<br>
bytes. It must allow the syntax to be defined at run-time.<br>
</blockquote>
<br>
You might find my Plex package useful:<br>
<br>
<a href="http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/" target="_blank">http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/</a><br>
<br>
It was written some time ago, so it doesn't know about<br>
the new bytes type yet, but it shouldn't be hard to<br>
adapt it for that if you need to.<br>
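(Not Plex itself -- just a sketch of the general idea under discussion: token specifications defined by REs at run time, combined into a single scanner. Python's re module backtracks rather than building a DFA, so this illustrates only the interface, not the performance guarantee the poster asked for:)

```python
# Build a tokenizer from (name, regex) pairs supplied at run time,
# by joining them into one alternation of named groups.
import re

def make_tokenizer(token_specs):
    # token_specs: list of (name, regex_source) pairs.
    master = re.compile("|".join("(?P<%s>%s)" % (name, pattern)
                                 for name, pattern in token_specs))
    def tokenize(text):
        pos = 0
        while pos < len(text):
            m = master.match(text, pos)
            if m is None:
                raise ValueError("no token matches at position %d" % pos)
            # lastgroup is the name of the alternative that matched
            yield (m.lastgroup, m.group())
            pos = m.end()
    return tokenize

tokenize = make_tokenizer([
    ("NUMBER", r"\d+"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("WS",     r"\s+"),
])
tokens = [t for t in tokenize("foo 42 bar") if t[0] != "WS"]
# tokens == [("NAME", "foo"), ("NUMBER", "42"), ("NAME", "bar")]
```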
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
What I don't want: anything written by someone who doesn't understand the<br>
field (i.e. anything which doesn't use a DFA).<br>
</blockquote>
<br>
Plex uses a DFA.<br>
<br>
-- <br>
Greg</div>