Here is a simple and quick solution --<br><br>Generate a random number. From the random module docs:<br>"random.random() -- Return the next random floating point number in the range [0.0, 1.0)."<br>
<a href="http://docs.python.org/library/random.html">http://docs.python.org/library/random.html</a><br><br>Multiply the random value by the maximum value of your table's primary key, i.e. MAX(id):<br><a href="http://www.tizag.com/mysqlTutorial/mysqlmax.php">http://www.tizag.com/mysqlTutorial/mysqlmax.php</a><br>
<br>Then use the result in your query. Round it to an integer first, since random() returns a float:<br><br>randID = round(MAX(id) * random())<br><br>SELECT objectname FROM products WHERE objectID = randID<br><br>(If your id column has gaps, use WHERE objectID &gt;= randID ORDER BY objectID LIMIT 1 instead, so the query always matches a row.)<br><br>Hope this helps.<br>Cheers... Threader<br><br><br>---------- Forwarded message ----------<br>
From: Dennis Lee Bieber <<a href="mailto:wlfraed@ix.netcom.com">wlfraed@ix.netcom.com</a>><br>To: <a href="mailto:python-list@python.org">python-list@python.org</a><br>Date: Mon, 21 Sep 2009 21:40:02 -0700<br>Subject: Re: [SQL] Pick random rows from SELECT?<br>
On Mon, 21 Sep 2009 10:59:38 +0200, Gilles Ganault <<a href="mailto:nospam@nospam.com">nospam@nospam.com</a>><br>
declaimed the following in gmane.comp.python.general:<br>
<br>
> Since this list is quite big and the site is the bottleneck, I'd like<br>
> to run multiple instances of this script, and figured a solution would<br>
> be to pick rows at random from the dataset, check in my local database<br>
> if this item has already been taken care of, and if not, download<br>
> details from the remote web site.<br>
><br>
You really think making MULTIPLE, overlapping requests to a web site<br>
is going to be more efficient than just suffering the single transfer<br>
time of one large query?<br>
<br>
> If someone's done this before, should I perform the randomization in<br>
> the SQL query (SQLite using the APSW wrapper<br>
> <a href="http://code.google.com/p/apsw/" target="_blank">http://code.google.com/p/apsw/</a><div id=":1m" class="ii gt">), or in Python?<br>
><br>
Pardon, I thought you implied the bottleneck is the web-site<br>
database -- I'd worry about any web-site that exposes a file-server<br>
based database to direct user access.<br>
<br>
> Here's some simplified code:<br>
><br>
> sql = 'SELECT id,label FROM companies WHERE activity=1'<br>
> rows = list(cursor.execute(sql))<br>
> for row in rows:<br>
> &nbsp;&nbsp;&nbsp;&nbsp;id = row[0]<br>
> &nbsp;&nbsp;&nbsp;&nbsp;label = row[1]<br>
><br>
> &nbsp;&nbsp;&nbsp;&nbsp;print strftime("%H:%M")<br>
> &nbsp;&nbsp;&nbsp;&nbsp;url = "<a href="http://www.acme.com/details.php?id=%s" target="_blank">http://www.acme.com/details.php?id=%s</a>" % id<br>
> &nbsp;&nbsp;&nbsp;&nbsp;req = urllib2.Request(url, None, headers)<br>
> &nbsp;&nbsp;&nbsp;&nbsp;response = urllib2.urlopen(req).read()<br>
><br>
> &nbsp;&nbsp;&nbsp;&nbsp;name = re_name.search(response)<br>
> &nbsp;&nbsp;&nbsp;&nbsp;if name:<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;name = name.group(1)<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sql = 'UPDATE companies SET name=? WHERE id=?'<br>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cursor.execute(sql, (name, id))<br>
<br>
Ah... You mean you are retrieving the company rows from a local database,<br>and then requesting web-site details based upon each id.<br>
<br>
No matter how you look at it, you appear to want to process the<br>
entire local list of companies... Multiple randomized local queries will<br>
just add to the final run-time as you start to get duplicates -- and<br>
have to reject that one to query for another random name.<br>
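(As an aside on the original question -- SQLite can also do the randomization inside the query itself with ORDER BY RANDOM(), avoiding the reject-and-retry loop entirely. A minimal, self-contained sketch; the table and data here are illustrative, not from the original post:)

```python
# Letting SQLite pick random rows itself, instead of computing
# random ids in Python and rejecting duplicates.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE companies "
            "(id INTEGER PRIMARY KEY, label TEXT, activity INTEGER)")
cur.executemany("INSERT INTO companies (label, activity) VALUES (?, ?)",
                [("acme-%d" % i, 1) for i in range(100)])
conn.commit()

# ORDER BY RANDOM() shuffles the result set inside SQLite;
# LIMIT caps how many rows come back.
cur.execute("SELECT id, label FROM companies WHERE activity=1 "
            "ORDER BY RANDOM() LIMIT 5")
rows = cur.fetchall()
```

Each execution returns a different 5 rows, with no duplicates within one query.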
<br>
I'd suggest either a pool of threads -- 5-10, each reading company<br>
names from a shared QUEUE, which is populated by the main thread<br>
(remember to commit() so that you don't block on database updates by the<br>
threads). OR... determine how many companies there are, and start<br>
threads feeding them <start> and <length> (length being #names /<br>
#threads, rounded up -- start then being 0*length, 1*length, 2*length, etc., since OFFSET is zero-based)<br>
and use those in thread specific selects using "... limit <length><br>
offset <start>"... This way each thread retrieves its own limited set of<br>
companies (make sure to use the same sorting criteria).<br>
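(The first suggestion above -- a fixed pool of worker threads consuming rows from a shared queue filled by the main thread -- might be sketched like this. Shown in modern Python 3 syntax rather than the Python 2 of the originals; process() is a stand-in for the real urllib2 fetch + UPDATE:)

```python
# Pool-of-threads sketch: main thread fills a Queue with (id, label)
# rows, N workers consume it, one None sentinel per worker ends it.
import queue
import threading

NUM_WORKERS = 5
work = queue.Queue()
results = []
results_lock = threading.Lock()

def process(row):
    # Stand-in for the real work: fetch details for this company
    # from the web site and UPDATE the local database.
    return row[0]

def worker():
    while True:
        row = work.get()
        if row is None:        # sentinel: no more work for this thread
            break
        r = process(row)
        with results_lock:     # results list is shared across threads
            results.append(r)

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

# Main thread feeds the queue (here with fake rows).
for i in range(20):
    work.put((i, "company-%d" % i))

# One sentinel per worker, then wait for all of them to finish.
for _ in range(NUM_WORKERS):
    work.put(None)
for t in threads:
    t.join()
```

The alternative partitioning scheme would instead give each worker its own "... LIMIT ? OFFSET ?" query, with offset = worker_index * length, so no queue is needed at all.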
--<br>
Wulfraed Dennis Lee Bieber KD6MOG<br>
<a href="mailto:wlfraed@ix.netcom.com">wlfraed@ix.netcom.com</a> <a href="http://wlfraed.home.netcom.com/" target="_blank">HTTP://wlfraed.home.netcom.com/</a><br>
<br>
<br>
<br><br>---------- Forwarded message ----------<br>From: greg <<a href="mailto:greg@cosc.canterbury.ac.nz">greg@cosc.canterbury.ac.nz</a>><br>To: <a href="mailto:python-list@python.org">python-list@python.org</a><br>
Date: Tue, 22 Sep 2009 17:07:33 +1200<br>Subject: Re: Comparison of parsers in python?<br>Nobody wrote:<br>
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
What I want: a tokeniser generator which can take a lex-style grammar (not<br>
necessarily lex syntax, but a set of token specifications defined by<br>
REs, BNF, or whatever), generate a DFA, then run the DFA on sequences of<br>
bytes. It must allow the syntax to be defined at run-time.<br>
</blockquote>
<br>
You might find my Plex package useful:<br>
<br>
<a href="http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/" target="_blank">http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/</a><br>
<br>
It was written some time ago, so it doesn't know about<br>
the new bytes type yet, but it shouldn't be hard to<br>
adapt it for that if you need to.<br>
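(Not Plex itself -- just a sketch of the general idea under discussion: token specifications defined by REs at run time, combined into a single scanner. Python's re module backtracks rather than building a DFA, so this illustrates only the interface, not the performance guarantee the poster asked for:)

```python
# Build a tokenizer from (name, regex) pairs supplied at run time,
# by joining them into one alternation of named groups.
import re

def make_tokenizer(token_specs):
    # token_specs: list of (name, regex_source) pairs.
    master = re.compile("|".join("(?P<%s>%s)" % (name, pattern)
                                 for name, pattern in token_specs))
    def tokenize(text):
        pos = 0
        while pos < len(text):
            m = master.match(text, pos)
            if m is None:
                raise ValueError("no token matches at position %d" % pos)
            # lastgroup is the name of the alternative that matched
            yield (m.lastgroup, m.group())
            pos = m.end()
    return tokenize

tokenize = make_tokenizer([
    ("NUMBER", r"\d+"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("WS",     r"\s+"),
])
tokens = [t for t in tokenize("foo 42 bar") if t[0] != "WS"]
# tokens == [("NAME", "foo"), ("NUMBER", "42"), ("NAME", "bar")]
```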
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
What I don't want: anything written by someone who doesn't understand the<br>
field (i.e. anything which doesn't use a DFA).<br>
</blockquote>
<br>
Plex uses a DFA.<br>
<br>
-- <br>
Greg</div>