[Tutor] pickle problems

eryksun eryksun at gmail.com
Sun Aug 12 14:46:46 CEST 2012


On Sun, Aug 12, 2012 at 4:44 AM, Richard D. Moores <rdmoores at gmail.com> wrote:
>
> OK, thanks for the code, which I will duly study. However, I just
> pasted my new version, pickle_attempt_for_web2.py at
> <http://pastebin.com/EZUKjPSk>. I've tested it and tested it, and it
> does exactly what I wanted (thanks to you!). Yes, you're correct, I
> want to grow the pickle file, the dictionary. The new version puts
> every new item into it -- nothing gets lost.

I forgot to separate out PickleError. The program shouldn't append new
data to a file that raised an error, so the handler renames the
existing file. A new file is created later when factors.dat is opened
in append mode ('ab').

You asked about handling an empty file. This version (which grows the
existing file per session instead of writing a new file) always sees
EOFError since it calls pickle.load(f) until it reaches EOF. But the
same applies if you only call pickle.load(f) once on an empty file.
Just handle the exception with a simple "pass" statement.
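Here's the pattern in isolation, with an in-memory stream standing in
for factors.dat (io.BytesIO is just a stand-in so this runs on its
own):

```python
import io
import pickle

buf = io.BytesIO()
pickle.dump({1: [1]}, buf)     # simulate two separate sessions,
pickle.dump({2: [1, 2]}, buf)  # each appending its own pickle
buf.seek(0)

D = {}
try:
    while True:                # keep loading until the stream is
        D.update(pickle.load(buf))  # exhausted
except EOFError:
    pass    # empty stream or finished loading
# D is now {1: [1], 2: [1, 2]}
```

An empty stream raises EOFError on the very first pickle.load, so the
same handler covers both cases.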

I narrowed the IOError handling to only deal with ENOENT (file not
found). There are other reasons the file could fail to open for
reading (e.g. it's a directory or you don't have permission), none of
which you probably want to handle automatically, so it just exits with
an error.

I restructured the main loop to put everything in the try block,
handled the ValueError case, and made it exit with an error if
pickling or saving fails.

import sys, os, errno
import pickle
from pickle import PickleError

D = {}
session = {}
try:
    with open("factors.dat", 'rb') as f:
        while True:
            D.update(pickle.load(f))
except EOFError:
    pass    # Empty file or finished loading
except IOError as e:
    if e.errno != errno.ENOENT:
        sys.exit("Can't open factors.dat for reading.")
except PickleError:
    try:
        # Rename so as not to append to a bad file
        os.rename('factors.dat', 'factors.err')
    except OSError:
        sys.exit("Renaming damaged factors.dat failed.")

if len(D):
    print("Loaded", len(D), "entries.")

while True:
    try:
        n = int(input("Enter a non-negative integer (0 to quit): "))
        if n < 0:
            raise ValueError

        if n == 0:
            print("D has", len(D), "entries")
            if len(session):
                with open('factors.dat', 'ab') as f:
                    pickle.dump(session, f)
            sys.exit(0)

        factors = D[n]
        print("the factors of", n, "are", factors)

    except ValueError:
        print("e.g. 0, 1, 2, 3")

    except KeyError:
        factors = factorsOfInteger(n)
        print("the factors of", n, "are", factors)
        D[n] = session[n] = factors

    except (IOError, PickleError) as e:
        sys.exit("Error saving data: {0}".format(e.args[-1]))


> One thing I'd like to implement is a monitor of the time
> factorsOfInteger(n) takes to process some of the 18-digit ints (line
> 153). Most are processed within a second or two, but some can take
> several minutes. I'd like to limit the time to 15 or 20 seconds, but
> is there a way to do this? Just a wild guess, but is this where
> threading would be useful?

I'd put the time check in the main loop of factorsOfInteger.

threading doesn't have an interface to stop a running thread. If you
want to use threads, use queues since they're a simple, thread-safe
way to communicate between threads. You can modify factorsOfInteger to
monitor a queue.Queue and break when it receives a command to halt.
Use a 2nd queue to get the result back from the worker thread.
Specifically, only call findFactor in factorsOfInteger if qin.empty()
is True. Otherwise, qout.put(False) and break. If the function
terminates normally, then qout.put(factors).

import queue
import threading

qin = queue.Queue()
qout = queue.Queue()

t = threading.Thread(target=factorsOfInteger, args=(n, qin, qout))
t.start()

try:
    factors = qout.get(timeout=20)
except queue.Empty:
    qin.put('halt')       # tell the worker to stop
    t.join()              # wait for the thread to terminate
    factors = qout.get()  # possibly False if factorsOfInteger quit early
if factors:
    D[n] = session[n] = factors

See the docs:

http://docs.python.org/py3k/library/threading.html
http://docs.python.org/py3k/library/queue.html

You could also use a multiprocessing pool to speed things up if you
have multiple cores. You'd have to rewrite the factorization code a
bit. Partition the search range (i.e. up to the square root) among N
cores and run findFactor in parallel. If you have 4 cores, you'll get
up to 4 factors. Divide them out, repartition the new range, and
repeat. Or something like that. I'm sure you can find a good reference
online for parallel factorization.
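A rough sketch of one round of that partitioning -- findFactorInRange
and smallestFactor are made-up names for this example, and the
divide-out-and-repartition loop is left out:

```python
import math
from multiprocessing import Pool

def findFactorInRange(args):
    # search [lo, hi) for a divisor of n; None if there isn't one
    n, lo, hi = args
    for d in range(lo, hi):
        if n % d == 0:
            return d
    return None

def smallestFactor(n, ncores=4):
    # partition 2..sqrt(n) into ncores chunks, search them in parallel
    hi = math.isqrt(n) + 1
    step = max(1, (hi - 2) // ncores + 1)
    chunks = [(n, lo, min(lo + step, hi)) for lo in range(2, hi, step)]
    with Pool(ncores) as pool:
        hits = [d for d in pool.map(findFactorInRange, chunks) if d]
    return min(hits) if hits else n    # no hit means n is prime
```

Call it from under an "if __name__ == '__main__':" guard so the worker
processes can import the module cleanly, as the multiprocessing docs
require.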

