Python-list Digest, Vol 88, Issue 69
williem75 at gmail.com
williem75 at gmail.com
Wed Jan 26 16:25:04 EST 2011
Sent from my LG phone
python-list-request at python.org wrote:
>Send Python-list mailing list submissions to
> python-list at python.org
>
>To subscribe or unsubscribe via the World Wide Web, visit
> http://mail.python.org/mailman/listinfo/python-list
>or, via email, send a message with subject or body 'help' to
> python-list-request at python.org
>
>You can reach the person managing the list at
> python-list-owner at python.org
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of Python-list digest..."
>
>Today's Topics:
>
> 1. Re: Python use growing fast (Alice Bevan?McGregor)
> 2. Re: order of importing modules (Chris Rebert)
> 3. Re: How to Buffer Serialized Objects to Disk (MRAB)
> 4. Re: How to Buffer Serialized Objects to Disk (Chris Rebert)
> 5. Re: How to Buffer Serialized Objects to Disk (Peter Otten)
> 6. Re: Best way to automatically copy out attachments from an
> email (Chris Rebert)
> 7. Re: Parsing string for "<verb> <noun>" (Aahz)
> 8. Re: Nested structures question (Tim Harig)
> 9. Re: How to Buffer Serialized Objects to Disk (Scott McCarty)
>
>On 2011-01-10 19:49:47 -0800, Roy Smith said:
>
>> One of the surprising (to me, anyway) uses of JavaScript is as the
>> scripting language for MongoDB (http://www.mongodb.org/).
>
>I just wish they'd drop spidermonkey and go with V8 or another, faster
>and more modern engine. :(
>
> - Alice.
>
>> Dan Stromberg wrote:
>>> On Tue, Jan 11, 2011 at 4:30 PM, Catherine Moroney
>>> <Catherine.M.Moroney at jpl.nasa.gov> wrote:
>>>>
>>>> In what order does Python import modules on a Linux system? I have a
>>>> package that is both installed in /usr/lib64/python2.5/site-packages,
>>>> and a newer version of the same module in a working directory.
>>>>
>>>> I want to import the version from the working directory, but when I
>>>> print module.__file__ in the interpreter after importing the module,
>>>> I get the version that's in site-packages.
>>>>
>>>> I've played with the PYTHONPATH environment variable by setting it
>>>> to just the path of the working directory, but when I import the module
>>>> I still pick up the version in site-packages.
>>>>
>>>> /usr/lib64 is in my PATH variable, but doesn't appear anywhere else. I
>>>> don't want to remove /usr/lib64 from my PATH because that will break
>>>> a lot of stuff.
>>>>
>>>> Can I force python to import from my PYTHONPATH first, before looking
>>>> in the system directory?
>>>>
>>> Please import sys and inspect sys.path; this defines the search path
>>> for imports.
>>>
>>> By looking at sys.path, you can see where in the search order your
>>> $PYTHONPATH is going.
>>>
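Dan's suggestion can be sketched concretely (Python 3 syntax; the directory name is a placeholder, not from the thread):

```python
import sys

# Show the exact order Python searches for imports: entries earlier in
# sys.path win, so a site-packages entry appearing before the $PYTHONPATH
# entry explains why the installed version keeps getting picked up.
for i, path in enumerate(sys.path):
    print(i, path)

# Hypothetical fix: force the working directory ahead of everything else.
sys.path.insert(0, "/home/user/working_dir")  # placeholder path
```

Inserting at index 0 beats any .pth-added entries without touching PATH or site-packages.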
>On Wed, Jan 12, 2011 at 11:07 AM, Catherine Moroney
><Catherine.M.Moroney at jpl.nasa.gov> wrote:
>> I've looked at my sys.path variable and I see that it has
>> a whole bunch of site-package directories, followed by the
>> contents of my $PYTHONPATH variable, followed by a list of
>> misc site-package variables (see below).
><snip>
>> But, I'm curious as to where the first bunch of 'site-package'
>> entries come from. The
>> /usr/lib64/python2.5/site-packages/pyhdfeos-1.0_r57_58-py2.5-linux-x86_64.egg
>> is not present in any of my environment variables, yet it shows up
>> as one of the first entries in sys.path.
>
>You probably have a .pth file somewhere that adds it (since it's an
>egg, probably site-packages/easy-install.pth).
>See http://docs.python.org/install/index.html#modifying-python-s-search-path
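Those extra entries come from .pth files that the site module reads at startup. A minimal sketch of how to see what each .pth file contributes (the helper name is mine, not from the thread):

```python
import glob
import os

def pth_entries(site_dir):
    """Collect the path entries contributed by every .pth file in site_dir.

    Roughly mirrors what the site module does at startup: each
    non-comment, non-"import" line of a .pth file becomes a sys.path entry.
    """
    entries = []
    for pth in sorted(glob.glob(os.path.join(site_dir, "*.pth"))):
        with open(pth) as f:
            for line in f:
                line = line.strip()
                # Skip comments and the executable "import ..." lines
                # that setuptools' easy-install.pth uses.
                if line and not line.startswith(("#", "import ")):
                    entries.append(line)
    return entries
```

Running this over /usr/lib64/python2.5/site-packages would show which .pth file injects the pyhdfeos egg.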
>
>Cheers,
>Chris
>--
>http://blog.rebertia.com
>
>
>On 12/01/2011 21:05, Scott McCarty wrote:
>> Sorry to ask this question. I have searched the list archives and googled,
>> but I don't even know what words would find what I am looking for; I am
>> just looking for a little kick in the right direction.
>>
>> I have a Python-based log analysis program called petit
>> (http://crunchtools.com/petit). I am trying to modify it to manage the
>> main object types to and from disk.
>>
>> Essentially, I have one object which is a list of a bunch of "Entry"
>> objects. The Entry objects have date, time, etc. fields which I use
>> for analysis techniques. At the very beginning I build up the list of
>> objects, then would like to start pickling it while building, to save
>> memory. I want to be able to process more entries than I can hold in memory.
>> With a plain list it looks like I could build from xreadlines(), but
>> once you turn it into a more complex object, I don't quite know where to go.
>>
>> I understand how to pickle the entire data structure, but I need
>> something that will manage the memory/disk allocation. Any thoughts?
>>
>To me it sounds like you need to use a database.
>
>
>On Wed, Jan 12, 2011 at 1:05 PM, Scott McCarty <scott.mccarty at gmail.com> wrote:
>> <snip>
>
>You could subclass `list` and use sys.getsizeof()
>(http://docs.python.org/library/sys.html#sys.getsizeof) to keep track
>of the size of the elements, and then start pickling them to disk once
>the total size reaches some preset limit.
>But like MRAB said, using a proper database, e.g. SQLite
>(http://docs.python.org/library/sqlite3.html), wouldn't be a bad idea
>either.
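Chris's size-tracking idea can be sketched like this (Python 3; the class name and limit are mine, not from petit, and sys.getsizeof() is shallow, so nested objects are undercounted):

```python
import pickle
import sys

class SpillList:
    """Buffer appended items in memory and pickle them to disk once the
    tracked size passes max_bytes. A sketch, not production code."""

    def __init__(self, path, max_bytes):
        self._out = open(path, "wb")
        self.max_bytes = max_bytes
        self.items = []
        self.size = 0

    def append(self, item):
        self.items.append(item)
        self.size += sys.getsizeof(item)
        if self.size >= self.max_bytes:
            self.flush()

    def flush(self):
        # Write each buffered item as its own pickle record, then reset.
        for item in self.items:
            pickle.dump(item, self._out)
        self.items = []
        self.size = 0

    def close(self):
        self.flush()
        self._out.close()
```

Reading the file back is the usual loop of pickle.load() calls until EOFError, as in Peter's example below.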
>
>Cheers,
>Chris
>--
>http://blog.rebertia.com
>
>
>Scott McCarty wrote:
>
>> <snip>
>
>You can write multiple pickled objects into a single file:
>
>import cPickle as pickle
>
>def dump(filename, items):
>    with open(filename, "wb") as out:
>        dump = pickle.Pickler(out).dump
>        for item in items:
>            dump(item)
>
>def load(filename):
>    with open(filename, "rb") as instream:
>        load = pickle.Unpickler(instream).load
>        while True:
>            try:
>                item = load()
>            except EOFError:
>                break
>            yield item
>
>if __name__ == "__main__":
>    filename = "tmp.pickle"
>    from collections import namedtuple
>    T = namedtuple("T", "alpha beta")
>    dump(filename, (T(a, b) for a, b in zip("abc", [1, 2, 3])))
>    for item in load(filename):
>        print item
>
>To get random access you'd have to maintain a list containing the offsets of
>the entries in the file.
>However, a simple database like SQLite is probably sufficient for the kind
>of entries you have in mind, and it allows operations like aggregation,
>sorting and grouping out of the box.
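Peter's random-access remark can be sketched as follows (Python 3 and the pickle module rather than cPickle; the function names are mine):

```python
import pickle

def dump_with_index(filename, items):
    """Pickle items back to back, recording each record's byte offset."""
    offsets = []
    with open(filename, "wb") as out:
        for item in items:
            offsets.append(out.tell())
            pickle.dump(item, out)
    return offsets

def load_at(filename, offset):
    """Seek straight to a recorded offset and unpickle the record there."""
    with open(filename, "rb") as f:
        f.seek(offset)
        return pickle.load(f)
```

Keeping the offsets list in memory (or pickling it too, as a footer) gives O(1) access to any entry without loading the rest.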
>
>Peter
>
>
>
>On Wed, Jan 12, 2011 at 10:59 AM, Matty Sarro <msarro at gmail.com> wrote:
>> As of now here is my situation:
>> I am working on a system to aggregate IT data and logs. A number of
>> important data are gathered by a third-party system. The only
>> immediate way I have to access the data is to have their system
>> automatically email me updates in CSV format every hour. If I set up a
>> mail client on the server, this shouldn't be a huge issue.
>>
>> However, is there a way to automatically open the emails, and copy the
>> attachments to a directory based on the filename? Kind of a weird
>> project, I know. Just looking for some ideas hence posting this on two
>> lists.
>
>Parsing out email attachments:
>http://docs.python.org/library/email.parser.html
>http://docs.python.org/library/email.message.html#module-email.message
>
>Parsing the extension from a filename:
>http://docs.python.org/library/os.path.html#os.path.splitext
>
>Retrieving email from a mail server:
>http://docs.python.org/library/poplib.html
>http://docs.python.org/library/imaplib.html
>
>You could poll for new messages via a cron job or the `sched` module
>(http://docs.python.org/library/sched.html ). Or if the messages are
>being delivered locally, you could use inotify bindings or similar to
>watch the appropriate directory for incoming mail. Integration with a
>mail server itself is also a possibility, but I don't know much about
>that.
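Pulling those pieces together, here is a Python 3 sketch using the stdlib email package (the function name is mine; fetching the raw message via poplib/imaplib is left out):

```python
import os
from email import policy
from email.parser import BytesParser

def save_attachments(raw_message, dest_dir):
    """Parse a raw RFC 5322 message and write each attachment into
    dest_dir, named after the attachment's own filename."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_message)
    saved = []
    for part in msg.iter_attachments():
        name = part.get_filename()
        if not name:
            continue  # skip attachments with no declared filename
        # basename() guards against hostile "../" paths in the filename.
        path = os.path.join(dest_dir, os.path.basename(name))
        with open(path, "wb") as f:
            f.write(part.get_payload(decode=True))
        saved.append(path)
    return saved
```

Sorting the saved files into per-source directories is then a matter of matching on the filename, e.g. with os.path.splitext().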
>
>Cheers,
>Chris
>--
>http://blog.rebertia.com
>
>
>In article <0d7143ca-45cf-44c3-9e8d-acb867c52037 at f30g2000yqa.googlegroups.com>,
>Daniel da Silva <ddasilva at umd.edu> wrote:
>>
>>I have come across a task where I would like to scan a short 20-80
>>character line of text for instances of "<verb> <noun>". Ideally
>><verb> could be of any tense.
>
>In Soviet Russia, <noun> <verbs> you!
>--
>Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/
>
>"Think of it as evolution in action." --Tony Rand
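For what it's worth, a more serious sketch of Daniel's question: with fixed word lists, scanning a short line for adjacent verb/noun pairs takes only a few lines (the word sets here are hypothetical; handling arbitrary tenses would need a real POS tagger such as NLTK's):

```python
import re

# Hypothetical word lists; a real solution would lemmatize or use a tagger.
VERBS = {"open", "opened", "opens", "run", "ran", "take", "took"}
NOUNS = {"door", "server", "report", "test"}

def find_verb_noun(line):
    """Return (verb, noun) pairs where a known verb directly precedes a known noun."""
    words = re.findall(r"[a-z]+", line.lower())
    return [(v, n) for v, n in zip(words, words[1:])
            if v in VERBS and n in NOUNS]
```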
>
>
>In case you still need help:
>
>import random
>
># Set the initial values
>the_number = random.randrange(100) + 1
>tries = 0
>guess = None
>
># Guessing loop
>while guess != the_number and tries < 7:
>    guess = int(raw_input("Take a guess: "))
>    if guess > the_number:
>        print "Lower..."
>    elif guess < the_number:
>        print "Higher..."
>    tries += 1
>
># Did the user guess correctly, or make too many guesses?
>if guess == the_number:
>    print "You guessed it! The number was", the_number
>    print "And it only took you", tries, "tries!\n"
>else:
>    print "Wow you suck! It should only take at most 7 tries!"
>
>raw_input("Press Enter to exit the program.")
>
>
>Been digging ever since I posted this. I suspected that the response might
>be "use a database". I am worried I am trying to reinvent the wheel. The
>problem is I don't want any dependencies, and I also don't need persistence
>between program runs. I kind of wanted to keep the use of petit very similar
>to cat, head, awk, etc. But, that said, I have realized that if I provide
>the analysis features as an API, you very well might want persistence
>between runs.
>
>What about using an array inside a shelve?
>
>Just got done messing with this in python shell:
>
>import shelve
>
># writeback=True keeps an in-memory cache so mutations made through
># d["log"] are written back when the shelf is synced or closed;
># without it, d["log"].append(...) modifies a throwaway copy
>d = shelve.open(filename="/root/test.shelf", protocol=-1, writeback=True)
>
>d["log"] = []  # a list, not a tuple -- tuples have no append()
>d["log"].append("test1")
>d["log"].append("test2")
>d["log"].append("test3")
>
>Then, always interacting with d["log"], for example:
>
>for i in d["log"]:
>    print i
>
>Thoughts?
>
>
>I know this won't manage memory, but it will keep the footprint down, right?
>On Wed, Jan 12, 2011 at 5:04 PM, Peter Otten <__peter__ at web.de> wrote:
>
>> Scott McCarty wrote:
>>
>> <snip>
>
>--
>http://mail.python.org/mailman/listinfo/python-list
More information about the Python-list
mailing list