[Chicago] topics!

Tim Saylor tim.saylor at gmail.com
Wed Jan 4 21:05:17 CET 2012


*Amazon EC2 High-Memory Extra Large Instance*

17.1 GB of memory
6.5 EC2 Compute Units (2 virtual cores with 3.25 EC2 Compute Units each)
420 GB of instance storage
64-bit platform
I/O Performance: Moderate
API name: m2.xlarge

Price: $0.50/hour

EC2 is perfect for that kind of thing.

On Wed, Jan 4, 2012 at 12:21 PM, Clyde Forrester
<clydeforrester at gmail.com> wrote:

> "I had to solve some interesting problems..."
>
> Not being able to suck a 240MB file into memory, clean it, join it
> (another 240MB), and make a reverse complement (another 240MB), was one of
> the problems. So I went the other way and used an algorithm which would act
> one character at a time, in a single pass, allowing for multiple framings
> and such. The advantage is that it is very memory efficient. It also wound
> up being very scalable.
>
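
For anyone following along, here is a minimal sketch of what a one-character-at-a-time, single-pass counter could look like. It is not Clyde's actual program: the names count_stream and reverse_complement are made up for illustration, and I am assuming that whitespace should be skipped and that any other non-ACGT character (such as N) breaks a match. Checking every overlapping window is one way to cover all framings.

```python
from collections import deque

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (GATACCA -> TGGTATC)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_stream(chars, pattern):
    """Count overlapping hits of pattern and its reverse complement in one pass.

    chars is any iterable of single characters, so the input never has to be
    held in memory as a whole. (Hypothetical helper, not the original code.)
    """
    counts = {pattern: 0, reverse_complement(pattern): 0}
    # Only the last len(pattern) bases are ever kept.
    window = deque(maxlen=len(pattern))
    for ch in chars:
        if ch.isspace():
            continue  # assumption: line breaks do not interrupt the sequence
        ch = ch.upper()
        if ch not in COMPLEMENT:
            window.clear()  # assumption: N and other symbols break a match
            continue
        window.append(ch)
        if len(window) == window.maxlen:
            candidate = "".join(window)
            if candidate in counts:
                counts[candidate] += 1
    return counts
```

Calling count_stream(iter("GATA\nCCA"), "GATACCA") finds the hit even though it spans a line break, because only the sliding window is kept between characters.
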
> As for the file layout: there are 25 text files. I loop through each file,
> each line, each character, in a single pass. It makes for a good textbook
> example.
>
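
The file, line, character loop described above can be sketched as a generator. This is only an illustration under assumptions: iter_bases is a hypothetical name, and I am guessing that the files are FASTA-style text where lines starting with ">" are headers to skip. The thread does not say what the actual layout is.

```python
def iter_bases(paths):
    """Yield sequence characters from each file in turn, one at a time.

    Hypothetical helper: assumes plain-text files in which lines starting
    with '>' are headers (FASTA-style) and every other line is sequence data.
    """
    for path in paths:
        with open(path) as handle:
            for line in handle:  # only one line is in memory at a time
                if line.startswith(">"):
                    continue  # assumed header line, not sequence data
                for ch in line.rstrip("\r\n"):
                    yield ch.upper()
```

Because it is a generator, whatever consumes it sees one long stream of characters and never needs to know where one file or line ends and the next begins.
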
> The downside, for now, is that I can't do fuzzy matches the way I would
> like to. To solve that, I will probably build a machine with 16GB of
> memory, which will enable me to suck in the largest file at least 3 times
> over. Sometimes brute force is the path of least resistance. Wake me up
> when I can afford it.
>
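
The message does not say what kind of fuzzy match is wanted. If it only means "up to k substituted characters", a sliding window can still handle it in one pass, as in the sketch below. The name count_fuzzy and the max_mismatches parameter are made up for illustration. Insertions, deletions, or anything fancier are not covered here and may well need the whole sequence in memory, as the message suggests.

```python
from collections import deque


def count_fuzzy(chars, pattern, max_mismatches=1):
    """Count windows that differ from pattern in at most max_mismatches places.

    Hypothetical helper: handles substitutions only (Hamming distance),
    not insertions or deletions.
    """
    window = deque(maxlen=len(pattern))  # the last len(pattern) characters
    hits = 0
    for ch in chars:
        window.append(ch)
        if len(window) == len(pattern):
            # Compare the current window to the pattern position by position.
            mismatches = sum(a != b for a, b in zip(window, pattern))
            if mismatches <= max_mismatches:
                hits += 1
    return hits
```

Each window costs len(pattern) comparisons, so this is slower than an exact match but still uses only a few bytes of memory beyond the input stream itself.
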
> Clyde
>
>
> Joshua Herman wrote:
>
>> Clyde forgot to mention that since he couldn't load the whole human
>> genome into memory he actually searches through the file on disk. At
>> least when I talked with him I think that is what it does.
>> ---Profile:---
>> http://www.google.com/profiles/zitterbewegung
>>
>>
>>
>>
>> On Wed, Jan 4, 2012 at 10:31 AM, Clyde Forrester
>> <clydeforrester at gmail.com> wrote:
>>
>>> I recently wrote a program to count the occurrences of "GATACCA" in the
>>> human genome. I can do a brief talk on that. I had to solve some
>>> interesting problems, and it provides an interesting example of text file
>>> reading, compound lists, and some objectish methods.
>>>
>>> Clyde
>>>
>>
> _______________________________________________
> Chicago mailing list
> Chicago at python.org
> http://mail.python.org/mailman/listinfo/chicago
>