[Tutor] Is the difference in outputs with different size input lists due to limits on memory with PYTHON?

Fri May 7 13:35:33 CEST 2010

On 5/6/2010 8:52 PM, Dave Angel wrote:
>
> I got my own copy of the papers, at 
> http://thomas.loc.gov/home/histdox/fedpaper.txt
>
> I copied your code, and added logic to it to initialize termlist from 
> the actual file.  And it does complete the output file at 83 lines, 
> approx 17000 columns per line (because most counts are one digit).  It 
> takes quite a while, and perhaps you weren't waiting for it to 
> complete.  I'd suggest either adding a print to the loop, showing the 
> count, and/or adding a line that prints "done" after the loop 
> terminates normally.
>
> I watched memory usage, and as expected, it didn't get very high.  
> There are things you need to redesign, however.  One is that all the 
> punctuation and digits and such need to be converted to spaces.
>
>
> DaveA
>
>

Thank you for going the extra mile.

I obtained my copy before I retired in 2001 and there are some 
differences.  In the current copy from the LOC, papers 7, 63, and 81 
start with "FEDERALIST." (an extra period).  That explains why you have 
83.  There are also some comments, such as the attributed author.  After 
the weekend, I'll do a file compare and see the differences in more 
detail.

Please email me your version of the code.  I'll try it as is.  Then I'll 
put in a counter, have it print the count and paper number, and a 'done' 
message.
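
For anyone following along, here is a rough sketch of the kind of 
progress reporting described above.  It is not the code from this 
thread: the function name count_terms, the report parameter, and the 
sample data are all made up for illustration, and it assumes the papers 
have already been split into a list of strings.

```python
# Hypothetical sketch: count_terms and the sample data are illustrative only.
def count_terms(papers, termlist, report=print):
    """Return one row of term counts per paper, reporting progress as it goes."""
    rows = []
    for number, text in enumerate(papers, start=1):
        words = text.split()
        # One count per term, in termlist order
        rows.append([words.count(term) for term in termlist])
        report("paper %d of %d counted" % (number, len(papers)))
    # Printed only when the loop finishes normally
    report("done")
    return rows

papers = ["the people of the state", "to the people"]
termlist = ["the", "people", "to"]
rows = count_terms(papers, termlist)
```

If the "done" line never appears, the loop is either still running or 
stopped early, which is the distinction the counter is meant to show.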

As a check after reading in the counts, I'll include the totals from 
NoteTab and see whether the per-paper counts sum to those totals.
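
One possible way to do that check, sketched under the assumption that 
the per-paper counts are held as a list of rows (one row per paper, one 
column per term) and the NoteTab totals as a dictionary keyed by term.  
The names column_totals, mismatches, and reference, and all the numbers, 
are hypothetical.

```python
# Hypothetical sketch: the names and the numbers below are made up.
def column_totals(rows):
    """Sum each term's counts over all papers."""
    return [sum(column) for column in zip(*rows)]

def mismatches(termlist, rows, reference):
    """List (term, computed total, reference total) wherever the two differ."""
    totals = column_totals(rows)
    return [(term, total, reference.get(term))
            for term, total in zip(termlist, totals)
            if reference.get(term) != total]

termlist = ["the", "people", "to"]
rows = [[2, 1, 0], [1, 1, 1]]                    # one row per paper
reference = {"the": 3, "people": 2, "to": 2}     # stand-in for NoteTab totals
problems = mismatches(termlist, rows, reference)
```

An empty list would mean every term's total agrees with the reference.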

I'll use SPSS to create a version of the .txt file with punctuation and 
numerals changed to spaces and try using that as the corpus.   Then I'll 
try to create a similar file with Python.
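
A minimal sketch of how the Python version might look, assuming the 
corpus is plain ASCII English text.  The function name letters_only and 
the sample string are made up for illustration; this is one way to do 
it, not necessarily the way it ends up being done.

```python
import re

# Hypothetical sketch: letters_only and the sample string are illustrative.
def letters_only(text):
    """Replace each character that is not an ASCII letter or whitespace with a space."""
    return re.sub(r"[^A-Za-z\s]", " ", text)

sample = "FEDERALIST. No. 7, the people's 2nd choice"
cleaned = letters_only(sample)
words = cleaned.split()
```

Because each offending character becomes exactly one space, the cleaned 
text keeps the same length as the original.  Note that apostrophes are 
treated as punctuation here, so "people's" splits into "people" and "s"; 
whether that is wanted depends on how the term list was built.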

Art