[Tutor] Is the difference in outputs with different size input lists due to limits on memory with PYTHON?
Art Kendall
Art at DrKendall.org
Fri May 7 13:35:33 CEST 2010
On 5/6/2010 8:52 PM, Dave Angel wrote:
>>
>
> I got my own copy of the papers, at
> http://thomas.loc.gov/home/histdox/fedpaper.txt
>
> I copied your code, and added logic to it to initialize termlist from
> the actual file. And it does complete the output file at 83 lines,
> approx 17000 columns per line (because most counts are one digit). It
> takes quite a while, and perhaps you weren't waiting for it to
> complete. I'd suggest either adding a print to the loop, showing the
> count, and/or adding a line that prints "done" after the loop
> terminates normally.
>
> I watched memory usage, and as expected, it didn't get very high.
> There are things you need to redesign, however. One is that all the
> punctuation and digits and such need to be converted to spaces.
>
>
> DaveA
>
>
Thank you for going the extra mile.
I obtained my copy before I retired in 2001 and there are some
differences. In the current copy from the LOC, papers 7, 63, and 81
start with "FEDERALIST." (an extra period). That explains why you have
83. There are also some comments, such as the attributed author. After the
weekend, I'll do a file compare and look at the differences in more detail.
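
A file compare like this could also be done in Python itself. The sketch
below is only an illustration using the standard difflib module; the
function name compare_texts and the labels "old copy" and "new copy" are
made up here and are not from the code discussed in this thread.

```python
import difflib


def compare_texts(old_text, new_text):
    """Return unified-diff lines showing how new_text differs from old_text."""
    # splitlines() drops the line endings, so lineterm="" keeps the
    # diff header lines free of stray newlines as well.
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="old copy",   # hypothetical label for the pre-2001 copy
        tofile="new copy",     # hypothetical label for the current LOC copy
        lineterm="",
    ))
```

Lines starting with "-" appear only in the old copy and lines starting
with "+" only in the new one, so an added period in a heading would show
up as one "-"/"+" pair.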
Please email me your version of the code. I'll try it as is. Then I'll
put in a counter, have it print the count and paper number, and a 'done'
message.
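
A rough sketch of the counter and 'done' message, not the actual code
from this thread: the names count_terms, papers, and termlist are
placeholders, and it assumes the papers have already been split into a
list of strings and that the terms in termlist are lowercase.

```python
from collections import Counter


def count_terms(papers, termlist):
    """Count each term in each paper, reporting progress along the way."""
    rows = []
    for number, text in enumerate(papers, start=1):
        # Tally every word once, then look up each term in the tally.
        tally = Counter(text.lower().split())
        rows.append([tally[term] for term in termlist])
        print("paper", number, "of", len(papers), "counted")
    # Reached only if the loop finished without an error.
    print("done")
    return rows
```

The progress line shows that a slow run is still working, and the final
"done" confirms that the loop terminated normally.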
As a check after reading in the counts, I'll include the totals from
NoteTab and see whether the per-paper counts sum to those totals.
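
One possible shape for that check, as a sketch only: it assumes the
counts are held as one list per paper (in termlist order) and that the
NoteTab totals have been entered as a dictionary. The names
find_mismatches, rows, and reference_totals are invented for this
example.

```python
def find_mismatches(rows, termlist, reference_totals):
    """Return {term: (summed_count, reference_count)} where the two differ."""
    mismatches = {}
    for column, term in enumerate(termlist):
        # Sum this term's column across all papers.
        total = sum(row[column] for row in rows)
        expected = reference_totals.get(term)  # None if the term is missing
        if total != expected:
            mismatches[term] = (total, expected)
    return mismatches
```

An empty result means every column total agrees with its reference
count.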
I'll use SPSS to create a version of the .txt file with punctuation and
numerals changed to spaces and try using that as the corpus. Then I'll
try to create a similar file with Python.
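
For the Python version, something along these lines might do. It is a
sketch written for Python 3 (str.maketrans is not available in this form
in Python 2), the function name is made up, and it only covers the ASCII
punctuation listed in string.punctuation.

```python
import string

# Every ASCII punctuation character and digit maps to a single space.
_TO_SPACE = string.punctuation + string.digits
_TABLE = str.maketrans(_TO_SPACE, " " * len(_TO_SPACE))


def blank_punctuation_and_digits(text):
    """Return text with punctuation and digits replaced by spaces."""
    # One-for-one replacement keeps the length and line breaks unchanged.
    return text.translate(_TABLE)
```

Because each character is replaced rather than deleted, words on either
side of a punctuation mark stay separate, and the cleaned file lines up
position for position with the original.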
Art