[Tutor] can anyone help me in solving this problem this is urgent

Emile van Sebille emile at fenx.com
Sat Nov 7 19:04:31 CET 2009

On 11/6/2009 4:24 PM surjit khakh said...
> Write a python program to read a text file named “text.txt” and show the
> number of times each article is found in the file. Articles in the English
> language are the words “a”, “an”, and “the”.

Sounds like you're taking a python class.  Great!  It's probably the 
best programming language to start with.

First, it helps when asking questions if you mention what version of the 
language you're using.  Some features and options are newer.  In 
particular, there's a string method 'count' that isn't available in 
older pythons, while the replace method has been around at least ten years.

If you haven't already, the tutorial at 
http://docs.python.org/tutorial/index.html is a great place to start. 
Pay particular attention to section 3's string introduction at 
http://docs.python.org/tutorial/introduction.html#strings and section 7 
starting with 
on files.

Implicit in this problem is identifying words in the text file.  This is 
tough because you need to take punctuation into account.  Newer pythons 
have a neat tool for this: assuming you've read the file contents into a 
variable txt, set(txt) gives you every distinct letter, number, 
punctuation mark, and whitespace character in the content.  You'll need 
to know these so that you can recognize a word regardless of adjacent 
punctuation.  In this specific case, as articles in English always 
precede nouns, you'll always find whitespace following an article.  It 
would be a space except, of course, when the article ends the line, in 
which case it is followed by the line-break character stored in the 
text file.
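
As a rough illustration of the set(txt) idea, here is a minimal sketch. 
It assumes Python 3, and the sample string is made up to stand in for 
the file contents:

```python
# Made-up sample text standing in for the contents of text.txt.
txt = "The cat saw a dog.\nAn owl; the end!"

# set(txt) collects every distinct character in the text.
chars = set(txt)

# Keep only the characters that are neither letters nor digits: these are
# the punctuation and whitespace characters a word can sit next to.
separators = sorted(c for c in chars if not c.isalnum())
print(separators)
```

For this sample it prints ['\n', ' ', '!', '.', ';'], which is the list 
of characters you would have to deal with before counting words.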

For example, consider the following text:


a. The County Planning Commission shall consist of five members. Each 
member of the Board of Supervisors shall recommend that a resident of 
his district be appointed to the Commission; provided, however, the 
appointments to the Commission shall require the affirmative vote of not 
less than a majority of the entire membership of the Board.

Any a's, an's or the's in the paragraph body can be easily counted with 
the string count method once you have properly prepared the text.
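
One way the preparation and counting might look, as a sketch only.  It 
assumes Python 3 and uses the sample paragraph in place of the file; 
the two-space padding is my own workaround, not the only way to do it:

```python
import string

# The sample paragraph from this message, standing in for the file contents.
txt = (
    "a. The County Planning Commission shall consist of five members. Each\n"
    "member of the Board of Supervisors shall recommend that a resident of\n"
    "his district be appointed to the Commission; provided, however, the\n"
    "appointments to the Commission shall require the affirmative vote of not\n"
    "less than a majority of the entire membership of the Board.\n"
)

# Prepare the text: lower-case it, then turn every punctuation mark into a space.
prepared = txt.lower()
for mark in string.punctuation:
    prepared = prepared.replace(mark, " ")

# Put exactly two spaces between words and around the whole text. str.count
# only counts non-overlapping matches, so a single shared space between two
# adjacent articles would otherwise hide the second one.
prepared = "  " + "  ".join(prepared.split()) + "  "

counts = {}
for article in ("a", "an", "the"):
    counts[article] = prepared.count(" " + article + " ")

print(counts)
```

On the sample paragraph this prints {'a': 3, 'an': 0, 'the': 8}.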

I expect the an's and the's are the easy ones to count.  Consider 
however the paragraph identifier -- "a." -- this is not an article but 
would likely be counted as one in most solutions.  There may also be a 
subsequent reference to this section (e.g., see a above) or range of 
sections (e.g., see a-n above) that makes this an even harder problem. 
One possible approach may involve confirming that a noun follows the 
article.  There are dictionaries you can access, or word lists that can 
help.  The WordNet database from Princeton appears fairly complete with 
117k entries, but even there it's easy to find exceptions: "A 20's style 
approach"; "a late bus"; or "a fallen hero".
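
To make the noun check concrete, here is a toy sketch.  The NOUNS set 
and the function name are hypothetical and hand-made for illustration; 
a real attempt would load a large word list instead:

```python
import re

# Hypothetical, hand-made noun list for illustration only.
NOUNS = {"resident", "majority", "bus", "hero", "board", "commission"}

def count_articles_before_nouns(text):
    """Count a/an/the only when the very next word is a known noun."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = {"a": 0, "an": 0, "the": 0}
    # Pair each word with the word that follows it.
    for word, following in zip(words, words[1:]):
        if word in counts and following in NOUNS:
            counts[word] += 1
    return counts

print(count_articles_before_nouns("a. The Board shall appoint a resident."))
print(count_articles_before_nouns("A late bus carried a fallen hero."))
```

The first call skips the "a." label and reports one "a" and one "the". 
The second reports zero for everything, because "late" and "fallen" are 
not nouns, which is exactly the kind of exception that defeats this 
approach.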

So, frankly, I expect that solutions to this problem will range from the 
naive through the reasonably complete to the impossible without human 
confirmation of complex structure and context.

For your homework, showing you can read in the file, strip out any 
punctuation, count the resulting occurrences, and report the results 
should do it.
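
Those four steps could be sketched roughly as follows.  This assumes 
Python 3 and plain ASCII punctuation in the file; the function name 
count_articles is just something I made up for the example:

```python
import os
import string

ARTICLES = ("a", "an", "the")

def count_articles(path):
    """Return a dict mapping each article to how often it appears in the file."""
    # Read in the file.
    with open(path) as f:
        txt = f.read()

    # Strip out punctuation by mapping every punctuation mark to a space,
    # then split on whitespace to get a list of lower-case words.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = txt.lower().translate(table).split()

    # Count the resulting occurrences.
    return {article: words.count(article) for article in ARTICLES}

# Report the results, but only if the homework file is actually present.
if os.path.exists("text.txt"):
    for article, n in count_articles("text.txt").items():
        print(article, n)
```

This is the naive end of the range: it will happily count a paragraph 
label like "a." as an article.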

