[Tutor] can anyone help me in solving this problem this is urgent
Emile van Sebille
emile at fenx.com
Sat Nov 7 19:04:31 CET 2009
On 11/6/2009 4:24 PM surjit khakh said...
> Write a python program to read a text file named “text.txt” and show the
> number of times each article is found in the file. Articles in the English
> language are the words “a”, “an”, and “the”.
Sounds like you're taking a python class. Great! It's probably the
best programming language to start with.
First, it helps when asking questions if you mention what version of the
language you're using. Some features and options are newer. In
particular, there's a string method 'count' that isn't available in
older pythons, while the replace method has been around at least ten years.
If you haven't already, the tutorial at
http://docs.python.org/tutorial/index.html is a great place to start.
Pay particular attention to section 3's string introduction at
http://docs.python.org/tutorial/introduction.html#strings and section 7
Implicit in this problem is identifying words in the text file. This is
tough because you need to take punctuation into account. There's a neat
tool in newer pythons: assuming you've read the file contents into a
variable txt, set(txt) gives you every distinct character in the content,
that is, all the letters, numbers, punctuation marks, and whitespace
characters. You'll need to know these so that you can recognize a word
regardless of adjacent punctuation. In this specific case, as articles in
English always come before the noun they modify, you'll always find
whitespace following an article. It would be a space except, of course,
when the article ends the line, in which case it is followed by a
newline character.
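As a small sketch of the set(txt) idea, assuming a reasonably recent Python 3 and a made-up sample string standing in for the contents of text.txt (the name non_alnum is only illustrative):

```python
# txt stands in for the contents of text.txt (hypothetical sample text).
txt = "The cat saw a dog.\nAn owl hooted at the\nmoon."

# set() on a string yields its distinct characters.
chars = set(txt)

# Keep only the characters that are neither letters nor digits; these are
# the ones that can sit next to a word and must be dealt with.
non_alnum = sorted(c for c in chars if not c.isalnum())
print(non_alnum)
```

For this sample the only non-alphanumeric characters are the newline, the space, and the period, so those are the ones you would have to handle.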
For example, consider the following text:
SECTION 1.4. COUNTY PLANNING COMMISSION.
a. The County Planning Commission shall consist of five members. Each
member of the Board of Supervisors shall recommend that a resident of
his district be appointed to the Commission; provided, however, the
appointments to the Commission shall require the affirmative vote of not
less than a majority of the entire membership of the Board.
Any a's, an's or the's in the paragraph body can be easily counted with
the string count method once you've properly prepared the text.
I expect the an's and the's are the easy ones to count. Consider
however the paragraph identifier -- "a." -- this is not an article but
would likely be counted as one in most solutions. There may also be a
subsequent reference to this section (eg, see a above) or range of
sections (eg, see a-n above) that further make this a harder problem.
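To illustrate the pitfall, here is a rough sketch using part of the sample text above. It assumes Python 3, and the preparation step (punctuation replaced by spaces, everything lowercased, the text padded with spaces) is just one possible choice; the names table and prepared are illustrative only.

```python
import string

sample = ("a. The County Planning Commission shall consist of five members. "
          "Each member of the Board of Supervisors shall recommend that a "
          "resident of his district be appointed to the Commission.")

# Map every ASCII punctuation mark to a space.
table = str.maketrans(string.punctuation, " " * len(string.punctuation))

# Lowercase, collapse whitespace runs to single spaces, and pad both ends
# so that every word is surrounded by exactly one space on each side.
prepared = " " + " ".join(sample.translate(table).lower().split()) + " "

a_count = prepared.count(" a ")      # includes the paragraph label "a."
the_count = prepared.count(" the ")
print(a_count, the_count)
```

Here the count for "a" comes out as 2 even though only one of them is a real article. Note also that str.count only counts non-overlapping matches, so two identical words separated by a single space (" a a ") would be counted once with this padding trick.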
One possible approach may involve confirming that a noun follows the
article. There are dictionaries you can access, or word lists that can
help. The WordNet database from Princeton appears fairly complete with
117k entries, but even there it's easy to find exceptions where the word
right after the article isn't a noun: "A 20's style approach"; "a late
bus"; or "a fallen hero".
So, frankly, I expect that solutions to this problem will range from the
naive through the reasonably complete to the impossible without human
confirmation of complex structure and context.
For your homework, showing you can read in the file, strip out any
punctuation, count the resulting occurrences, and report the results
should do it.
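The steps just listed might be sketched roughly as follows. This assumes Python 3, and it swaps the string count method for splitting the text into a word list and counting in that list, which sidesteps the overlap issue. The function names are made up for illustration, and this is the naive version: it will still count things like a paragraph label "a." as an article.

```python
import string

ARTICLES = ("a", "an", "the")

def count_articles(text):
    """Return a dict mapping each article to the number of times it occurs."""
    # Replace ASCII punctuation with spaces, lowercase, and split on whitespace.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.translate(table).lower().split()
    return {article: words.count(article) for article in ARTICLES}

def count_articles_in_file(path="text.txt"):
    """Read the whole file at path and count the articles in it."""
    with open(path) as f:
        return count_articles(f.read())

def report(counts):
    """Print one line per article with its count."""
    for article in ARTICLES:
        print(article, counts[article])
```

With a text.txt in the current directory, report(count_articles_in_file()) would print the three counts.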