[Tutor] 1 to N searches in files

Spectral None spectralnone at yahoo.com.sg
Mon Dec 3 16:52:41 CET 2012


From: "tutor-request at python.org" <tutor-request at python.org>
To: tutor at python.org 
Sent: Monday, 3 December 2012, 21:57
Subject: Tutor Digest, Vol 106, Issue 9

Send Tutor mailing list submissions to
    tutor at python.org

To subscribe or unsubscribe via the World Wide Web, visit
    http://mail.python.org/mailman/listinfo/tutor
or, via email, send a message with subject or body 'help' to
    tutor-request at python.org

You can reach the person managing the list at
    tutor-owner at python.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Tutor digest..."


Today's Topics:

  1. Re: Tutor Digest, Vol 106, Issue 5 (Spectral None)


----------------------------------------------------------------------

Message: 1
Date: Mon, 3 Dec 2012 21:55:35 +0800 (SGT)
From: Spectral None <spectralnone at yahoo.com.sg>
To: "tutor at python.org" <tutor at python.org>
Subject: Re: [Tutor] Tutor Digest, Vol 106, Issue 5
Message-ID:
    <1354542935.11347.YahooMailNeo at web190604.mail.sg3.yahoo.com>
Content-Type: text/plain; charset="iso-8859-1"

From: "tutor-request at python.org" <tutor-request at python.org>
To: tutor at python.org 
Sent: Sunday, 2 December 2012, 17:34
Subject: Tutor Digest, Vol 106, Issue 5


Today's Topics:

  1. Re: reverse diagonal (Dave Angel)
  2. To Find the Answers (Sujit Baniya)
  3. Re: To Find the Answers (Dave Angel)
  4. Re: reverse diagonal (Steven D'Aprano)
  5. 1 to N searches in files (Spectral None)
  6. Re: 1 to N searches in files (Steven D'Aprano)


----------------------------------------------------------------------

Message: 1
Date: Sat, 01 Dec 2012 23:18:44 -0500
From: Dave Angel <d at davea.name>
To: eryksun <eryksun at gmail.com>
Cc: tutor at python.org
Subject: Re: [Tutor] reverse diagonal
Message-ID: <50BAD6A4.1020701 at davea.name>
Content-Type: text/plain; charset=UTF-8

On 12/01/2012 09:55 PM, eryksun wrote:
> On Sat, Dec 1, 2012 at 9:35 PM, Dave Angel <d at davea.name> wrote:
>>
>> [M[i][~i] for i,dummy in enumerate(M) ]
> 
> Since enumerate() iterates the rows, you could skip the first index:
> 
>    >>> [row[~i] for i,row in enumerate(M)]
>    [3, 5, 7]
> 
> 

Great job.  And I can't see any way to improve on that.

-- 

DaveA
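
A small check, not part of the original thread: Dave's earlier index-based version (quoted in Steven's message further down) and eryksun's enumerate() version should pick out the same reverse diagonal of a square matrix. The function names below are hypothetical; the 3x3 matrix M is the one used in the thread.

```python
def reverse_diagonal_by_index(M):
    # Explicit index arithmetic: row i, column len(M)-1-i (square matrix assumed).
    return [M[i][len(M) - 1 - i] for i in range(len(M))]


def reverse_diagonal_by_row(M):
    # enumerate() yields each row directly, so only the column index is
    # computed; for Python ints, ~i equals -i-1, i.e. counting from the end.
    return [row[~i] for i, row in enumerate(M)]


M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(reverse_diagonal_by_index(M))  # [3, 5, 7]
print(reverse_diagonal_by_row(M))    # [3, 5, 7]
```

The enumerate() form avoids indexing M twice per element, which is the improvement eryksun points out.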


------------------------------

Message: 2
Date: Sun, 2 Dec 2012 10:24:19 +0545
From: Sujit Baniya <itsursujit at gmail.com>
To: tutor at python.org
Subject: [Tutor] To Find the Answers
Message-ID:
    <CABwo8Nh423oc=W2=o+ULXuejx0XiZyaWXE1Pj2zYEkGL-PKxrQ at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Write a function named countRepresentations that returns the number of
ways that an amount of money in rupees can be represented as rupee
notes. For this problem we only use rupee notes in denominations of 1,
2, 5, 10 and 20 rupee notes.

The signature of the function is:

    def countRepresentations(int numRupees)

For example, countRepresentations(12) should return 15 because 12
rupees can be represented in the following 15 ways.

  1. 12 one rupee notes
  2. 1 two rupee note plus 10 one rupee notes
  3. 2 two rupee notes plus 8 one rupee notes
  4. 3 two rupee notes plus 6 one rupee notes
  5. 4 two rupee notes plus 4 one rupee notes
  6. 5 two rupee notes plus 2 one rupee notes
  7. 6 two rupee notes
  8. 1 five rupee note plus 7 one rupee notes
  9. 1 five rupee note, 1 two rupee note and 5 one rupee notes
 10. 1 five rupee note, 2 two rupee notes and 3 one rupee notes
 11. 1 five rupee note, 3 two rupee notes and 1 one rupee note
 12. 2 five rupee notes and 2 one rupee notes
 13. 2 five rupee notes and 1 two rupee note
 14. 1 ten rupee note and 2 one rupee notes
 15. 1 ten rupee note and 1 two rupee note

Hint: Use a nested loop that looks like this. Please fill in the blanks
intelligently, i.e. minimize the number of times that the if statement
is executed.

    for (int rupee20=0; rupee20<=__; rupee20++)
      for (int rupee10=0; rupee10<=__; rupee10++)
        for (int rupee5=0; rupee5<=__; rupee5++)
          for (int rupee2=0; rupee2<=__; rupee2++)
            for (int rupee1=0; rupee1<=__; rupee1++)
            {
              if (___)
                count++
            }



-- 
Sujit Baniya
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20121202/ffecad69/attachment-0001.html>
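
The hint above is written in C-style syntax, and the typed signature `def countRepresentations(int numRupees)` is not valid Python. What follows is one possible Python reading of the hinted nested loop, a sketch rather than the assignment's intended answer. Each loop bound is the largest number of notes of that denomination that fits into whatever the outer loops have left. Once the 20s, 10s, 5s and 2s are fixed, the number of one-rupee notes is forced, so in this sketch the innermost rupee1 loop and its `if` collapse into a plain `count += 1`.

```python
def countRepresentations(numRupees):
    """Count the ways numRupees can be paid with 1, 2, 5, 10 and 20 rupee notes."""
    count = 0
    for rupee20 in range(numRupees // 20 + 1):
        left20 = numRupees - 20 * rupee20          # amount still to pay
        for rupee10 in range(left20 // 10 + 1):
            left10 = left20 - 10 * rupee10
            for rupee5 in range(left10 // 5 + 1):
                left5 = left10 - 5 * rupee5
                for rupee2 in range(left5 // 2 + 1):
                    # The remainder, left5 - 2 * rupee2, is paid in one-rupee
                    # notes; there is exactly one way to do that, so no test
                    # is needed here.
                    count += 1
    return count


print(countRepresentations(12))  # 15
```

The result for 12 matches the 15 ways listed in the question.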

------------------------------

Message: 3
Date: Sun, 02 Dec 2012 00:27:26 -0500
From: Dave Angel <d at davea.name>
To: Sujit Baniya <itsursujit at gmail.com>
Cc: tutor at python.org
Subject: Re: [Tutor] To Find the Answers
Message-ID: <50BAE6BE.4070007 at davea.name>
Content-Type: text/plain; charset=ISO-8859-1

On 12/01/2012 11:39 PM, Sujit Baniya wrote:
> *Write a function named countRepresentations that returns the
> number*>* of ways that an amount of money in rupees can be represented
> as rupee*>* notes. [...]
>
>

1) Please don't post HTML messages here.  Frequently, the formatting is
totally messed up, as you can see here.  This is a text mailing list.

2) If you have a Python question, please ask it.  Posting a query here
about C or C++ doesn't seem to be very effective.



-- 

DaveA



------------------------------

Message: 4
Date: Sun, 2 Dec 2012 18:32:52 +1100
From: Steven D'Aprano <steve at pearwood.info>
To: tutor at python.org
Subject: Re: [Tutor] reverse diagonal
Message-ID: <20121202073252.GA32473 at ando>
Content-Type: text/plain; charset=us-ascii

On Sat, Dec 01, 2012 at 09:19:57PM -0500, eryksun wrote:
> On Sat, Dec 1, 2012 at 11:31 AM, Dave Angel <d at davea.name> wrote:
> >
> > revdiag = [M[i][len(M)-1-i] for i in range(len(M)) ]
>
> You might sometimes see this using the bitwise invert operator ~ (i.e.
> __invert__, operator.invert):
>
>    >>> M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>
>    >>> [M[i][~i] for i in xrange(len(M))]
>    [3, 5, 7]

Ew!

There's something... smelly... about code using a bitwise-invert for 
indexing. Using it for bitwise operations is one thing. Using it to save 
writing two characters is just nasty.

http://www.joelonsoftware.com/articles/Wrong.html
http://c2.com/cgi/wiki?CodeSmell

It smacks of low-level optimization tricks which make no sense in a 
high-level language like Python. You aren't writing optimized assembler, 
if you want -i-1 write -i-1 !

> ~i returns the value (-i - 1):

Assuming certain implementation details about how integers are stored, 
namely that they are two's-complement rather than ones'-complement or 
something more exotic.

Okay, just about every computer made since 1960 uses two's-complement 
integers, but still, the effect of ~i depends on the way integers are 
represented internally rather than some property of integers as an 
abstract number. That makes it a code smell.

And there is the risk that ~i will be misread as -i, which would be bad.



-- 
Steven
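
A side note, not from the thread: Python's language reference defines the bitwise inversion of an integer x as -(x+1), so for Python ints ~i and -i-1 are equal for every value, including negative and arbitrarily large ones. The sketch below checks that identity on a few sample values and writes the reverse-diagonal index out explicitly as -i - 1, the spelling Steven recommends.

```python
# Check the documented identity ~x == -(x + 1) on a spread of Python ints,
# including negative numbers and values far beyond 64 bits.
samples = [0, 1, 2, 41, -1, -7, 2**64, -(2**64) + 3]
for x in samples:
    assert ~x == -(x + 1) == -x - 1

# The reverse diagonal with the column index written out as -i - 1.
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
revdiag = [row[-i - 1] for i, row in enumerate(M)]
print(revdiag)  # [3, 5, 7]
```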


------------------------------

Message: 5
Date: Sun, 2 Dec 2012 16:53:43 +0800 (SGT)
From: Spectral None <spectralnone at yahoo.com.sg>
To: "tutor at python.org" <tutor at python.org>
Subject: [Tutor] 1 to N searches in files
Message-ID:
    <1354438423.33849.YahooMailNeo at web190604.mail.sg3.yahoo.com>
Content-Type: text/plain; charset="utf-8"

Hi all

I have two files (File A and File B) with strings of data in them (each string on a separate line). Basically, each string in File B will be compared with all the strings in File A, and the output should show a list of matched/unmatched lines and optionally be written to a third file, File C.

File A: Unique strings
File B: Can have duplicate strings (that is, "string1" may appear more than once)

My code currently looks like this:

-----------------
FirstFile = open('C:\FileA.txt', 'r')
SecondFile = open('C:\FileB.txt', 'r')
ThirdFile = open('C:\FileC.txt', 'w')

a = FirstFile.readlines()
b = SecondFile.readlines()

mydiff = difflib.Differ()
results = mydiff(a,b)
print("\n".join(results))

#ThirdFile.writelines(results)

FirstFile.close()
SecondFile.close()
ThirdFile.close()
---------------------

However, it seems that the results do not correctly reflect the matched/unmatched lines. As an example, if FileA contains "string1" and FileB contains multiple occurrences of "string1", it seems that the first occurrence matches correctly but subsequent "string1"s are treated as unmatched strings.

I am thinking perhaps I don't understand Differ() that well and that it is not doing what I hoped to do? Is Differ() comparing first line to first line and second line to second line etc in contrast to what I wanted to do?

Regards
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20121202/6507cbad/attachment-0001.html>

------------------------------

Message: 6
Date: Sun, 02 Dec 2012 20:34:24 +1100
From: Steven D'Aprano <steve at pearwood.info>
To: tutor at python.org
Subject: Re: [Tutor] 1 to N searches in files
Message-ID: <50BB20A0.5090301 at pearwood.info>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 02/12/12 19:53, Spectral None wrote:

> However, it seems that the results do not correctly reflect the
> matched/unmatched lines. As an example, if FileA contains "string1"
> and FileB contains multiple occurrences of "string1", it seems that
> the first occurrence matches correctly but subsequent "string1"s
> are treated as unmatched strings.
>
> I am thinking perhaps I don't understand Differ() that well and that
> it is not doing what I hoped to do? Is Differ() comparing first line
> to first line and second line to second line etc in contrast to what
> I wanted to do?

No, and yes.

No, it is not comparing first line to first line.

And yes, it is acting in contrast to what you hope to do, otherwise you
wouldn't be asking the question :-)

Unfortunately, you don't explain what it is that you hope to do, so I'm
going to have to guess. See below.

difflib is used to find differences between two files. It will try to
find a set of changes which will turn file A into file B, e.g:

insert this line here
delete this line there
...


and so on, repeated as many times as needed, except that difflib.Differ
uses "+" and "-" as shorthand for adding and deleting lines.

You can find out more about difflib and Differ objects by reading the
Fine Manual. Open a Python interactive shell, and do this:

import difflib
help(difflib.Differ)


If you have any questions, please feel free to ask.

In the code sample you give, you say you do this:

mydiff = difflib.Differ()
results = mydiff(a,b)

but that doesn't work: Differ objects are not callable. Please do not
paraphrase your code. Copy and paste the exact code you have actually
run, don't try to type it out from memory.
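
For reference, a minimal sketch of the working call, using made-up lists in place of the contents of FileA and FileB. Differ.compare() is the method that produces the diff. Lines common to both inputs are prefixed with two spaces and lines found only in the second input with "+ ". Because Differ aligns the two sequences, a line that appears once in A and three times in B is matched once and the extra copies are reported as additions, which is the behaviour described in the original post.

```python
import difflib

# Hypothetical file contents: "string1" once in A, three times in B.
a = ["string1\n"]
b = ["string1\n", "string1\n", "string1\n"]

# compare() yields the diff lines; the Differ object itself is not callable.
differ = difflib.Differ()
result = list(differ.compare(a, b))

# readlines() keeps each "\n", so join with "" rather than "\n"
# to avoid printing a blank line after every entry.
print("".join(result), end="")
# Output:
#   string1
# + string1
# + string1
```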

Now, I *guess* that what you are trying to do is something like this...
given files A and B:


# file A
spam
ham
eggs
tomato


# file B
tomato
spam
eggs
cheese
spam
spam


you want to generate three lists:

# lines in B that were also in A:
tomato
spam
eggs


# lines in B that were not in A:
cheese


# lines in A that were not found in B:
ham


Am I close?

If not, please explain with an example what you are trying
to do.


-- 
Steven
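
If Steven's guess is right, a diff is the wrong tool and set membership gives the three lists directly. A minimal sketch, assuming one string per line and that trailing newlines should be ignored; `read_lines` and `classify_lines` are hypothetical helper names. A line repeated in File B is reported once, in the order of its first appearance.

```python
def read_lines(path):
    """Return the lines of a text file without their trailing newlines."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


def classify_lines(lines_a, lines_b):
    """Return (lines of B also in A, lines of B not in A, lines of A not in B)."""
    set_a = set(lines_a)
    set_b = set(lines_b)
    in_both, only_in_b = [], []
    seen = set()
    for line in lines_b:
        if line in seen:        # File B may repeat a line; report it once
            continue
        seen.add(line)
        if line in set_a:
            in_both.append(line)
        else:
            only_in_b.append(line)
    only_in_a = [line for line in lines_a if line not in set_b]
    return in_both, only_in_b, only_in_a


file_a = ["spam", "ham", "eggs", "tomato"]
file_b = ["tomato", "spam", "eggs", "cheese", "spam", "spam"]
in_both, only_in_b, only_in_a = classify_lines(file_a, file_b)
print(in_both)    # ['tomato', 'spam', 'eggs']
print(only_in_b)  # ['cheese']
print(only_in_a)  # ['ham']
```

With real files this would be called as classify_lines(read_lines(r"C:\FileA.txt"), read_lines(r"C:\FileB.txt")); the r prefix stops backslashes in Windows paths from being read as escape sequences such as "\n" or "\t".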


------------------------------

> Hi Steven

> I was searching for string comparison, saw this article and decided to try it. There was no error when I ran the code
> (http://stackoverflow.com/questions/11008519/detecting-and-printing-the-difference-between-two-text-files-using-python-3-2)

> Dave's other reply, about matching against a list of valid words, is similar to what I want to do. I guess I misunderstood the usage of Differ(). Thanks for the help!

> Regards

Hi Steven

My apologies as well for not being clear in my explanation.

Regards
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20121203/2517abf1/attachment-0001.html>

