One thought is to use a default dictionary:<div><br></div><div>import collections</div><div><br></div><div>counts = collections.defaultdict(int)</div><div><br></div><div>for pair in my_array:</div><div> counts[tuple(pair)] += 1</div><div>
<br></div><div>duplicated_pairs = [x for x in counts if counts[x] > 1]</div><div><br></div><div>Gerry<br><br><div class="gmail_quote">On Wed, Dec 22, 2010 at 12:21 PM, <span dir="ltr"><<a href="mailto:python-win32-request@python.org">python-win32-request@python.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Today's Topics:<br>
<br>
1. Identify unique data from sequence array (otrov)<br>
2. Re: Intenet explorer using PythonWin Help (Mike Driscoll)<br>
3. Re: Identify unique data from sequence array (Aahz)<br>
4. Re: Identify unique data from sequence array (Mike Diehn)<br>
5. Re: Identify unique data from sequence array (otrov)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Wed, 22 Dec 2010 13:11:43 +0100<br>
From: otrov <<a href="mailto:dejan.org@gmail.com">dejan.org@gmail.com</a>><br>
To: <a href="mailto:python-win32@python.org">python-win32@python.org</a><br>
Subject: [python-win32] Identify unique data from sequence array<br>
Message-ID: <<a href="mailto:1877904314.20101222131143@gmail.com">1877904314.20101222131143@gmail.com</a>><br>
Content-Type: text/plain; charset=us-ascii<br>
<br>
Hi,<br>
I failed in my first attempt to solve this problem with matlab/octave, as I have only just started using these tools for data manipulation. I then thought to try python, as a more feature-rich and descriptive language, and to post the problem to a python group I'm already subscribed to.<br>
<br>
Let's consider this simple data object (a scipy array):<br>
<br>
X = array([[1, 2],<br>
[1, 2],<br>
[2, 2],<br>
[3, 1],<br>
[2, 3],<br>
[1, 2],<br>
[1, 2],<br>
[2, 2],<br>
[3, 1],<br>
[2, 3],<br>
[1, 2],<br>
[1, 2],<br>
[2, 2],<br>
[3, 1],<br>
[2, 3],<br>
...,<br>
[1, 2],<br>
[1, 2],<br>
[2, 2],<br>
[3, 1],<br>
[2, 3]])<br>
<br>
I would like to extract repeated sequence data:<br>
<br>
Y = array([[1, 2],<br>
[1, 2],<br>
[2, 2],<br>
[3, 1],<br>
[2, 3]])<br>
<br>
as a result.<br>
<br>
The arrays consist of 10^7 to 10^8 elements, and the unique sequence consists of at most 10^6 elements, usually fewer, around 10^5.<br>
<br>
Thanks for your time<br>
<br>
<br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Wed, 22 Dec 2010 08:31:27 -0600<br>
From: Mike Driscoll <<a href="mailto:mdriscoll@co.marshall.ia.us">mdriscoll@co.marshall.ia.us</a>><br>
Cc: <a href="mailto:python-win32@python.org">python-win32@python.org</a><br>
Subject: Re: [python-win32] Intenet explorer using PythonWin Help<br>
Message-ID: <<a href="mailto:4D120BBF.3040108@co.marshall.ia.us">4D120BBF.3040108@co.marshall.ia.us</a>><br>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"<br>
<br>
On 1:59 PM, Pat McGuire wrote:<br>
><br>
> I am new at programming with Python and am using Pythonwin. I have a<br>
> couple of questions:<br>
><br>
> 1. The code below after doc.FormName.submit() will navigate to<br>
> the correct page but if I print the url it shows the url of the page I<br>
> logged in at. I thought submit would be just like if I clicked on the<br>
> submit button.<br>
><br>
> import win32com.client<br>
> import win32api<br>
> ie = win32com.client.Dispatch( "InternetExplorer.Application" )<br>
> ie.Visible = 1<br>
> ie.Navigate("urlhere<br>
> <<a href="http://posting.www.backpage.com/classifieds/central/index" target="_blank">http://posting.www.backpage.com/classifieds/central/index</a>>")<br>
> while ie.Busy == True:<br>
> &nbsp;&nbsp;&nbsp;&nbsp;win32api.Sleep(1000)<br>
> doc = ie.Document<br>
> doc.FormName.email.value = "emailaddress <mailto:<a href="mailto:doublepllc@gmail.com">doublepllc@gmail.com</a>>"<br>
> doc.FormName.password.value = "mypassword"<br>
> doc.FormName.submit()<br>
><br>
> 2. Can you point me to a site that shows how to access each<br>
> type of form element, e.g. options, hrefs, links, etc.?<br>
><br>
><br>
> Any help is greatly appreciated.<br>
><br>
<br>
I've heard good things about Mechanize:<br>
<a href="http://mechanize.rubyforge.org/mechanize/" target="_blank">http://mechanize.rubyforge.org/mechanize/</a><br>
<br>
It's not PyWin32, but it's probably easier to use than win32com methods.<br>
<br>
<br>
<br>
--<br>
Mike Driscoll<br>
<br>
<br>
------------------------------<br>
<br>
Message: 3<br>
Date: Wed, 22 Dec 2010 07:28:25 -0800<br>
From: Aahz <<a href="mailto:aahz@pythoncraft.com">aahz@pythoncraft.com</a>><br>
To: <a href="mailto:python-win32@python.org">python-win32@python.org</a><br>
Subject: Re: [python-win32] Identify unique data from sequence array<br>
Message-ID: <<a href="mailto:20101222152825.GB3725@panix.com">20101222152825.GB3725@panix.com</a>><br>
Content-Type: text/plain; charset=us-ascii<br>
<br>
On Wed, Dec 22, 2010, otrov wrote:<br>
><br>
> I failed in my first idea to solve this problem with matlab/octave,<br>
> as I just started using this tools for data manipulation, and then<br>
> thought to try python as more feature rich descriptive language and<br>
> post this problem to python group I'm subscribed already<br>
<br>
You may get better answers posting to a general Python group (e.g.<br>
comp.lang.python).<br>
--<br>
Aahz (<a href="mailto:aahz@pythoncraft.com">aahz@pythoncraft.com</a>) <*> <a href="http://www.pythoncraft.com/" target="_blank">http://www.pythoncraft.com/</a><br>
<br>
"Think of it as evolution in action." --Tony Rand<br>
<br>
<br>
------------------------------<br>
<br>
Message: 4<br>
Date: Wed, 22 Dec 2010 11:01:52 -0500<br>
From: Mike Diehn <<a href="mailto:mike.diehn@ansys.com">mike.diehn@ansys.com</a>><br>
To: Aahz <<a href="mailto:aahz@pythoncraft.com">aahz@pythoncraft.com</a>><br>
Cc: <a href="mailto:python-win32@python.org">python-win32@python.org</a><br>
Subject: Re: [python-win32] Identify unique data from sequence array<br>
Message-ID: <AANLkTi=9RVD+gcO2jR3t_YSwMwuXoVgwqRHupY4HRk=9@mail.gmail.com><br>
Content-Type: text/plain; charset="iso-8859-1"<br>
<br>
I'm a unix guy. That's what we call a sort-uniq operation, after the<br>
pipeline we'd use: sort datafile | uniq > uniq-lines.txt. So I google that<br>
with python and ....<br>
<br>
As Jason Petrone wrote when he withdrew PEP 270 in<br>
<a href="http://www.python.org/dev/peps/pep-0270/" target="_blank">http://www.python.org/dev/peps/pep-0270/</a>:<br>
<br>
<br>
"creating a sequence without duplicates is just a matter of<br>
choosing a different data structure: a set instead of a list."<br>
<br>
<br>
At the time, sets.py was a nifty new thing. Since then, the set datatype has<br>
been added to Python's built-in types.<br>
<br>
set() can consume a list of tuples, but not a list of lists, like the X you<br>
showed us. Your job will be getting your massive list of lists into a<br>
list of tuples.<br>
<br>
This works, but for your very large arrays it may take a long time:<br>
<br>
X = [[1,2], [1,2], [3,4], [3,4]]<br>
<br>
Y = set( [tuple(x) for x in X] )<br>
<br>
<br>
There may be faster methods. The map() function might help, but I really<br>
don't know. Here's something to try:<br>
<br>
Y = set( map(tuple, X) )<br>
<br>
<br>
Or you can go the old-school route, from before the days of set(), that is:<br>
<br>
<a href="http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/" target="_blank">http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/</a><br>
<br>
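A minimal sketch of the tuple conversion described above, assuming the data fits in memory as a list of two-element lists. The helper name unique_rows is hypothetical. It keeps rows in first-seen order, which a bare set() does not guarantee.

```python
def unique_rows(rows):
    """Return the distinct rows in first-seen order (hypothetical helper)."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, tuples are hashable
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


X = [[1, 2], [1, 2], [3, 4], [3, 4]]
print(unique_rows(X))  # [(1, 2), (3, 4)]
```

For arrays of 10^7 rows or more, the per-row tuple objects may use a lot of memory, so this is only a starting point.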
<br>
Best,<br>
Mike<br>
<br>
<br>
<br>
--<br>
Mike Diehn<br>
Senior Systems Administrator<br>
ANSYS, Inc - Lebanon, NH Office<br>
<a href="mailto:mike.diehn@ansys.com">mike.diehn@ansys.com</a>, (603) 727-5492<br>
<br>
------------------------------<br>
<br>
Message: 5<br>
Date: Wed, 22 Dec 2010 18:21:27 +0100<br>
From: otrov <<a href="mailto:dejan.org@gmail.com">dejan.org@gmail.com</a>><br>
To: <a href="mailto:python-win32@python.org">python-win32@python.org</a><br>
Subject: Re: [python-win32] Identify unique data from sequence array<br>
Message-ID: <<a href="mailto:18010463321.20101222182127@gmail.com">18010463321.20101222182127@gmail.com</a>><br>
Content-Type: text/plain; charset=us-ascii<br>
<br>
Thanks for your reply, but perhaps there is a misunderstanding:<br>
<br>
I don't want unique values, but the unique sequence (block) of data that is repeated in the array:<br>
<br>
A B C D D D A B C D D D A B C D D D<br>
|_________| |_________| |_________|<br>
| | |<br>
unique unique unique<br>
sequence sequence sequence<br>
data data data<br>
<br>
I tested your approach and wouldn't say it's slow. It works great, but that's not what I'm after. Thanks anyway.<br>
<br>
Cheers<br>
<br>
<br>
<br>
------------------------------<br>
<br>
_______________________________________________<br>
python-win32 mailing list<br>
<a href="mailto:python-win32@python.org">python-win32@python.org</a><br>
<a href="http://mail.python.org/mailman/listinfo/python-win32" target="_blank">http://mail.python.org/mailman/listinfo/python-win32</a><br>
<br>
<br>
End of python-win32 Digest, Vol 93, Issue 26<br>
********************************************<br>
</blockquote></div><br></div>
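For the repeated-block question in Message 5, one possible approach is to look for the shortest period of the whole sequence. The sketch below is an assumption about what is wanted, not something proposed in the thread. The helper name shortest_repeating_block is hypothetical, and it uses the prefix function from the Knuth-Morris-Pratt algorithm, so it runs in time linear in the number of rows. It assumes the array is one block repeated end to end, with the last repetition possibly cut short.

```python
def shortest_repeating_block(rows):
    """Return the shortest leading block whose repetition reproduces rows.

    Hypothetical helper. The final repetition may be truncated. If the
    data never repeats, the whole sequence is returned.
    """
    seq = [tuple(r) for r in rows]  # tuples compare cheaply and are hashable
    n = len(seq)
    if n == 0:
        return []
    # fail[i] is the length of the longest proper prefix of seq[:i + 1]
    # that is also a suffix of it (the KMP prefix function).
    fail = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and seq[i] != seq[k]:
            k = fail[k - 1]
        if seq[i] == seq[k]:
            k += 1
        fail[i] = k
    period = n - fail[-1]  # smallest shift that maps the sequence onto itself
    return seq[:period]


block = [[1, 2], [1, 2], [2, 2], [3, 1], [2, 3]]
X = block * 4
print(shortest_repeating_block(X))
# [(1, 2), (1, 2), (2, 2), (3, 1), (2, 3)]
```

With 10^7 to 10^8 rows, a pure-Python loop like this may be slow and memory-hungry, so it is probably best read as a description of the idea rather than a tuned solution.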