On 30 September 2012 23:08, Arnaud Delobelle <span dir="ltr"><<a href="mailto:arnodel@gmail.com" target="_blank">arnodel@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On 30 September 2012 02:27, Kevin Anthony <<a href="mailto:kevin.s.anthony@gmail.com">kevin.s.anthony@gmail.com</a>> wrote:<br>
> I have a list of filenames, and i need to find files with the same name,<br>
> different extensions, and split that into tuples. does anyone have any<br>
> suggestions on an easy way to do this that isn't O(n^2)?<br>
<br>
</div>>>> import os, itertools<br>
>>> filenames = ["foo.png", "bar.csv", "foo.html", "bar.py"]<br>
>>> dict((key, tuple(val)) for key, val in itertools.groupby(sorted(filenames), lambda f: os.path.splitext(f)[0]))<br>
{'foo': ('foo.html', 'foo.png'), 'bar': ('bar.csv', 'bar.py')}<br></blockquote><div><br></div><div>That seems wasteful. Sort is O(n log n)</div><div><br></div><div>I've seen this pattern a lot. Surely there should be an object for this...</div>
<div><br></div><div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><font face="courier new, monospace">filenames = ["foo.png", "bar.csv", "foo.html", "bar.py"] <br>
</font><font face="courier new, monospace"><br></font><font face="courier new, monospace">import os<br></font><font face="courier new, monospace"><br></font><font face="courier new, monospace">from collections import defaultdict<br>
</font><font face="courier new, monospace">grouped = defaultdict(list) <br></font><font face="courier new, monospace"><br></font><font face="courier new, monospace">for file in filenames:<br></font><font face="courier new, monospace"> splitname = os.path.splitext(file)<br>
</font><font face="courier new, monospace"> grouped[splitname[0]].append(splitname[1]) <br></font><font face="courier new, monospace"> <br></font><font face="courier new, monospace">grouped<br></font><font face="courier new, monospace">>>> defaultdict(<class 'list'>, {'foo': ['.png', '.html'], 'bar': ['.csv', '.py']})</font></blockquote>
</div><div><br></div><div>This should be near-enough O(n) time. Pah, it's not like you need to optimize this anyway!</div></div>