toy list processing problem: collect similar terms

sln at netherlands.com
Thu Oct 7 20:44:31 EDT 2010


On Wed, 06 Oct 2010 10:52:19 -0700, sln at netherlands.com wrote:

>On Sat, 25 Sep 2010 21:05:13 -0700 (PDT), Xah Lee <xahlee at gmail.com> wrote:
>
>>here's an interesting toy list processing problem.
>>
>>I have a list of lists, where each sublist is labelled by
>>a number. I need to collect together the contents of all sublists
>>sharing the same label. So if I have the list
>>
>>((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q r) (5 s t))
>>
>>where the first element of each sublist is the label, I need to
>>produce:
>>
>>output:
>>((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>>
>[snip]
>
>>anyone care to give a solution in Python, Perl, javascript, or other
>>lang? am guessing the scheme solution can be much improved... perhaps
>>using some lib but that seems to show scheme is pretty weak if the lib
>>is non-standard.
>>
>
>Crossposting to Lisp, Python and Perl because the weird list of lists looks
>like Lisp or something else, and you mention other languages so I'm throwing
>this out for Perl.
>
>It appears this string you have there is actually list syntax in another language.
>If it is, it's the job of the language to parse the data out. Why then do you
>want to put it into another language form? At runtime, once the data is in variables,
>dictated by the syntax, you can do whatever data manipulation you want
>(combining arrays, etc.).
>
>So, in the spirit of a preprocessor, given that the text is balanced, with proper closure,
>i.e.:  ( (data) (data) )    is ok.
>       ( data (data) )      is not ok.
>
>the below does simple text manipulation, joining like-labeled sublists, without going into
>the runtime guts of internalizing the data itself. Internally, this is too simple.
>
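
As a rough illustration of the text-level joining described above (this is not the code snipped from the earlier message), a minimal Perl sketch might look like the following. The helper name `join_labelled_text` is hypothetical. It assumes exactly one level of nesting, whitespace-separated atoms, and that every inner group starts with its label.

```perl
use strict;
use warnings;

# Hypothetical helper: merge like-labelled sublists directly in the list text.
# Assumption: input looks like "((label item ...) (label item ...) ...)".
sub join_labelled_text {
    my ($text) = @_;
    my (%slot, @groups);   # label => output position, collected items

    # Match each innermost "(label rest)" group; the outer paren never
    # matches because a label may not contain "(" or ")".
    while ($text =~ /\(\s*([^\s()]+)([^()]*)\)/g) {
        my ($label, $rest) = ($1, $2);
        # First time a label is seen, give it the next output position.
        $slot{$label} = scalar @groups unless exists $slot{$label};
        # split ' ' skips leading whitespace and copes with wrapped lines.
        push @{ $groups[ $slot{$label} ] }, split ' ', $rest;
    }
    return '(' . join(' ', map { '(' . join(' ', @$_) . ')' } @groups) . ')';
}

print join_labelled_text('((0 a b) (1 c d) (1 i j) (0 x y))'), "\n";
# prints: ((a b x y) (c d i j))
```

Deeper nesting would need a real parser rather than a single regex, which is the limit of treating the data as plain text.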

If not a preprocessor, then ...
the too-simple, order-independent, id-independent Perl approach.

-sln
-----------------

use strict;
use warnings;
use Data::Dump 'dump';

my @inp = ([0,'a','b'],[1,'c','d'],[2,'e','f'],[3,'g','h'],
           [1,'i','j'],[2,'k','l'],[4,'m','n'],[2,'o','p'],
           [4,'q','r'],[5,'s','t']);

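# $cnt: next free output slot; @outp: collected groups; %hs: label => slot
my ($cnt, @outp, %hs) = (0);

for my $ref (@inp) {
   # Test with exists, not truth: slot 0 is false, so a repeated first
   # label would otherwise be handed a second slot.
   exists $hs{ $$ref[0] } or $hs{ $$ref[0] } = $cnt++;
   # Append everything after the label to that label's group.
   push @{$outp[ $hs{ $$ref[0] } ] }, @{$ref}[ 1 .. $#{$ref} ];
}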

dump @outp;

__END__

(
  ["a", "b"],
  ["c", "d", "i", "j"],
  ["e", "f", "k", "l", "o", "p"],
  ["g", "h"],
  ["m", "n", "q", "r"],
  ["s", "t"],
)
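
For reuse, the same idea can be wrapped in a subroutine. This is only a sketch: the name `collect_by_label` is hypothetical, and it assumes each sublist is an array reference whose first element is the label. Labels need not be numeric or sorted, and groups come out in the order each label first appears.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): group the tails of
# labelled sublists, keeping groups in order of each label's first appearance.
sub collect_by_label {
    my @lists = @_;
    my (%slot, @groups);   # label => index into @groups, collected items
    for my $list (@lists) {
        my ($label, @items) = @$list;
        # exists() rather than a truth test, because slot 0 is a valid index.
        $slot{$label} = scalar @groups unless exists $slot{$label};
        push @{ $groups[ $slot{$label} ] }, @items;
    }
    return @groups;
}

my @groups = collect_by_label([0,'a','b'], [1,'c','d'], [0,'x','y']);
print join(' ', map { '(' . join(' ', @$_) . ')' } @groups), "\n";
# prints: (a b x y) (c d)
```

The second `0` sublist lands in the first group, which is the case a plain truth test on the stored slot number would get wrong.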



