Generating list of unique search sub-phrases
Nick Mellor
thebalancepro at gmail.com
Wed Jun 17 18:55:35 EDT 2015
On Saturday, 30 May 2015 06:39:44 UTC+10, Nick Mellor wrote:
> Hi all,
>
> My own solution works but I'm sure it could be simpler or read better. How would you do it?
>
> Say you've got a list of companies:
>
> Aerosonde Ltd
> Amcor
> ANCA
> Austal Ships
> Australia Post
> Australian Air Express
> Australian Defence Industries
> Australian Railroad Group
> Australian Submarine Corporation
>
> and you need to extract phrases from the company names that uniquely identify that company. The results for the above list of companies should be:
>
> Company: 'Aerosonde Ltd'
> Aliases: Aerosonde,Ltd,Aerosonde Ltd
>
> Company: 'Amcor'
> Aliases: Amcor
>
> Company: 'ANCA'
> Aliases: ANCA
>
> Company: 'Austal Ships'
> Aliases: Austal,Ships,Austal Ships
>
> Company: 'Australia Post'
> Aliases: Post,Australia Post
>
> Company: 'Australian Air Express'
> Aliases: Air,Express,Australian Air,Air Express,Australian Air Express
>
> Company: 'Australian Defence Industries'
> Aliases: Defence,Industries,Australian Defence,Defence Industries,Australian Defence Industries
>
> Company: 'Australian Railroad Group'
> Aliases: Railroad,Group,Australian Railroad,Railroad Group,Australian Railroad Group
>
> Company: 'Australian Submarine Corporation'
> Aliases: Submarine,Corporation,Australian Submarine,Submarine Corporation,Australian Submarine Corporation
>
> Here's my solution:
>
> from itertools import combinations, chain
>
> companies = [
> "Aerosonde Ltd",
> "Amcor",
> "ANCA",
> "Austal Ships",
> "Australia Post",
> "Australian Air Express",
> "Australian Defence Industries",
> "Australian Railroad Group",
> "Australian Submarine Corporation",
> ]
>
> def flatten(i):
>     return list(chain.from_iterable(i))
>
> companies_as_text_stream = ' '.join(companies)
> for company in companies:
>     word_combinations = [list(combinations(company.split(), r)) for r in range(1, len(company))]
>     phrases = [' '.join(phrase) for phrase in flatten(word_combinations)]
>     unique_phrases = [phrase for phrase in phrases if companies_as_text_stream.count(phrase) == 1]
>     aliases = ','.join(unique_phrases)
>     print("Company: '{0}'\n Aliases: {1}\n".format(company, aliases))
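
For comparison, here is a minimal sketch of one possible simplification. It is not the solution posted in the thread. It assumes that an alias is always a contiguous run of words from the company name, and that a phrase counts as unique when it occurs as a substring of exactly one company name (so "Australia" is rejected because it is a prefix of "Australian"). The helper names sub_phrases and unique_aliases are made up for illustration.

```python
companies = [
    "Aerosonde Ltd",
    "Amcor",
    "ANCA",
    "Austal Ships",
    "Australia Post",
    "Australian Air Express",
    "Australian Defence Industries",
    "Australian Railroad Group",
    "Australian Submarine Corporation",
]


def sub_phrases(name):
    """Return every contiguous run of words in name, shortest runs first."""
    words = name.split()
    phrases = [
        " ".join(words[start:start + length])
        for length in range(1, len(words) + 1)
        for start in range(len(words) - length + 1)
    ]
    # Drop repeats (e.g. a name with the same word twice) but keep the order.
    return list(dict.fromkeys(phrases))


def unique_aliases(names):
    """Map each name to the sub-phrases found in no other name.

    Assumption: matching is by plain, case-sensitive substring test.
    """
    return {
        name: [
            phrase
            for phrase in sub_phrases(name)
            if sum(phrase in other for other in names) == 1
        ]
        for name in names
    }


for company, aliases in unique_aliases(companies).items():
    print("Company: '{0}'\n Aliases: {1}\n".format(company, ",".join(aliases)))
```

Slicing the word list directly avoids generating non-contiguous combinations that can never match, and testing each name separately avoids accidental matches that span the boundary between two names in a joined string.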
Great reply, Peter, thank you. Lots to think about.
Cheers,
Nick