Creating a dictionary from a .txt file
Roy Smith
roy at panix.com
Sun Mar 31 14:41:41 EDT 2013
In article <mailman.4023.1364751102.2939.python-list at python.org>,
Dave Angel <davea at davea.name> wrote:
> On 03/31/2013 12:52 PM, C.T. wrote:
> > On Sunday, March 31, 2013 12:20:25 PM UTC-4, zipher wrote:
> >> <SNIP>
> >>
> >
> > Thank you, Mark! My problem is the data isn't consistently ordered. I can
> > use slicing and indexing to put the year into a tuple, but because a car
> > manufacturer could have two names (e.g., Aston Martin) or a car model could
> > have two names (e.g., Iron Duke), it's harder to use slicing and indexing for
> > those two. I've added the following, but the output is still not what I
> > need it to be.
>
> So the correct answer is "it cannot be done," and an explanation.
>
> Many times I've been given impossible conditions for a problem. And
> invariably the correct solution is to press [back] on the supplier of the
> constraints.

In real life, you often have to deal with crappy input data (and bogus
project requirements). Sometimes you just need to be creative.

There's only a small set of car manufacturers. A good start would be
mining Wikipedia's [[List of automobile manufacturers]]. Once you've
got that list, you could try matching portions of the input against the
list.
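
Something along these lines might do for the matching step. It's only a
rough sketch: the MANUFACTURERS set is a hand-typed stand-in for the real
list, split_make() is a name I made up, and I'm assuming each input line is
just whitespace-separated words in no particular order. It tries the longest
runs of words first, so a two-word maker beats any shorter match.

```python
# Hypothetical stand-in for a real manufacturer list (lower-cased names).
MANUFACTURERS = {"aston martin", "ford", "pontiac", "delorean"}


def split_make(line, makers=MANUFACTURERS):
    """Return (make, remaining_words) for the longest known maker in line.

    Returns (None, all_words) when no known manufacturer is found.
    """
    words = line.split()
    # Try longer runs of words first so "Aston Martin" is found as a unit.
    for size in range(len(words), 0, -1):
        for start in range(len(words) - size + 1):
            candidate = " ".join(words[start:start + size])
            if candidate.lower() in makers:
                # Whatever is left over is the year and the model, in input order.
                rest = words[:start] + words[start + size:]
                return candidate, rest
    return None, words
```

With the maker pulled out, the year is easy to spot among the leftover words,
and whatever remains is presumably the model, two words or not.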

Depending on how much effort you want to put into this, you could
explore all sorts of fuzzy matching (e.g., "delorean" vs "delorean motor
company"), but even a simple search is better than giving up.
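
For a first taste of fuzzy matching you don't even need a third-party
package; difflib in the standard library has get_close_matches(). A minimal
sketch, where KNOWN_MAKERS and closest_maker() are names I invented and the
0.5 cutoff is a guess you would want to tune against your own data (the
default of 0.6 is too strict for the "delorean" case):

```python
import difflib

# Hypothetical sample of full manufacturer names, lower-cased.
KNOWN_MAKERS = ["delorean motor company", "ford motor company", "aston martin"]


def closest_maker(text, makers=KNOWN_MAKERS, cutoff=0.5):
    """Return the known maker most similar to text, or None if nothing is close."""
    matches = difflib.get_close_matches(text.lower(), makers, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A low cutoff catches abbreviations and typos but will also let through more
false matches, so it's worth eyeballing what it returns on real input.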

And this is a good excuse to explore some of the interesting
third-party modules. For example, mwclient ("pip install mwclient")
gives you a neat Python interface to Wikipedia. And there's a whole
landscape of string matching packages to explore.

We deal with this every day at Songza. Are Kesha and Ke$ha the same
artist? Pushing back on the record labels to clean up their catalogs
isn't going to get us very far.