re beginner
John Machin
sjmachin at lexicon.net
Sun Jun 4 19:01:46 EDT 2006
On 5/06/2006 10:38 AM, Bruno Desthuilliers wrote:
> SuperHik wrote:
>> hi all,
>>
>> I'm trying to understand regex for the first time, and it would be
>> very helpful to get an example. I have an old(er) script with the
>> following task - takes a string I copy-pasted and which always has the
>> same format:
>>
>> >>> print stuff
>> Yellow hat 2 Blue shirt 1
>> White socks 4 Green pants 1
>> Blue bag 4 Nice perfume 3
>> Wrist watch 7 Mobile phone 4
>> Wireless cord! 2 Building tools 3
>> One for the money 7 Two for the show 4
>>
>> >>> stuff
>> 'Yellow hat\t2\tBlue shirt\t1\nWhite socks\t4\tGreen pants\t1\nBlue
>> bag\t4\tNice perfume\t3\nWrist watch\t7\tMobile phone\t4\nWireless
>> cord!\t2\tBuilding tools\t3\nOne for the money\t7\tTwo for the show\t4'
>>
>> I want to put items from stuff into a dict like this:
>> >>> print mydict
>> {'Wireless cord!': 2, 'Green pants': 1, 'Blue shirt': 1, 'White
>> socks': 4, 'Mobile phone': 4, 'Two for the show': 4, 'One for the
>> money': 7, 'Blue bag': 4, 'Wrist watch': 7, 'Nice perfume': 3, 'Yellow
>> hat': 2, 'Building tools': 3}
>>
>> Here's how I did it:
>> >>> def putindict(items):
>> ...     items = items.replace('\n', '\t')
>> ...     items = items.split('\t')
>> ...     d = {}
>> ...     for x in xrange( len(items) ):
>> ...         if not items[x].isdigit(): d[items[x]] = int(items[x+1])
>> ...     return d
>> >>>
>> >>> mydict = putindict(stuff)
>>
>>
>> I was wondering, is there a better way to do it using the re module?
>> Perhaps even avoiding this for loop?
>
> There are better ways. One of them avoids the for loop, and even the re
> module:
>
> def to_dict(items):
>     items = items.replace('\t', '\n').split('\n')
In case there are leading/trailing spaces on the keys:
    items = [x.strip() for x in items.replace('\t', '\n').split('\n')]
>     return dict(zip(items[::2], map(int, items[1::2])))
>
> HTH
Fantastic -- at least for the OP's carefully copied-and-pasted input.
Meanwhile, back in the real world, there might be problems with multiple
tabs used for 'prettiness' instead of a single tab, non-integer values,
and so on. In that case a loop that validates as it goes and can report
the position and contents of any invalid input might be better.
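
A minimal sketch of such a validating loop is below. The name
to_dict_checked and the (line number, line, reason) error tuples are
illustrative assumptions, not anything from the posts above. It tolerates
runs of tabs by dropping empty fields, and it collects problems instead of
stopping at the first one.

```python
def to_dict_checked(text):
    """Parse tab-separated name/count pairs, one or more pairs per line.

    Returns (result, errors). Each error is a hypothetical
    (line_number, line, reason) tuple; line numbers start at 1.
    """
    result = {}
    errors = []
    for index, line in enumerate(text.split('\n')):
        line_no = index + 1
        # Strip each field and drop empty ones, so that several tabs in a
        # row (used for alignment) behave like a single separator.
        fields = [f.strip() for f in line.split('\t')]
        fields = [f for f in fields if f]
        if len(fields) % 2:
            errors.append((line_no, line, 'odd number of fields'))
            continue
        # Walk the fields two at a time: name, then count.
        for i in range(0, len(fields), 2):
            key, value = fields[i], fields[i + 1]
            try:
                result[key] = int(value)
            except ValueError:
                errors.append((line_no, line,
                               'non-integer value %r' % value))
    return result, errors


# Sample call: the second line has a doubled tab and a bad count.
sample = 'Yellow hat\t2\tBlue shirt\t1\nWhite socks\t\t4\tGreen pants\tone'
mydict, problems = to_dict_checked(sample)
```

Whether a line with a bad pair should be rejected whole or partly kept is a
design choice; this sketch keeps the valid pairs and reports the rest.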