[Tutor] Quick question regarding Parsing a Delimited string

Rich Lovely roadierich at googlemail.com
Wed Jul 8 19:22:02 CEST 2009


On 8 Jul 2009, at 17:13, Garry Bettle <garry.bettle at gmail.com> wrote:

> Hi,
>
> I've been programming for over 20 yrs, but only the last few in python
> and then only in dribs and drabs.
>
> I'm having a difficult time parsing a delimited string.
>
> e.g.
>
> 100657641~GBP~ACTIVE~0~1~~true~5.0~1247065352508~:
> 3818854~0~24104.08~4.5~~22.1~false| 
> 4.4~241.67~L~1~4.3~936.0~L~2~4.2~210.54~L~3~| 
> 4.5~19.16~B~1~4.6~214.27~B~2~4.7~802.13~B~3~:
> 3991404~1~19974.18~4.7~~21.7~false| 
> 4.6~133.01~L~1~4.5~124.83~L~2~4.4~319.33~L~3~| 
> 4.7~86.61~B~1~4.8~247.9~B~2~4.9~142.0~B~3~:
> 4031423~2~15503.56~6.6~~15.1~false| 
> 6.6~53.21~L~1~6.4~19.23~L~2~6.2~53.28~L~3~| 
> 6.8~41.23~B~1~7.0~145.04~B~2~7.2~37.23~B~3~
>
> That is just a selection of the full string - and I've broken it up
> for this email.  It's delimited by : and then by ~ and finally, in
> some cases, | (a pipe).
>
> If the string is called m, I thought I could create a list with
> m.split(":").  I would like to then first of all find in this list the
> entry beginning with e.g. 3991404.
>
> I thought I could pop each item in the list and compare, but that seems
> pretty long-winded.
>
> When the ItemFound is now =
> '3991404~1~19974.18~4.7~~21.7~false| 
> 4.6~133.01~L~1~4.5~124.83~L~2~4.4~319.33~L~3~| 
> 4.7~86.61~B~1~4.8~247.9~B~2~4.9~142.0~B~3~:'
>
> I would like to return the 3rd item delimited with ~, which in this  
> case, is 4.7
>
> Can anyone help?
>
> Many thanks!
>
> Cheers,
>
> Garry
I've been dealing with a similar problem myself, parsing input for
Project Euler. The way I did it was to map a split function onto the
first list:

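lst = list(map(lambda s: s.split("~"), m.split(":")))

(The list() call matters in Python 3, where map() returns a one-shot
iterator rather than a list.) You can get the same effect with a list
comprehension: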

lst = [s.split("~") for s in m.split(":")]

You can then use a function like the following:

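def find(term):
    # Return field 3 of the first record whose first field equals term.
    for i in lst:
        if i[0] == term:
            return i[3]
    return None  # no record matched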

Of course, this assumes that you only want the first match, but it  
would be trivial to modify it to return all matches.
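A sketch of that modification, collecting the field from every matching
record instead of stopping at the first. The name find_all, the lst and
index parameters, and the sample records are illustrative assumptions,
standing in for the output of the split above:

```python
def find_all(lst, term, index=3):
    """Return field `index` from every record whose first field equals `term`."""
    return [rec[index] for rec in lst if rec[0] == term]


# Hypothetical records, already split on ":" and "~"; the ID 3991404 appears twice.
lst = [
    ["3818854", "0", "24104.08", "4.5"],
    ["3991404", "1", "19974.18", "4.7"],
    ["3991404", "5", "18000.00", "4.9"],
]

print(find_all(lst, "3991404"))  # ['4.7', '4.9']
print(find_all(lst, "0000000"))  # []
```

An empty list, rather than None, signals that nothing matched.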

Does that help? If it doesn't solve the problem, I hope it will at  
least point you towards how to solve it.
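The question also mentions | as a third delimiter. In the sample it seems
to separate sub-sections within a record, so splitting on ~ alone leaves
fields such as "false|4.4" joined together. If those sub-sections are
needed, here is a minimal sketch of a three-level parse. It assumes the
real string has no line breaks or spaces (those were added for the email)
and that | only ever separates sub-sections inside a : record:

```python
# Assumed sample: the question's data with the email line breaks removed.
m = ("100657641~GBP~ACTIVE~0~1~~true~5.0~1247065352508~:"
     "3818854~0~24104.08~4.5~~22.1~false|"
     "4.4~241.67~L~1~4.3~936.0~L~2~4.2~210.54~L~3~|"
     "4.5~19.16~B~1~4.6~214.27~B~2~4.7~802.13~B~3~:"
     "3991404~1~19974.18~4.7~~21.7~false|"
     "4.6~133.01~L~1~4.5~124.83~L~2~4.4~319.33~L~3~|"
     "4.7~86.61~B~1~4.8~247.9~B~2~4.9~142.0~B~3~")

# Split on ":" into records, each record on "|" into sections,
# and each section on "~" into fields.
records = [[section.split("~") for section in rec.split("|")]
           for rec in m.split(":")]

# records[i][0] is the leading section of record i; its first field is the ID.
print(records[2][0][0])   # 3991404
print(records[2][0][3])   # 4.7
print(records[2][1][:4])  # ['4.6', '133.01', 'L', '1']
```

A trailing ~ (as at the end of each L/B section) produces an empty last
field, which may need to be dropped depending on how the data is used.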

If you need to look up many IDs, you could speed up the search by
turning the list of lists into a dict, using the first value in each
sublist as a key (a dict lookup avoids scanning the whole list each time):

dct = dict((i[0], i[1:]) for i in lst)

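Then you can access it using the normal dictionary interface. Because
the value is i[1:], the key has been sliced off and every index shifts
down by one, so the field that find() returned as i[3] is now at [2]:

dct["3991404"][2]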
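If you would rather keep the same indices that find() uses, one option
is to store the whole split record as the value instead of i[1:]. A
minimal sketch, with made-up sample records:

```python
# Hypothetical records, already split on ":" and "~".
lst = [
    ["3818854", "0", "24104.08", "4.5", "", "22.1"],
    ["3991404", "1", "19974.18", "4.7", "", "21.7"],
]

# Key on the first field but keep the whole record as the value,
# so field positions match those of the original list.
dct = {rec[0]: rec for rec in lst}

print(dct["3991404"][3])   # 4.7
# dct.get returns None instead of raising KeyError for a missing ID.
print(dct.get("0000000"))  # None
```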

If an ID appears more than once, only its last entry is kept (earlier
ones are overwritten while the dict is built), so whether this suits you
depends on the behaviour you want.
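If repeated IDs should all be kept rather than overwritten,
collections.defaultdict can gather every record under its key. A sketch
with invented data:

```python
from collections import defaultdict

# Hypothetical records; "3991404" appears twice.
lst = [
    ["3818854", "0", "24104.08", "4.5"],
    ["3991404", "1", "19974.18", "4.7"],
    ["3991404", "5", "18000.00", "4.9"],
]

# Each key maps to a list of all records that share it, in input order.
grouped = defaultdict(list)
for rec in lst:
    grouped[rec[0]].append(rec)

print([rec[3] for rec in grouped["3991404"]])  # ['4.7', '4.9']
print(len(grouped["3818854"]))                 # 1
```

Note that indexing a defaultdict with a missing ID silently adds an empty
list for it; use grouped.get(term, []) if that matters.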
---
Richard "Roadie Rich" Lovely
Part of the JNP|UK Famille
www.theJNP.com

(Sent from my iPod - please allow me a few typos: it's a very small  
keyboard)

