[Tutor] Quick question regarding Parsing a Delimited string
Rich Lovely
roadierich at googlemail.com
Wed Jul 8 19:22:02 CEST 2009
On 8 Jul 2009, at 17:13, Garry Bettle <garry.bettle at gmail.com> wrote:
> Hi,
>
> I've been programming for over 20 yrs, but only the last few in python
> and then only in dribs and drabs.
>
> I'm having a difficult time parsing a delimited string.
>
> e.g.
>
> 100657641~GBP~ACTIVE~0~1~~true~5.0~1247065352508~:
> 3818854~0~24104.08~4.5~~22.1~false|
> 4.4~241.67~L~1~4.3~936.0~L~2~4.2~210.54~L~3~|
> 4.5~19.16~B~1~4.6~214.27~B~2~4.7~802.13~B~3~:
> 3991404~1~19974.18~4.7~~21.7~false|
> 4.6~133.01~L~1~4.5~124.83~L~2~4.4~319.33~L~3~|
> 4.7~86.61~B~1~4.8~247.9~B~2~4.9~142.0~B~3~:
> 4031423~2~15503.56~6.6~~15.1~false|
> 6.6~53.21~L~1~6.4~19.23~L~2~6.2~53.28~L~3~|
> 6.8~41.23~B~1~7.0~145.04~B~2~7.2~37.23~B~3~
>
> That is just a selection of the full string - and I've broken it up
> for this email. It's delimited by : and then by ~ and finally, in
> some cases, | (a pipe).
>
> If the string is called m, I thought I could create a list with
> m.split(":"). I would like to then first of all find in this list the
> entry beginning with e.g. 3991404.
>
> I thought I could pop each item in the list and compare, but that seems
> pretty long-winded.
>
> When the ItemFound is now =
> '3991404~1~19974.18~4.7~~21.7~false|
> 4.6~133.01~L~1~4.5~124.83~L~2~4.4~319.33~L~3~|
> 4.7~86.61~B~1~4.8~247.9~B~2~4.9~142.0~B~3~:'
>
> I would like to return the 3rd item delimited with ~, which in this
> case, is 4.7
>
> Can anyone help?
>
> Many thanks!
>
> Cheers,
>
> Garry
I've been dealing with a similar problem myself, parsing input for
Project Euler. The way I did it was to map a split function onto the
first list (the list() call only matters on Python 3, where map returns
an iterator):

    lst = list(map(lambda s: s.split("~"), m.split(":")))

You can get the same effect with a comprehension:

    lst = [s.split("~") for s in m.split(":")]
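
To see what that split produces, here is a small sketch using the first three records from the original post, joined back into one string. The three-level variant at the end (and the name `nested`) is my own assumption about how the pipe-delimited parts might be handled; it is not something the sender specified.

```python
# Sample data copied from the post, rejoined into a single string.
m = (
    "100657641~GBP~ACTIVE~0~1~~true~5.0~1247065352508~:"
    "3818854~0~24104.08~4.5~~22.1~false|"
    "4.4~241.67~L~1~4.3~936.0~L~2~4.2~210.54~L~3~|"
    "4.5~19.16~B~1~4.6~214.27~B~2~4.7~802.13~B~3~:"
    "3991404~1~19974.18~4.7~~21.7~false|"
    "4.6~133.01~L~1~4.5~124.83~L~2~4.4~319.33~L~3~|"
    "4.7~86.61~B~1~4.8~247.9~B~2~4.9~142.0~B~3~"
)

# Two-level split: records on ":", then fields on "~".
lst = [s.split("~") for s in m.split(":")]
print(lst[2][:4])  # ['3991404', '1', '19974.18', '4.7']

# The pipe is not a separator here, so one field straddles it.
print(lst[2][6])   # false|4.6

# Assumed three-level variant: ":" then "|" then "~".
nested = [[part.split("~") for part in rec.split("|")]
          for rec in m.split(":")]
print(nested[2][0][3])  # 4.7
```

The two-level split is enough to reach the 4.7 asked about, since that field comes before the first pipe. The nested form only matters if the later pipe-separated groups are needed as well.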
You can then use a function like the following:

    def find(term):
        for i in lst:
            if i[0] == term:
                return i[3]
Of course, this assumes that you only want the first match, but it
would be trivial to modify it to return all matches.
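
One possible way to return every match is sketched below. The name `find_all` and its `index` parameter are hypothetical, and the sample string holds only the header fields (the part before the first pipe) of two records from the original post.

```python
def find_all(records, term, index=3):
    """Return field `index` of every record whose first field equals `term`."""
    return [rec[index] for rec in records
            if rec[0] == term and len(rec) > index]

# Header fields of two records from the post, trimmed for brevity.
m = ("3818854~0~24104.08~4.5~~22.1~false:"
     "3991404~1~19974.18~4.7~~21.7~false")
lst = [s.split("~") for s in m.split(":")]

print(find_all(lst, "3991404"))  # ['4.7']
print(find_all(lst, "0000000"))  # []
```

Returning a list means a missing ID gives an empty list rather than None, which is often easier to handle in the calling code.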
Does that help? If it doesn't solve the problem, I hope it will at
least point you towards how to solve it.
If you really want to speed up the search, you could turn the list of
lists into a dict, using the first value in each sublist as a key:

    dct = dict((i[0], i[1:]) for i in lst)
Then you can access it using the normal dictionary interface. The key
has been sliced off each sublist, so every index shifts down by one and
the 4.7 is now at position 2:

    dct["3991404"][2]

This will only return the last of any repeated keys (previous ones
will get overwritten during construction), so it really depends on the
behaviour you want.
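
If repeated keys do need to be kept, one option (a sketch, not the only way) is to group the records with collections.defaultdict. The sample uses header fields from the post, plus a repeat of 3991404 whose "4.8" is invented purely to show a duplicate key. The name `grouped` is hypothetical.

```python
from collections import defaultdict

# Two header records from the post, plus an invented repeat of 3991404
# (the "4.8" is made up for illustration only).
m = ("3818854~0~24104.08~4.5~~22.1~false:"
     "3991404~1~19974.18~4.7~~21.7~false:"
     "3991404~1~19974.18~4.8~~21.7~false")
lst = [s.split("~") for s in m.split(":")]

# Plain dict: the later record overwrites the earlier one.
dct = dict((i[0], i[1:]) for i in lst)
print(dct["3991404"][2])  # 4.8

# Grouped dict: every record is kept under its key, in input order.
grouped = defaultdict(list)
for rec in lst:
    grouped[rec[0]].append(rec[1:])
print([r[2] for r in grouped["3991404"]])  # ['4.7', '4.8']
```

Because the key is sliced off with i[1:], the field at position 3 of the full record sits at position 2 in each stored value.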
---
Richard "Roadie Rich" Lovely
Part of the JNP|UK Famille
www.theJNP.com
(Sent from my iPod - please allow me a few typos: it's a very small
keyboard)