[Tutor] regular expressions question

Sat Aug 12 16:30:34 CEST 2006

Hi Alan and other Gurus,

If you look carefully at the string below, you can see
that amongst the "\x" stuff is the text I want:
z tfile://home/alpha
which I know to be an address on my system, plus a bit of preceding text.
Alan Gauld wrote:
>> The file's encoding is binary or something
>>
>> Here is the first section of the file:
>> '\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/\x00h\x00o\x00m\x00e\x00/\x00a\x00l' 
>>
>>
>> Does that tell you anything?
> But that is almost certainly the wrong approach, you'll never
> figure out where the word boundaries are without them!
So I believe this is the right approach. In fact, if I print the string
without any modifications, I get the following sort of stuff:
¸z¨ôôtfile:/home/alpha/care/my_details.aspx.htmlÿÿÿÿÿÿÿÿ%oô¯0%oô¯0l

So this is one approach that will work.
I have no idea what sort of encoding it is, but I'd be grateful if someone
could tell me how to get rid of what I assume are hex digits.
In a hex editor it turns out to be readable, sensible URLs with a space
between each character, and a bit of crud at the end of the URLs, just
as above.
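Since the subject is regular expressions, here is a minimal sketch of the regex route, assuming that the readable parts are always ASCII characters each preceded by a zero byte (which is what the "\x00f\x00i\x00l\x00e" pattern and the "space between each character" in the hex editor suggest). The names DATA, PATTERN and extract_text, and the minimum run length of 4, are my own choices for illustration. Written for Python 3.

```python
import re

# Opening bytes of the file, copied from the post (the real file is longer).
DATA = (b'\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4'
        b'\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/'
        b'\x00h\x00o\x00m\x00e\x00/\x00a\x00l')

# A run of printable ASCII characters, each preceded by a zero byte.
# The {4,} lower bound is an arbitrary choice to skip stray single bytes.
PATTERN = re.compile(rb'(?:\x00[\x20-\x7e]){4,}')

def extract_text(data):
    """Return every readable run found in data, with the zero bytes removed."""
    # [1::2] keeps every second byte, i.e. the character after each \x00.
    return [m.group()[1::2].decode('ascii') for m in PATTERN.finditer(data)]

for text in extract_text(DATA):
    print(text)
```

On the sample this prints `tfile:/home/al`, including the same leading "t" that shows up when printing the raw string. It only recovers ASCII text; anything outside that range is skipped.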

Any suggestions with that additional info?
I've used struct before; it is a very nice module. Could this be some
sort of UTF encoding?
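One way to test the UTF idea: Python's 'utf-16-be' codec stores each ASCII character as a zero byte followed by the character, exactly as in the dump. The sketch below goes a step further and assumes, purely as a hypothesis, that each string is preceded by a 4-byte big-endian byte count, read with struct. In the sample, the four bytes just before "file" are \x00\x00\x00t, which is 116 as an integer, so the "t" might be a length rather than text, and the runs of ÿ (0xff) in the printed output might be 0xFFFFFFFF "no string" markers. The function name read_prefixed_utf16 and the starting offset 20 are my own guesses, not anything documented for this file. Written for Python 3.

```python
import struct

# Opening bytes of the file, copied from the post (the real file is longer).
DATA = (b'\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4'
        b'\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/'
        b'\x00h\x00o\x00m\x00e\x00/\x00a\x00l')

def read_prefixed_utf16(buf, offset):
    """Hypothetical reader: a 4-byte big-endian byte count, then UTF-16-BE text.

    Returns (text, next_offset). A count of 0xFFFFFFFF is treated as "no
    string" and gives (None, offset + 4). This layout is a guess, not a
    documented format.
    """
    (length,) = struct.unpack_from('>I', buf, offset)
    offset += 4
    if length == 0xFFFFFFFF:
        return None, offset
    raw = buf[offset:offset + length]  # may be short if buf is truncated
    return raw.decode('utf-16-be', errors='replace'), offset + length

# UTF-16-BE stores each ASCII character as a zero byte plus the character,
# which matches the "\x00f\x00i\x00l\x00e" pattern in the dump.
print('file'.encode('utf-16-be'))

# The bytes at offset 20 are \x00\x00\x00t (116). Offset 20 is itself a guess.
text, next_offset = read_prefixed_utf16(DATA, 20)
print(text)
```

On the truncated sample this prints `b'\x00f\x00i\x00l\x00e'` and then `file:/home/al` (cut short because the quoted bytes stop there). If the guess is right, the full file should give complete URLs without the stray "t"; if not, the regex approach above does not depend on it.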

I think I was a bit light on info with that first post.
Thanks for your time,

Matt