[Tutor] regular expressions question
nimrodx
nimrodx at slingshot.co.nz
Sat Aug 12 16:30:34 CEST 2006
Hi Alan and other Gurus,
if you look carefully at the string below, you see
that in amongst the "\x" stuff you have the text I want:
z tfile://home/alpha
which I know to be an address on my system, plus a bit of preceding text.
Alan Gauld wrote:
>> The file's encoding is binary or something
>>
>> Here is the first section of the file:
>> '\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/\x00h\x00o\x00m\x00e\x00/\x00a\x00l'
>>
>>
>> Does that tell you anything?
> But that is almost certainly the wrong approach, you'll never
> figure out where the word boundaries are without them!
So I believe this is the right approach. In fact, if I print the string
without any modifications, I get the following sort of stuff:
¸z¨ôôtfile:/home/alpha/care/my_details.aspx.htmlÿÿÿÿÿÿÿÿ%oô¯0%oô¯0l
So this is one approach that will work.
I have no idea what sort of encoding it is, but could someone tell me
how to get rid of what I assume are hex digits?
In a hex editor it turns out to be readable and sensible URLs with
a space between each character, and a bit of crud at the end of each URL,
just as above.
Any suggestions with that additional info?
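
One possible way to pull the readable text out with a regular expression is sketched below. It assumes the "spaces" seen in the hex editor are NUL bytes (\x00) in front of each printable ASCII character, which is what the quoted sample suggests. The names WIDE_TEXT and extract_wide_strings, and the minimum run length of four characters, are made up for illustration.

```python
import re

# First bytes of the file, copied from the sample quoted earlier in this post.
sample = (b'\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4'
          b'\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/'
          b'\x00h\x00o\x00m\x00e\x00/\x00a\x00l')

# A run of at least four (NUL byte, printable ASCII byte) pairs.
# The threshold of four is an arbitrary choice to skip stray matches.
WIDE_TEXT = re.compile(rb'(?:\x00[\x20-\x7e]){4,}')

def extract_wide_strings(data):
    """Return the readable text of every NUL-interleaved run in data."""
    return [m.group().replace(b'\x00', b'').decode('ascii')
            for m in WIDE_TEXT.finditer(data)]

print(extract_wide_strings(sample))
```

On the truncated sample this should pick up the leading "t" as well, since that byte also happens to follow a NUL; whether it really belongs to the URL is not something the sample alone can settle.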
I've used struct before, it is a very nice module. Could this be some
sort of UTF encoding?
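
A quick way to test the UTF idea against the quoted sample is sketched here. Two things are guesses, not known facts about the file format: that the first 24 bytes are six big-endian unsigned 32-bit integers, and that the text after them is UTF-16 big-endian. If the second guess holds, the "t" in front of "file" would be the low byte of the last header field, not part of the text.

```python
import struct

# First bytes of the file, copied from the sample quoted earlier in this post.
sample = (b'\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4'
          b'\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/'
          b'\x00h\x00o\x00m\x00e\x00/\x00a\x00l')

# Assumption: a 24-byte header of six big-endian unsigned 32-bit integers.
header = struct.unpack('>6I', sample[:24])

# Assumption: everything after the header is UTF-16 big-endian text.
text = sample[24:].decode('utf-16-be')

print(header)
print(text)
```

If the decode succeeds and gives a sensible path, that supports the UTF-16 guess for this record; the meaning of the header fields would still need checking against more of the file.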
I think I was a bit light on info with that first post.
Thanks for your time,
Matt