[Tutor] Regex help
Bill Burns
billburns at pennswoods.net
Tue Oct 11 06:11:05 CEST 2005
> On Mon, 10 Oct 2005, Bill Burns wrote:
>
>
>>I'm looking to get the size (width, length) of a PDF file.
>
> Hi Bill,
>
> Just as a side note: you may want to look into using the 'pdfinfo' utility
> that comes as part of the xpdf package:
>
> http://www.foolabs.com/xpdf/
>
> For example:
>
> #######################################################################
> [dyoo at shoebox ~]$ pdfinfo 05-lexparse.pdf
> Producer: Acrobat Distiller Command 3.0 for Solaris 2.3 and later (SPARC)
> CreationDate: Tue Jul 1 18:36:35 1913
> Tagged: no
> Pages: 12
> Encrypted: no
> Page size: 612 x 792 pts (letter)
> File size: 191874 bytes
> Optimized: no
> PDF version: 1.2
> #######################################################################
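A minimal Python sketch of calling pdfinfo from a script. It assumes pdfinfo is installed and on the PATH, and that its "Page size:" line follows the format in the sample output above. The helper names parse_page_size and pdf_page_size are hypothetical and are not part of xpdf.

```python
import re
import subprocess

def parse_page_size(pdfinfo_output):
    """Return (width, height) in points from pdfinfo's 'Page size:' line, or None."""
    # Assumes a line like "Page size: 612 x 792 pts (letter)", as in the sample above.
    match = re.search(r"^Page size:\s+([\d.]+) x ([\d.]+) pts",
                      pdfinfo_output, re.MULTILINE)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

def pdf_page_size(path):
    """Run the pdfinfo command-line tool on path and parse its output."""
    output = subprocess.run(["pdfinfo", path], capture_output=True,
                            text=True, check=True).stdout
    return parse_page_size(output)
```

For the sample above, parse_page_size would return (612.0, 792.0). Keeping the parsing separate from the subprocess call makes it testable without a PDF on hand.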
>
>
>
>
>>Every PDF file has a 'tag' (in the file) that looks similar to this:
>>
>>Example #1
>>MediaBox [0 0 612 792]
>>
>>or this
>>
>>Example #2
>>MediaBox [ 0 0 612 792 ]
>>
>>I figured a regex might be a good way to get this data, but the
>>whitespace (or lack of whitespace) after the left bracket has me stumped.
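A minimal regex sketch that accepts both examples by putting `\s*` (zero or more whitespace characters) inside the brackets. The names MEDIABOX_RE and media_box are hypothetical. It assumes the file's text has already been read into a string and that the MediaBox entry appears uncompressed. A MediaBox stored inside a compressed stream will not be found this way.

```python
import re

# \s* allows zero or more whitespace characters at each spot where
# Example #1 and Example #2 differ (after '[' and before ']').
# The leading '/?' also accepts the '/MediaBox' spelling.
MEDIABOX_RE = re.compile(
    r"/?MediaBox\s*\[\s*"
    r"(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)"
    r"\s*\]"
)

def media_box(text):
    """Return (width, height) from the first MediaBox found in text, or None."""
    match = MEDIABOX_RE.search(text)
    if match is None:
        return None
    x0, y0, x1, y1 = (float(g) for g in match.groups())
    return x1 - x0, y1 - y0
```

Both "MediaBox [0 0 612 792]" and "MediaBox [ 0 0 612 792 ]" yield (612.0, 792.0). Subtracting the lower-left corner from the upper-right corner handles boxes that do not start at 0 0.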
>
>
>
> I think you might want to look for the whitespace metacharacter '\s'.
> Also, you can consider using '*' to qualify a previous pattern: it stands
> for "zero or more of the pattern." For example:
>
> #####################################
>
> >>> re.search("a*b", "aab")
> <_sre.SRE_Match object at 0x403ae250>
> >>> re.search("a*b", "ab")
> <_sre.SRE_Match object at 0x403ae138>
> >>> re.search("a*b", "b")
> <_sre.SRE_Match object at 0x403ae250>
> >>> re.search("a*b", "")
> >>>
>
> #####################################
>
> In comparison:
>
>
> #####################################
>
> >>> re.search("a+b", "aab")
> <_sre.SRE_Match object at 0x403ae138>
> >>> re.search("a+b", "ab")
> <_sre.SRE_Match object at 0x403ae250>
> >>> re.search("a+b", "b")
> >>>
>
> #####################################
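The same `*` versus `+` distinction applies to the whitespace after the left bracket. This short sketch, using a hypothetical helper named matches, runs both quantifiers against the two MediaBox examples. `\s+` requires at least one whitespace character, so it rejects Example #1, while `\s*` accepts both.

```python
import re

EXAMPLE_1 = "MediaBox [0 0 612 792]"
EXAMPLE_2 = "MediaBox [ 0 0 612 792 ]"

def matches(pattern):
    """Report whether pattern is found in each of the two examples."""
    return [re.search(pattern, s) is not None for s in (EXAMPLE_1, EXAMPLE_2)]

print(matches(r"\[\s+0"))  # [False, True]: '+' needs at least one space
print(matches(r"\[\s*0"))  # [True, True]: '*' also allows none
```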
>
Danny,
Thank you for the information.
Bill