[Tutor] Regex help

Bill Burns billburns at pennswoods.net
Tue Oct 11 06:11:05 CEST 2005


> On Mon, 10 Oct 2005, Bill Burns wrote:
> 
> 
>>I'm looking to get the size (width, length) of a PDF file.
> 


> Hi Bill,
> 
> Just as a side note: you may want to look into using the 'pdfinfo' utility
> that comes as part of the xpdf package:
> 
>     http://www.foolabs.com/xpdf/
> 
> For example:
> 
> #######################################################################
> [dyoo@shoebox ~]$ pdfinfo 05-lexparse.pdf
> Producer:       Acrobat Distiller Command 3.0 for Solaris 2.3 and later (SPARC)
> CreationDate:   Tue Jul  1 18:36:35 1913
> Tagged:         no
> Pages:          12
> Encrypted:      no
> Page size:      612 x 792 pts (letter)
> File size:      191874 bytes
> Optimized:      no
> PDF version:    1.2
> #######################################################################
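
If it helps to drive this from Python, here is a minimal sketch. It assumes
pdfinfo is installed and on the PATH, and that its output keeps the
"Page size: W x H pts" layout shown above. The names parse_page_size and
page_size_from_file are made up for illustration only.

```python
import re
import subprocess

# Matches a line such as "Page size:      612 x 792 pts (letter)".
PAGE_SIZE_RE = re.compile(
    r"^Page size:\s*([\d.]+)\s*x\s*([\d.]+)\s*pts", re.MULTILINE
)


def parse_page_size(pdfinfo_output):
    """Return (width, height) in points from pdfinfo's text output, or None."""
    match = PAGE_SIZE_RE.search(pdfinfo_output)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def page_size_from_file(path):
    """Run pdfinfo on a file (assumes the executable is on PATH)."""
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return parse_page_size(result.stdout)


# Parsing a fragment of the sample output quoted above.
sample = (
    "Pages:          12\n"
    "Page size:      612 x 792 pts (letter)\n"
    "File size:      191874 bytes\n"
)
print(parse_page_size(sample))  # (612.0, 792.0)
```

Keeping the parsing separate from the subprocess call means the regex can be
tried on saved output without pdfinfo being present.
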
> 
> 
> 
> 
>>Every pdf file has a 'tag' (in the file) that looks similar to this
>>
>>Example #1
>>MediaBox [0 0 612 792]
>>
>>or this
>>
>>Example #2
>>MediaBox [ 0 0 612 792 ]
>>
>>I figured a regex might be a good way to get this data but the
>>whitespace (or no whitespace) after the left bracket has me stumped.
> 
> 
> 
> I think you might want to look for the whitespace metacharacter '\s'.
> Also, you can consider using '*' to qualify a previous pattern: it stands
> for "zero or more of the pattern."  For example:
> 
> #####################################
> >>> re.search("a*b", "aab")
> <_sre.SRE_Match object at 0x403ae250>
> >>> re.search("a*b", "ab")
> <_sre.SRE_Match object at 0x403ae138>
> >>> re.search("a*b", "b")
> <_sre.SRE_Match object at 0x403ae250>
> >>> re.search("a*b", "")
> >>>
> #####################################
> 
> In comparison, '+' stands for "one or more of the pattern":
> 
> 
> #####################################
> >>> re.search("a+b", "aab")
> <_sre.SRE_Match object at 0x403ae138>
> >>> re.search("a+b", "ab")
> <_sre.SRE_Match object at 0x403ae250>
> >>> re.search("a+b", "b")
> >>>
> #####################################
> 

Danny,

Thank you for the information.
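
For the archives, here is a rough sketch of how the '\s*' suggestion might
be applied to the two MediaBox examples above. It assumes the entry appears
as plain, uncompressed text with four ordinary numbers. The pattern and the
name mediabox_size are my own illustration, not a general PDF parser.

```python
import re

# One number: optional minus sign, digits, optional decimal part.
NUMBER = r"(-?\d+(?:\.\d+)?)"

# '\s*' allows zero or more whitespace characters after '[' and before ']';
# '\s+' requires at least one between the four numbers.
MEDIABOX_RE = re.compile(
    r"MediaBox\s*\[\s*" + NUMBER + r"\s+" + NUMBER + r"\s+"
    + NUMBER + r"\s+" + NUMBER + r"\s*\]"
)


def mediabox_size(text):
    """Return (width, height) from the first MediaBox entry, or None."""
    match = MEDIABOX_RE.search(text)
    if match is None:
        return None
    x0, y0, x1, y1 = (float(g) for g in match.groups())
    return x1 - x0, y1 - y0


print(mediabox_size("MediaBox [0 0 612 792]"))    # (612.0, 792.0)
print(mediabox_size("MediaBox [ 0 0 612 792 ]"))  # (612.0, 792.0)
```

Both of my examples give the same result, with or without the space after
the left bracket.
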

Bill

