[Tutor] regex advice
Peter Otten
__peter__ at web.de
Tue Jan 6 15:17:53 CET 2015
Norman Khine wrote:
> i have a blade template file, as
>
> replace page
> .row
> .large-8.columns
> form( method="POST", action="/product/saveall/#{style._id}" )
> input( type="hidden" name="_csrf" value=csrf_token )
> h3 #{t("Generate Product for")} #{tt(style.name)}
> .row
> .large-6.columns
> h4=t("Available Attributes")
> - for(var i = 0; i < attributes.length; i++)
> - var attr = attributes[i]
> - console.log(attr)
> ul.attribute-block.no-bullet
> li
> b= tt(attr.name)
> - for(var j = 0; j < attr.values.length; j++)
> - var val = attr.values[j]
> li
> label
> input( type="checkbox" title="#{tt(attr.name)}:
> #{tt(val.name)}" name="#{attr.id}" value="#{val.id}")
> |
> =tt(val.name)
> = " [Code: " + (val.code || val._id) + "]"
> !=val.htmlSuffix()
> .large-6.columns
> h4 Generated Products
> ul#products
> button.button.small
> i.icon-save
> |=t("Save")
> =" "
> a.button.small.secondary( href="/product/list/#{style.id}" )
> i.icon-cancel
> |t=("Cancel")
>
> when i run the above code, i get
>
> - file add.blade (full path:
> ../node-blade-boiler-template/views/product/add.blade)
> type="hidden" name="_csrf" value=csrf_token
> "Generate product for")} #{tt(style.name
> "Available Attributes"
> attr.name
> type="checkbox" title="#{tt(attr.name)}: #{tt(val.name)}"
> name="#{attr.id}"
> value="#{val.id}"
> val.name
> "Generated products"
> "Save"
>
>
>
> so, gettext_re = re.compile(r"""[t]\((.*)\)""").findall is not correct as
> it includes
>
> results such as input( type="hidden" name="_csrf" value=csrf_token )
>
> what is the correct way to pull all values that are within t(" ") but
> exclude any tt( ) and input( )
>
> any advice much appreciated
You can require a word boundary before the 't'. Quoting
<https://docs.python.org/dev/library/re.html#regular-expression-syntax>:
"""
\b
Matches the empty string, but only at the beginning or end of a word. A word
is defined as a sequence of Unicode alphanumeric or underscore characters,
so the end of a word is indicated by whitespace or a non-alphanumeric, non-
underscore Unicode character. Note that formally, \b is defined as the
boundary between a \w and a \W character (or vice versa), or between \w and
the beginning/end of the string. This means that r'\bfoo\b' matches 'foo',
'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'.
By default Unicode alphanumerics are the ones used, but this can be changed
by using the ASCII flag. Inside a character range, \b represents the
backspace character, for compatibility with Python’s string literals.
"""
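To make the word-boundary rule concrete for this template, here is a minimal sketch. The strings in `candidates` are illustrative fragments modelled on the template above, not output from the real script, and `word_t` is just a name chosen for this example.

```python
import re

# \b requires that the 't' is not preceded by another word character,
# so the second 't' of 'tt(' and the final 't' of 'input(' are rejected.
word_t = re.compile(r"\bt\(")

candidates = [
    't("Save")',
    'tt(attr.name)',
    'input( type="hidden" )',
    '=t("x")',
    '#{t("y")}',
]

# Keep only the fragments that contain a standalone t( call.
matches = [s for s in candidates if word_t.search(s)]
print(matches)
```

Only the three fragments where `t(` starts a word survive; `tt(` and `input(` are filtered out.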
Also, you are probably better off with a non-greedy match, which stops at the
first closing parenthesis rather than the last one:
>>> import re
>>> sample = 'yadda t("foo") [t("bar")] input("baz")'
>>> re.findall(r"t\((.*)\)", sample)
['"foo") [t("bar")] input("baz"']
>>> re.findall(r"t\((.*?)\)", sample)
['"foo"', '"bar"', '"baz"']
>>> re.findall(r"\bt\((.*?)\)", sample)
['"foo"', '"bar"']
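If only the text between the double quotes is wanted, one possible variation is to match the quoted literal explicitly instead of using `.*?`. This is a sketch under assumptions: the pattern below is not from the original script, it assumes the argument to t() is a single double-quoted string without escaped quotes, and `template` is a shortened copy of a few lines from the template above.

```python
import re

# Assumed pattern: word boundary, t(, optional whitespace,
# a double-quoted literal (captured without the quotes), optional whitespace, ).
gettext_re = re.compile(r'\bt\(\s*"([^"]*)"\s*\)')

# A few lines taken from the template in the question.
template = '''\
input( type="hidden" name="_csrf" value=csrf_token )
h3 #{t("Generate Product for")} #{tt(style.name)}
h4=t("Available Attributes")
b= tt(attr.name)
|=t("Save")
'''

found = gettext_re.findall(template)
print(found)
```

Because `[^"]*` cannot run past the closing quote, the match for the h3 line ends at "Generate Product for" and does not spill into the following tt() call.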