Idea: Tagged strings in python
Hi everyone!

I'm the maintainer of a small django library called django-components. I've run into a problem that I have a language-level solution (tagged strings) to, that I think would benefit the wider python community.

*Problem*
A component in my library is a combination of python code, html, css and javascript. Currently I glue things together with a python file, where you put the paths to the html, css and javascript. When run, it brings all of the files together into a component. But for small components, having to juggle four different files around is cumbersome, so I've started to look for a way to put everything related to the component _in the same file_. This makes it much easier to work on and understand, with fewer places to make path errors.

Example:

    class Calendar(component.Component):
        template_string = '<span class="calendar"></span>'
        css_string = '.calendar { background: pink }'
        js_string = 'document.getElementsByClassName("calendar)[0].onclick = function() { alert("click!") }'

Seems simple enough, right? The problem is: There's no syntax highlighting in my code editor for the three other languages. This makes for a horrible developer experience, where you constantly have to hunt for characters inside of strings. You saw the missing quote in js_string right? :)

If I instead use separate files, I get syntax highlighting and auto-completion for each file, because editors set language based on file type. But should I really have to choose?

*Do we need a python language solution to this?*
Could the code editors fix this? There's a long issue thread for vscode where this is discussed: https://github.com/Microsoft/vscode/issues/1751 - The reasoning (reasonable imho) is that this is not something that can be done generally, but that it needs to be handled at the python vscode extension level. Makes sense.

Could the vscode language extension fix this? Well, the language extension has no way to know what language it should highlight, whether a string is HTML or CSS. PyCharm has decided to use a "special python comment" # language=html that makes the next string be highlighted in that language.

So if just all editors could standardize on that comment, everything would work? I guess so, but is that really the most intuitive API to standardize around? If the next statement is not a string, what happens? If the comment is on the same line as another statement, does it affect that line, or the next? What if there's a newline in between the comment and the string, does that work?

*Suggested solution*
I suggest supporting _tagged strings_ in python. They would look like html'<span class="calendar"></span>'.
* Python should not hold a list of which tagged strings it should support, it should be possible to use any tag.
* To avoid clashes with current raw strings and unicode strings, a tag should be required to be at least 2 characters long (I'm open to other ways to avoid this).

I like this syntax because:
1. It's clear what string the tag is affecting.
2. It makes sense when you read it, even though you've never seen the syntax before.
3. It clearly communicates which language to highlight to code editors, since you can use the language identifiers that already exist: https://code.visualstudio.com/docs/languages/identifiers#_known-language-ide... - for single letter languages, which are not supported to avoid clash with raw strings and unicode strings, the language extension would have to support "r-lang" and "c-lang" instead.
4. It mimics the syntax of tagged string templates in javascript (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_l...). So it has some precedent.

(If desirable, I think mimicking javascript further and making tagged strings call a function with the tag's name would be a great addition to Python too. This would make the syntax for parsing foreign languages much nicer. But this is not required for my specific problem, it's just a nice next possible step for this feature.)

*Backwards compatibility*
This syntax currently raises an invalid syntax error. So introducing this shouldn't break existing programs. Python's currently supported string types are just single letter, so the suggestion is to require tagged strings to be at least two letters.

*Feedback?*
What are your thoughts on this? Do you see a value in adding tagged strings to python? Are there other use-cases where this would be useful? Does the suggestion need to support calling tags as functions like in javascript to be interesting?

(I'm new to python-ideas, so I hope I haven't broken some unspoken rule with this suggestion.)

--
Emil Stenström
I think this has been discussed before and rejected.

You need 3 things to happen:
(1) a syntax change in python that is acceptable
(2) a significant editor to support syntax highlighting for that python change
(3) someone willing to write and support the feature in the python code base

Will you write and support the code?

If the tags are called as functions then you can do it today with this:

    def html(s):
        return s

    HEAD = html('<head>')

Barry
My impression whenever this idea is proposed is like Barry's. The "win" isn't big enough not simply to use named functions.

Balancing out the slight "win" is the much larger loss of adding additional complexity to the Python language. New grammar, new parser, possibly new semantics if tagged strings are more than exclusively decorative.

It's not a *huge* complexity, but it's more than zero, and these keep adding up. Python is SO MUCH less simple than it was when I learned it in 1998. While each individual change might have its independent value, it is now hard to describe Python as a "simple language."

Moreover, there is no reason an editor could not have a capability to "colorize any string passed to a function named foo()." Perhaps with some sort of configuration file that indicates which function names correspond to which languages, but also with presets. The details could be worked out, and maybe even an informal lexicon could be developed in a shared way.

But all we save with more syntax is two characters. And the function style is exactly what JavaScript tagged strings do anyway, just as a shorthand for "call a function". Compare:

    header = html`<h1>Hello</h1>`
    header = html("<h1>Hello</h1>")

If we imagine that your favorite editor does the same colorization inside the wrapped string either way, how are these really different?

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
David Mertz, Ph.D. wrote:
My impression whenever this idea is proposed is like Barry's. The "win" isn't big enough not simply to use named functions.
Named functions solve another problem, so I don't see how this is an alternative? More on this below.
This is an argument against _any_ change to the language. I recognize this sentiment, but I don't agree with stopping all change in the hope of python being simple again. I don't think the general python developer is there either.
This is an interesting idea. Some counter-arguments:
* Anything that's hidden behind a config file won't be used except by very few. So, as you say, you need presets somehow.
* Using presets for something simple like html() would render a lot of existing code differently than before this change. I don't think this is acceptable.
* The idea that "when a function named X is called, the parameter should be highlighted with language X" seems complicated to implement in a code editor.
* Will it apply for all arguments, just the first one, or all strings?

Due to the above I think it makes more sense to tag _the string_, not the calling function.
The point here is not saving characters typed, it's tagging a string so it's easy for an editor to highlight it. For the reasons I listed above the two versions above are not equivalent.
If we imagine that your favorite editor does the same colorization inside the wrapped string either way, how are these really different?
If there was a chance this could happen, it would solve my problem nicely. For the reasons above, I don't think this will be acceptable to editors.
On Sat, Dec 17, 2022, 1:03 PM <emil@emilstenstrom.se> wrote:
I've been using vim long enough that I probably only edit .vimrc (or correspondingly for neovim) every week or two. I use VS Code much less, so when I do, I probably edit settings.json more like once a day (when I'm using it). But many editors, in any case, have friendly custom editors for some elements of their configs. Of course, if presets are fine, users need not change them. Tagged templates do EXACTLY ZERO to make this less of a concern.

If there was a chance this could happen, it would solve my problem nicely. For the reasons above, I don't think this will be acceptable to editors.
I could trivially implement this in a few lines within every modern editor I am aware of. I bet you can do it for your editor with less than 2 hours effort.
Hi Barry, Your reply could easily be read as "this is a bad idea, and you shouldn't have bothered writing it down". I hope that was not your intention, and instead it comes from handling self-indulgent people expecting things from you all day. I know, I get those requests too. I'll assume that was not your intention in my answers below. Barry Scott wrote:
I think this has been discussed before and rejected.
Do you have a link to that discussion, or is this just from memory? What should I search for to find this discussion? Why was it rejected?
I understand all these 3 things are needed. I'm saying that I think this feature is worth it. Do you mean I should do things in a different order? We are in the idea stage, before (1) a strict syntax can be suggested.
Will you write and support the code?
Is committing to write the code a requirement to suggest an idea? Of course this is required down the line, but let's see if this is a good idea first?
If I'm not missing anything, this doesn't help with syntax highlighting? Highlighting is the problem I'm talking about in my post above. Regards, Emil
On Sat, Dec 17, 2022 at 9:43 AM <emil@emilstenstrom.se> wrote:
Barry Scott wrote:
Try googling "python-ideas string prefixes". Doing minimal diligence is a reasonable expectation before writing up an idea.
Not true. A syntax highlighter can certainly recognize html('...') just as it can recognize html'...'. --- Bruce
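To make the claim concrete, here is a minimal sketch of how a tool could locate strings passed to tag-like functions such as html('...'). It uses only the standard-library ast module; the TAG_LANGUAGES mapping and the find_tagged_strings helper are hypothetical names chosen for illustration, and a real editor would more likely do this in its own grammar than by parsing with Python.

```python
import ast

# Hypothetical mapping from function name to language id; an editor would
# presumably ship presets and let users extend them.
TAG_LANGUAGES = {"html": "html", "css": "css", "js": "javascript"}


def find_tagged_strings(source):
    """Return (language, lineno, col_offset, text) for calls like html('...').

    Only a plain-name call whose first positional argument is a string
    literal is reported; everything else is ignored.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        if not (isinstance(node.func, ast.Name) and node.func.id in TAG_LANGUAGES):
            continue
        if not node.args:
            continue
        first = node.args[0]
        if isinstance(first, ast.Constant) and isinstance(first.value, str):
            found.append(
                (TAG_LANGUAGES[node.func.id], first.lineno, first.col_offset, first.value)
            )
    # ast.walk does not guarantee source order, so sort by position.
    return sorted(found, key=lambda item: (item[1], item[2]))


SAMPLE = "HEAD = html('<head>')\nSTYLE = css('.a { color: red }')\nOTHER = other('<p>')\n"

for language, line, column, text in find_tagged_strings(SAMPLE):
    print(f"{language} string at line {line}, column {column}: {text!r}")
```

The sketch only answers "which strings, in which language"; it does not address the objection that a name like html() may already mean something else in existing code.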
Bruce Leban wrote:
Try googling "python-ideas string prefixes". Doing minimal diligence is a reasonable expectation before writing up an idea.
Thanks for the query "string prefixes". I tried other queries but not that one. I ended my first message with "I hope I didn't break any unspoken rules" and it seems I have.
I replied to this in a separate post, but html() is likely a function name that is used in millions of existing code bases. Applying this rule to all of them will lead to too many errors to be acceptable to editors I think. And if this has to be explicitly configured in an editor very few will use it.
For reference: This thread has a much deeper discussion of this idea: https://discuss.python.org/t/allow-for-arbitrary-string-prefix-of-strings/19... I'll continue the discussion there instead.
On Sat, Dec 17, 2022 at 10:10 AM <emil@emilstenstrom.se> wrote:
Understood. This string suffix syntax is supported by Python today and syntax highlighters could be modified to support this without requiring changes to any other component.

    class Calendar(component.Component):
        template_string = '<span class="calendar"></span>' ##html
        css_string = '.calendar { background: pink }' ##css
        js_string = 'document.getElementsByClassName("calendar")[0].onclick = function() { alert("click!") }' ##javascript

--- Bruce
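As a rough illustration of how a tool might read this suffix convention, the sketch below uses the standard-library tokenize module to pair each string literal with a ##lang comment that follows it on the same line. The function name strings_with_language_suffix is hypothetical, and the rule "the comment applies to the last string before it on that line" is an assumption about how the convention would be defined.

```python
import io
import tokenize


def strings_with_language_suffix(source):
    """Return (language, string_literal) pairs for strings followed by ##lang.

    Assumed rule: a comment starting with '##' tags the most recent string
    literal, provided that string ends on the same line as the comment.
    """
    pairs = []
    last_string = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            last_string = tok
        elif tok.type == tokenize.COMMENT:
            if (
                last_string is not None
                and tok.string.startswith("##")
                and tok.start[0] == last_string.end[0]
            ):
                pairs.append((tok.string[2:].strip(), last_string.string))
            last_string = None
        elif tok.type in (tokenize.NEWLINE, tokenize.NL):
            # A line ended without a tagging comment: forget the string.
            last_string = None
    return pairs


SAMPLE = (
    "template_string = '<span class=\"calendar\"></span>'  ##html\n"
    "css_string = '.calendar { background: pink }'  ##css\n"
    "plain = 'no tag here'\n"
)

for language, literal in strings_with_language_suffix(SAMPLE):
    print(language, literal)
```

Because the tag sits after the string on the same line, most of the ambiguities raised about a comment placed before the string do not arise, although a line holding several strings would still need a rule.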
On Sat, Dec 17, 2022, at 19:20, Bruce Leban wrote:
PyCharm supports syntax similar to this. They put a # language=html on the line in front of the string. I think this is messy for the reasons in my original post, but maybe this is the only reasonable way forward. I'll see if I can ask the vscode python language extension team what they think. Nice to see you fixed the syntax error in the js too! :)
My two cents (speaking as long-term observer, not as the moderator, or perhaps in addition to the moderator ;) - I think your ask was appropriate, and I think the response of “here’s the search you should do!” was great. Personally I think we could do without the implication that you should have done more due diligence. python-ideas is PRECISELY for this kind of question. Other forums should have a higher barrier to entry (like python-dev), but not python-ideas. best, —titus
Jim Baker has been working on tagged strings, and Guido has a working implementation. See https://github.com/jimbaker/tagstr/issues/1 I thought Jim had a draft PEP on this somewhere, but I can’t find it. -- Eric
Just to be clear on my opinion. I think Emil's idea was 100% appropriate to share on python-ideas, and he does a good job of showing where it would be useful. Sure, a background search is nice, but not required.

That doesn't mean I *support* the idea. I take a very conservative attitude towards language changes. I hope I've provided an okay explanation of my non-support, but it's NOT a criticism of Emil in any way.

That said, Jim Baker pitched his similar idea to me at the last PyCon, and I remember coming closer to feeling supportive. Maybe partially just because I have known and liked Jim for a long time. But I think he was also suggesting some extra semantics that seemed to move the needle in my mind.
On 18/12/2022 05.07, emil@emilstenstrom.se wrote:
Seems simple enough, right? The problem is: There's no syntax highlighting in my code editor for the three other languages. This makes for a horrible developer experience, where you constantly have to hunt for characters inside of strings. You saw the missing quote in js_string right? :)
Is this a problem with Python, or with the tool? « Language injections Last modified: 14 December 2022 Language injections let you work with pieces of code in other languages embedded in your code. When you inject a language (such as HTML, CSS, XML, RegExp, and so on) into a string literal, you get comprehensive code assistance for editing that literal. ... » https://www.jetbrains.com/help/pycharm/using-language-injections.html Contains a specific example for Django scripters. (sadly as an image - probably wouldn't be handled by this ListServer)
If I instead use separate files, I get syntax highlighting and auto-completion for each file, because editors set language based on file type. But should I really have to choose?
In other situations where files need to be collected together, a data-archive may be used (not to be confused with any historical context, nor indeed with data-compression). Might a wrapper around such of PSL's services help to both keep everything together, and yet enable separate editing format-recognition? « Data Compression and Archiving The modules described in this chapter support data compression with the zlib, gzip, bzip2 and lzma algorithms, and the creation of ZIP- and tar-format archives. ... » https://docs.python.org/3/library/archiving.html Disclaimer: JetBrains sponsors our PUG with monthly prizes, eg PyCharm. -- Regards, =dn
dn wrote:
I touched upon this solution in the original post. If all editors could agree to use # language=html it would be an ok solution. That API creates lots of ambiguity about what the comment should be applied to. Some examples which are non-obvious imho:

    ------------
    "<div>" # language=html
    "<span>
    ------------
    # language=html

    "<div>"
    ------------
    # language=html
    process_html("<html>")
    ------------
    # language=html
    concat_html("<html>", "<span>")
    ------------
The point here is to have everything in one file, editable and syntax highlighted in that same file. I don't think this tip applies to that?
emil@emilstenstrom.se writes:
Seems simple enough, right? The problem is: There's no syntax highlighting in my code editor for the three other languages.
Then you're not using Emacs's mmm-mode, which has been available for a couple of decades. Now, mmm-mode doesn't solve the whole problem -- it doesn't know anything about how the languages are tagged. But this isn't a problem for an Emacs shop: the team decides on a convention (or recognizes a third party's convention), and somebody will code up the 5-line function that font-lock (syntax highlighter in Emacs) uses to dispatch to the appropriate syntax highlighting mode.

AFAICS this requires either all editors become Emacs ;-) or all editor maintainers get together and agree on the tags (this will need to be extensible, there are a lot of languages out there, and some editors will want to distinguish languages by version to flag syntax invalid in older versions). Is this really going to happen? Just for Python? When the traditional solution of separating different languages into different files is almost always acceptable?

There are other uses proposed for tagged strings. In combination, perhaps this feature is worthwhile. But I think that on its own the multiple language highlighting application is pretty dubious given the limited benefit vs. the amount of complexity it will introduce not only in Python, but in editors as well.
This makes for a horrible developer experience, where you constantly have to hunt for characters inside of strings.
If this were a feature anyway, it would be very useful in certain situations (for example dynamic web pages), no question about it. But mixed-language files are not something I want to see in projects I work on -- and remember, I use Emacs, I have mmm-mode already.
This is problematic for your case. This means that the editor needs to change how it dispatches to syntax highlighting. Emacs, no problem, it already dispatches highlighting based on tagged regions of text. But are other editors going to *change* to do that?
But should I really have to choose?
Most of the time, I'd say "yes", and you should choose multiple files. ;-) YMMV of course, but I really appreciate the separation of concerns that is provided by separate files for Python code, HTML templates, and (S)CSS presentation.
Makes sense, yes -- that's how Emacs does it, but Emacs is *already* fundamentally designed on a model of implicitly tagged text. Parsing strings is already relatively hard because the begin marker is the same as the end marker. Now you need to tie it to the syntax highlighting mode, which may change over large regions of text every time you insert or delete a quotation mark or comment delimiter. You *can't* just hand it off to the Python highlighter, *every* syntax highlighter that might be used inside a Python string at least needs to know how to hand control back to Python. For one thing, they all need to learn about all four of Python's string delimiters. And it gets worse. I wonder how you end up with CSS and HTML inside Python strings? Yup, the CSS is inside a <style> element inside the HTML inside the Python string which may end in any of four different ways. It's not good enough to add this to the Python highlighter.... Even if Python gets tagged strings, my bet is that the odds are quite bad that any given editor ever supports this application of them. I wouldn't wish this on the devs of any editor except Emacs, which has had it since the late 1990s. Isn't it easier for you to just use Emacs? ;-) Steve
Well, obviously I have to come to the defense of vim as well :-). I'm not sure what year vim got the capability, but I suspect around as long as emacs. This isn't for exactly the same language use case, but finding a quick example on the internet:

    unlet b:current_syntax
    syntax include @srcBash syntax/bash.vim
    syntax region srcBashHi start="..." end="..." keepend contains=@srcBash

    unlet b:current_syntax
    syntax include @srcHTML syntax/html.vim
    syntax region srcHTMLHi start="^...$" end="^...$" keepend contains=@srcHTML

This is easy to adapt to either the named function convention: `html('<h1>Hello</h1>')` or to the standardized-comment convention.

In general, I find any proposal to change Python "because then my text editor would need to change to accommodate the language" to be unconvincing.
On Sun, Dec 18, 2022 at 9:48 AM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
Personally, I’m skeptical of any proposal to change Python to make it easier for IDEs. But there *may* be other good reasons to do something like this.

I’m not a static typing guy, but it seems to me that it could be useful to subtype strings: This function expects an SQL string. This function returns an SQL string.

Maybe not worth the overhead, but worth more than giving IDEs hints as to what to do.

-CHB

--
Christopher Barker, PhD (Chris)
Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
Using a typing approach sounds like a fantastic idea. Moreover, as Stephen showed, it's easy to make Emacs utilize that, and as I showed, it's easy to make vim follow that. I've only written one tiny VS Code extension, but it wouldn't be hard there either. I'm not sure how one adds stuff to PyCharm and other editors, but I have to believe it's possible.

So I see two obvious approaches, both of which 100% fulfill Emil's hope without new syntax:

#1

    from typing import NewType

    html = NewType("html", str)
    css = NewType("css", str)

    a: html = html("<h1>Hello world</h1>")
    b: css = css("h1 { color: #999999; }")

    def combine(h: html, c: css):
        print(f"Combined page elements: {h} | {c}")

    combine(a, b)  # <- good
    combine(b, a)  # <- bad

However, if you want to allow these types to possibly *do* something with the strings inside (validate them, canonicalize them, do a security check, etc), I think I like the other way:

#2

    class html(str): pass
    class css(str): pass

    a: html = html("<h1>Hello world</h1>")
    b: css = css("h1 { color: #999999; }")

    def combine(h: html, c: css):
        print(f"Combined page elements: {h} | {c}")

    combine(a, b)
    combine(b, a)

The type annotations in the assignment lines are optional, but if you're doing something other than just creating an instance of the (pseudo-)type, they might add something. They might also be what your text editor decides to use as its marker.

For either version, type analysis will find a problem. If I hadn't matched the types in the assignment, it would detect extra problems:

    (py3.11) 1310-scratch % mypy tagged_types1.py
    tagged_types1.py:13: error: Argument 1 to "combine" has incompatible type "css"; expected "html"  [arg-type]
    tagged_types1.py:13: error: Argument 2 to "combine" has incompatible type "html"; expected "css"  [arg-type]
    Found 2 errors in 1 file (checked 1 source file)

typing.Annotated can also be used, but it solves a slightly different problem.
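For comparison, here is a minimal sketch of the typing.Annotated variant mentioned at the end of the message. The alias names Html and Css and the helper declared_languages are hypothetical. Since a type checker treats Annotated[str, "html"] as plain str, this form records the language as metadata that a tool can read, but it would not flag swapped arguments the way NewType or a subclass does, which may be the "slightly different problem" referred to.

```python
from typing import Annotated, get_args, get_origin, get_type_hints

# Hypothetical aliases: plain str to a type checker, with a language label
# attached as metadata.
Html = Annotated[str, "html"]
Css = Annotated[str, "css"]


def combine(h: Html, c: Css) -> str:
    return f"{h} | {c}"


def declared_languages(func):
    """Map parameter names to the language label stored in Annotated metadata."""
    hints = get_type_hints(func, include_extras=True)
    languages = {}
    for name, hint in hints.items():
        if get_origin(hint) is Annotated:
            base, *metadata = get_args(hint)
            if base is str and metadata:
                languages[name] = metadata[0]
    return languages


print(declared_languages(combine))
# Swapped arguments run fine and would also pass a type checker,
# because both parameters are just str.
print(combine("h1 { color: #999999; }", "<h1>Hello world</h1>"))
```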
On Sun, Dec 18, 2022 at 07:38:06PM -0500, David Mertz, Ph.D. wrote:
The problem with this is that the builtins are positively hostile to subclassing. The issue is demonstrated with this toy example:

    class mystr(str):
        def method(self):
            return 1234

    s = mystr("hello")
    print(s.method())  # This is fine.
    print(s.upper().method())  # This is not.

To be useable, we have to override every string method that returns a string. Including dunders. So your class becomes full of tedious boiler plate:

        def upper(self):
            return type(self)(super().upper())
        def lower(self):
            return type(self)(super().lower())
        def casefold(self):
            return type(self)(super().casefold())
        # Plus another 29 or so methods

This is not just tedious and error-prone, but it is inefficient: calling super returns a regular string, which then has to be copied as a subclassed string and the original garbage collected.

--
Steve
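One way to reduce the hand-written boilerplate, sketched under stated assumptions: the names TaggedStr, _REWRAPPED and _rewrap are hypothetical, and the method list is deliberately partial. Each listed str method is wrapped once on a base class so that a plain str result is converted back to the caller's subclass. This only removes the typing effort; the extra copy described above still happens on every call.

```python
import functools


class TaggedStr(str):
    """Hypothetical base class whose listed methods preserve the subclass."""


# Partial list for illustration only; a complete version would need every
# str-returning method, and methods returning lists or tuples of strings
# (split, partition, ...) would need separate handling.
_REWRAPPED = ["upper", "lower", "casefold", "strip", "replace", "__add__"]


def _rewrap(name):
    original = getattr(str, name)

    @functools.wraps(original)
    def method(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        # Re-wrap only exact str results; NotImplemented and other
        # values pass through unchanged.
        return type(self)(result) if type(result) is str else result

    return method


for _name in _REWRAPPED:
    setattr(TaggedStr, _name, _rewrap(_name))


class mystr(TaggedStr):
    def method(self):
        return 1234


s = mystr("hello")
print(s.upper().method())
print(type(s + " world").__name__)
```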
On Mon, 19 Dec 2022 at 12:29, Steven D'Aprano <steve@pearwood.info> wrote:
"Hostile"? I dispute that. Are you saying that every method on a string has to return something of the same type as self, rather than a vanilla string? Because that would be far MORE hostile to other types of string subclass:
Demo.x is a string. Which means that, unless there's good reason to do otherwise, it should behave as a string. So it should be possible to use it as if it were the string "eggs", including appending it to something, appending something to it, uppercasing it, etc, etc, etc. So what should happen if you do these kinds of manipulations? Should attempting to use a string in a normal string context raise ValueError?
I would say that *that* would count as "positively hostile to subclassing". ChrisA
On Sun, Dec 18, 2022 at 8:29 PM Steven D'Aprano <steve@pearwood.info> wrote:
I'd agree to "limited", but not "hostile." Look at the suggestions I mentioned: validate, canonicalize, security check. All of those are perfectly fine in `.__new__()`. E.g.:

    In [1]: class html(str):
       ...:     def __new__(cls, s):
       ...:         if not "<" in s:
       ...:             raise ValueError("That doesn't look like HTML")
       ...:         return str.__new__(cls, s)

    In [2]: html("<h1>Hello</h1>")

    In [3]: html("Hello")
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-3-71d16160c9ad> in <module>
    ----> 1 html("Hello")

    <ipython-input-1-e9d5da1202f3> in __new__(cls, s)
          2     def __new__(cls, s):
          3         if not "<" in s:
    ----> 4             raise ValueError("That doesn't look like HTML")
          5

    ValueError: That doesn't look like HTML

I readily acknowledge that's not a very thorough validator :-).

But this much (say with a better validator) gets you static type checking, syntax highlighting, and inherent documentation of intent.

I know that lots of things one can do with a str subclass wind up producing a str instead. But if the thing you do is just "make sure it is created as the right kind of thing for static checking and editor assistance", I don't care about any of that falling back.
On Sun, Dec 18, 2022 at 10:23:18PM -0500, David Mertz, Ph.D. wrote:
No, they aren't perfectly fine, because as soon as you apply any operation to your string subclass, you get back a plain vanilla string which bypasses your custom `__new__` and so does not perform the validation or security check.
But this much (say with a better validator) gets you static type checking, syntax highlighting, and inherent documentation of intent.
Any half-way decent static type-checker will immediately fail as soon as you call a method on this html string, because it will know that the method returns a vanilla string, not a html string. And that's exactly what mypy does:

    [steve ~]$ cat static_check_test.py
    class html(str):
        pass

    def func(s:html) -> None:
        pass

    func(html('').lower())

    [steve ~]$ mypy static_check_test.py
    static_check_test.py:7: error: Argument 1 to "func" has incompatible type "str"; expected "html"
    Found 1 error in 1 file (checked 1 source file)

Same with auto-completion. Either auto-complete will correctly show you that what you thought was a html object isn't, and fail to show any additional methods you added; or worse, it will wrongly think it is a html object when it isn't, and allow you to autocomplete methods that don't exist.
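The runtime half of this objection (derived strings bypass the custom `__new__`) can be seen with a few lines. This is a small sketch that reuses the toy `html` validator from earlier in the thread; the names `page`, `derived` and `rejected` are only illustrative:

```python
class html(str):
    def __new__(cls, s):
        # Same toy check as in the thread; not a real HTML validator.
        if "<" not in s:
            raise ValueError("That doesn't look like HTML")
        return super().__new__(cls, s)


page = html("<h1>Hello</h1>")

# str.replace returns a vanilla str, so html.__new__ never sees the result.
derived = page.replace("<", "[").replace(">", "]")

# Passing the derived text back through the constructor does validate it.
try:
    html(derived)
    rejected = False
except ValueError:
    rejected = True
```

Whether this matters depends on the use case: it is a problem if the subclass is meant to be a guarantee, and harmless if it is only a marker at the point of creation.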
On Mon, 19 Dec 2022 at 22:37, Steven D'Aprano <steve@pearwood.info> wrote:
But what does it even mean to uppercase an HTML string? Unless you define that operation specifically, the most logical meaning is "convert it into a plain string, and uppercase that". Or, similarly, slicing an HTML string. You could give that a completely different meaning (maybe defining its children to be tags, and slicing is taking a selection of those), but if you don't, slicing isn't really a meaningful operation. So it should be correct: you cannot simply uppercase an HTML string and expect sane HTML. I might be more sympathetic if you were talking about "tainted" strings (ie those which contain data from an end user), on the basis that most operations on those should yield tainted strings, but given that systems of taint tracking seem to have managed just fine with the existing way of doing things, still not particularly persuasive. ChrisA
On 19Dec2022 22:45, Chris Angelico <rosuav@gmail.com> wrote:
Yes, this was my thought. I've got a few subclasses of builtin types. They are not painless. For HTML "uppercase" is a kind of ok notion because the tags are case insensitive. Not the case with, say, XML - my personal nagging example is from KML (Google map markup dialect) where IIRC a "ScreenOverlay" and a "screenoverlay" both exist with different semantics. Ugh. So indeed, I'd probably _want_ .upper to return a plain string and have special methods to do more targeted things as appropriate. Cheers, Cameron Simpson <cs@cskk.id.au>
On Wed, 21 Dec 2022 at 09:30, Cameron Simpson <cs@cskk.id.au> wrote:
Tag names are, but their attributes might not be, so even that might not be safe.
Ugh indeed. Why? Why? Why?
So indeed, I'd probably _want_ .upper to return a plain string and have special methods to do more targeted things as appropriate.
Agreed. ChrisA
As has been said, a builtin *could* be written that would be "friendly to subclassing", by the definition in this thread. (I'll stay out of the argument for the moment as to whether that would be better.)

I suspect that the reason str acts like it does is that it was originally written a LONG time ago, when you couldn't subclass basic built-in types at all. Secondarily, it could be a performance tweak -- minimal memory and peak performance are pretty critical for strings.

But collections.UserString does exist -- so if you want to subclass, and performance isn't critical, then use that. Steven D'Aprano pointed out that UserStrings are not instances of str though. I think THAT is a bug. And it's probably that way because with the magic of duck typing, no one cared -- but with all the static type hinting going on now, that is a bigger liability than it used to be. Also because when it was written, you couldn't subclass str.

Though I will note that run-time type checking of strings is relatively common compared to other types, due to the whole a-str-is-a-sequence-of-str issue: distinguishing between a sequence of strings and a string itself is sometimes needed. And str is rarely duck typed.

If anyone actually has a real need for this I'd post an issue -- it'd be interesting to see if the core devs see this as a bug or a feature (well, probably not feature, but maybe missing feature).

OK -- I got distracted and tried it out -- it was pretty easy to update UserString to be a subclass of str. I suspect it isn't done that way now because it was originally written because you could not subclass str -- so it stored an internal str instead.

The really hacky part of my prototype is this:

    # self.data is the original attribute for storing the string internally.
    # Partly to prevent my having to re-write all the other methods, and
    # partly because you get recursion if you try to use the methods on
    # self when overriding them ...
    @property
    def data(self):
        return "".join(self)

The "".join is because it was the only way I quickly thought of to make a native string without invoking the __str__ method and other initialization machinery. I wonder if there is another way? Certainly there is in C, but in pure Python?

Anyway, after I did that and wrote a __new__ -- the rest of it "just worked".

    def __new__(cls, s):
        return super().__new__(cls, s)

UserString and its subclasses return instances of themselves, and instances are instances of str. Code with a couple asserts in the __main__ block enclosed. Enjoy!

-CHB

NOTE: VERY minimally tested :-)

On Tue, Dec 20, 2022 at 4:17 PM Chris Angelico <rosuav@gmail.com> wrote:
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
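The enclosed prototype is not reproduced in the archive. As a rough illustration of the idea only, here is a minimal sketch assuming a hypothetical class named `TypedStr` with a single overridden method; it is not the actual prototype and covers far less than UserString does:

```python
class TypedStr(str):
    """Hypothetical str subclass whose overridden methods keep its own type."""

    def __new__(cls, s=""):
        return super().__new__(cls, s)

    @property
    def data(self):
        # Build a plain str from the characters, without going through
        # str(self) / __str__, as described in the message above.
        return "".join(self)

    def upper(self):
        # Delegate to the plain string, then re-wrap in the subclass.
        return type(self)(self.data.upper())


class Html(TypedStr):
    pass


t = Html("<b>hi</b>")
shouted = t.upper()
```

Here `shouted` is an `Html` instance and also an instance of `str`, while `t.data` is a plain `str`. Every other string-returning method would need the same treatment as `upper`.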
On Tue, Dec 20, 2022 at 5:38 PM Christopher Barker <pythonchb@gmail.com> wrote:
Note that UserString does break some built-in functionality, like you can't apply regular expressions to a UserString:
There is more discussion in this thread ( https://stackoverflow.com/questions/59756050/python3-when-userstring-does-no...), including a link to a very old bug (https://bugs.python.org/issue232493). There is a related issue with json.dump etc, though it can be worked around since there is a python-only json implementation. I have run into this in practice at a previous job, with a runtime "taint" tracker for logging access to certain database fields in a Django application. Many views would select all fields from a table, then not actually use the fields I needed to log access to, which generated false positives. (Obviously the "correct" design is to only select data that is relevant for the given code, but I was instrumenting a legacy codebase with updated compliance requirements.) So I think there is some legitimate use for this, though object proxies can be made to work around most of the issues. - Lucas
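The regular-expression limitation mentioned at the top of this message is easy to reproduce with the standard library's `collections.UserString`. A small sketch (the variable names are illustrative only):

```python
import re
from collections import UserString

wrapped = UserString("hello world")

# The re module needs a real str (or bytes-like object), not a wrapper.
try:
    re.match(r"hello", wrapped)
    regex_accepts_userstring = True
except TypeError:
    regex_accepts_userstring = False

# Unwrapping first works: .data is the underlying plain str.
match = re.match(r"hello", wrapped.data)
```

Calling `str(wrapped)` instead of `wrapped.data` would also work, at the cost of an extra call.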
On Tue, Dec 20, 2022 at 6:20 PM Lucas Wiman <lucas.wiman@gmail.com> wrote:
I wonder how many of these issues would go away if UserString subclassed str. Maybe some? But at the C level, duck typing simply doesn't work -- you need access to an actual C string struct. Code that worked with strings *could* have a little bit of wrapper for subclasses that would dig into it to find the actual str underneath -- but if that code had to be written everywhere strings are used in C -- that could be a pretty big project -- probably what Guido meant by:

"Fixing this will be a major project, probably for Python 3000k"

I don't suppose it has been addressed at all?

Note: at least for string paths, the builtins all use fspath() (or something) so that should be easy to make work. (and seems to with my prototype already)

> There is a related issue with json.dump etc,

json.dump works with my prototype as well.

-CHB
On Tue, Dec 20, 2022 at 8:21 PM Stephen J. Turnbull <stephenjturnbull
UserStrings are not instances of str though. I think THAT is a bug.
I guess, although surely the authors of that class thought about it.
Well, kind of -- the entire reason for UserString was that at the time, str itself could not be subclassed. So it was certainly a feature at the time ;-) The question is whether anyone thought about it again later, and the docs seem to indicate not:

    UserString objects (https://docs.python.org/3/library/collections.html?highlight=userstring#coll...)

    The class, UserString, acts as a wrapper around string objects. The need for this class has been partially supplanted by the ability to subclass directly from str (https://docs.python.org/3/library/stdtypes.html#str); however, this class can be easier to work with because the underlying string is accessible as an attribute.

And it has no docstrings at all -- it doesn't strike me that anyone is putting any thought into carefully maintaining it.

> Anyway, this could probably be improved with a StringLike ABC

I'm not so sure -- in many cases, the underlying C implementation is critical -- and strings are one of those things that generally aren't duck-typed -- subclassing is a special case of that.

Anyway -- I've only gotten this far 'cause it caught my interest -- but I have no need for subclassing strings -- but if someone does, I think it would be worth at least bringing up with the core devs.

-CHB
That's interesting, for me both 3.9 and 3.10 show the f-string more than 5x faster. This is just timeit on f'{myvar}' vs ''.join((myvar,)) so it may not be the most nuanced comparison for a class property. Probably unsurprisingly having myvar be precomputed as the single tuple also gives speedups, around 45% for me. So if just speed is wanted maybe inject the tuple pre-constructed. ~ Jeremiah On Wed, Dec 21, 2022 at 1:19 AM Steven D'Aprano <steve@pearwood.info> wrote:
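For anyone wanting to repeat the comparison, a minimal `timeit` sketch might look like the following. The variable names and the iteration count are arbitrary choices, and the results will vary with Python version and machine, so no timings are claimed here:

```python
import timeit

myvar = "some string"
pre = (myvar,)  # tuple built once, outside the timed statement
N = 100_000

# f-string formatting of a single variable.
fstring_time = timeit.timeit("f'{myvar}'", globals={"myvar": myvar}, number=N)

# join over a tuple that is rebuilt on every iteration.
join_time = timeit.timeit("''.join((myvar,))", globals={"myvar": myvar}, number=N)

# join over the precomputed tuple.
prebuilt_time = timeit.timeit("''.join(pre)", globals={"pre": pre}, number=N)

print(f"f-string: {fstring_time:.4f}s  join: {join_time:.4f}s  prebuilt: {prebuilt_time:.4f}s")
```

Note that this times a plain `str`; for a str subclass with an overridden `__str__` the relative costs may differ.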
On Wed, Dec 21, 2022 at 8:34 AM Jeremiah Paige <ucodery@gmail.com> wrote:
That may be the optimization that 3.11 is doing for you :-) Now that I think about it, if this is immutable, which it should be, as it's a str subclass, then perhaps the data string can be pre-computed, as it was in the original. I liked the property, as philosophically, you don't want to store the same data twice, but with an immutable, there should be no danger of it getting out of sync, and it would be faster. (though memory intensive for large strings). -CHB
On Wed, Dec 21, 2022 at 1:18 AM Steven D'Aprano <steve@pearwood.info> wrote:
I think both of those will call self.__str__, which creates a recursion -- that's what I'm trying to avoid. I'm sure there are ways to optimize this -- but only worth doing if it's worth doing at all :-) - CHB That's about 14% faster than the f-string version.
On Thu, 22 Dec 2022 at 03:41, Christopher Barker <pythonchb@gmail.com> wrote:
Second one doesn't seem to.
Interestingly, neither does the f-string, *if* you include a format code with lots of room. I guess str.__format__ doesn't always call __str__().
Curiouser and curiouser. Especially since the returned strings aren't enclosed in quotes. Let's try something.
Huh. How about that. ChrisA
On Wed, Dec 21, 2022 at 8:54 AM Chris Angelico <rosuav@gmail.com> wrote:
hmm -- interesting trick -- I had jumped to that conclusion -- I wonder what it IS using under the hood?

> Interestingly, neither does the f-string, *if* you include a format code with lots of room. I guess str.__format__ doesn't always call __str__().

Now that you mention that, UserString should perhaps have a __format__. More evidence that it's not really being maintained. Though maybe not -- perhaps the inherited one will be fine. Now that I think about it, perhaps the inherited __str__ would be fine as well.

-CHB
On Thu, 22 Dec 2022 at 04:14, Christopher Barker <pythonchb@gmail.com> wrote:
From the look of things, PyUnicode_Join (the internal function that handles str.join()) uses a lot of "reaching into the data structure" operations for efficiency. It uses PyUnicode_Check (aka "isinstance(x, str)") rather than PyUnicode_CheckExact (aka "type(x) is str") and then proceeds to cast the pointer and directly inspect its members. As such, I don't think UserString can ever truly be a str, and it'll never work with str.join(). The best you'd ever get would be explicitly mapping str over everything first:
And we don't want that to be the default, since we're not writing JavaScript code here. ChrisA
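The code that followed "mapping str over everything first" did not survive in the archive. As a sketch of what that workaround looks like with today's standard-library `collections.UserString` (not the str-subclassing prototype discussed above):

```python
from collections import UserString

parts = [UserString("spam"), "eggs"]

# str.join insists on real str instances.
try:
    "".join(parts)
    join_accepts_userstring = True
except TypeError:
    join_accepts_userstring = False

# Explicitly converting every item first does work.
joined = "".join(map(str, parts))
```

The explicit `map(str, ...)` is the caller's choice here; the point above is that `join` should not do this conversion silently.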
On Wed, Dec 21, 2022 at 9:35 AM Chris Angelico <rosuav@gmail.com> wrote:
I had figured subclasses of str wouldn't be full players in the C code -- but join() is pretty fundamental :-(

-CHB
I am not enthusiastic about this idea at all: as I perceive it, it is an IDE problem, external to the language, and should be resolved there, maybe with a recommendation PEP.

But on the other hand, I have seen tens of e-mails discussing string subclassing so that annotations could suffice as a hint for inner-string highlighting, and subclassing is not really needed at all. Maybe we can allow string tagging in annotations by using `str['html']`, `str['css']` and so on. The typing module could even provide no-op names such as "html", "css", etc. to mean those without any other signs, so stuff could be annotated like `template: html = "xxxx"`. The same typing machinery that makes things like `TypedDict`, `Required`, etc. work would present these as plain "str" to the runtime, while allowing any tooling to perceive it as a specialized class.

In other words, one could then either write:

    mytemplate: str['html'] = "<html> ....</html>"

Or

    from typing import html
    mytemplate: html = ...

The former way could be used for arbitrary tagging as proposed by the O.P., and it would be trivial to add a "register" function to declaratively create new tags at static-analysis time.

This syntax has the benefit that static type checkers can take full advantage of the string subtypes, correctly pointing out when a "CSS" string is passed as an argument that should contain "HTML", with no drawbacks, no syntax changes, and no backwards compatibility breaks.

On Thu, Dec 22, 2022 at 1:42 AM Christopher Barker <pythonchb@gmail.com> wrote:
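Neither `str['html']` nor `from typing import html` exists today. The closest thing available in current Python is `typing.Annotated`, mentioned earlier in the thread. The sketch below is an approximation of the proposal, not the proposal itself; the alias names `Html` and `Css` and the choice of a bare string as metadata are assumptions made for illustration:

```python
from typing import Annotated, get_type_hints

# Hypothetical tag aliases; the metadata string is what a tool would read.
Html = Annotated[str, "html"]
Css = Annotated[str, "css"]


class Calendar:
    template: Html = '<span class="calendar"></span>'
    css: Css = ".calendar { background: pink }"


# At runtime the values are plain str objects.
runtime_type = type(Calendar.template)

# Tooling can recover the tag from the annotation metadata.
hints = get_type_hints(Calendar, include_extras=True)
tags = {name: hint.__metadata__[0] for name, hint in hints.items()}
```

One difference from the proposal: static type checkers treat `Annotated[str, "html"]` as plain `str`, so this form helps editors pick a language but will not flag a CSS string passed where HTML is expected.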
On Mon, Dec 19, 2022 at 01:02:02AM -0600, Shantanu Jain wrote:
collections.UserString can take away a lot of this boilerplate pain from user defined str subclasses.
At what performance cost? Also:
which pretty much makes UserString useless for any code that does static checking or runtime isinstance checks. In any case, I was making a larger point that this same issue applies to other builtins like float, int and more. -- Steve
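The snippet after "Also:" is missing from the archive. The behaviour being referred to can be checked with the standard library directly; the following is a small sketch with illustrative names:

```python
from collections import UserString


class mystr(str):
    pass


wrapped = UserString("hello")

# UserString wraps a str; it does not inherit from it.
userstring_is_str = isinstance(wrapped, str)

# A direct subclass, by contrast, passes the check.
subclass_is_str = isinstance(mystr("hello"), str)

# What UserString does give you: methods return the wrapper type.
upper_type = type(wrapped.upper())
```

So the two approaches trade off in opposite directions: UserString keeps its type across method calls but fails `isinstance(..., str)`, while a direct subclass passes the check but loses its type on most operations.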
On 17/12/2022 16:07, emil@emilstenstrom.se wrote:
Python's currently supported string types are just single letter, so the suggestion is to require tagged strings to be at least two letters.
Er, no:

    Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:20:19) [MSC v.1925 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> rf'{2+2}'
    '4'
Best wishes Rob Cliffe
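To make the point concrete: two-letter prefixes such as `rf` and `rb` are already valid, while an arbitrary name in front of a string literal is a syntax error. A short sketch (the `compile` call is just a way to test the syntax without crashing the script):

```python
# Existing two-letter prefixes: raw f-string and raw bytes.
raw_formatted = rf'{2+2}\n'   # the backslash stays literal because of "r"
raw_bytes = rb'\n'

# An arbitrary tag in front of a string literal does not parse.
try:
    compile("html'<b>x</b>'", "<example>", "eval")
    tagged_string_parses = True
except SyntaxError:
    tagged_string_parses = False
```

So a "two or more letters" rule alone would not be enough to avoid clashes; the existing combined prefixes would have to be excluded by name.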
I think this has been discussed before and rejected.

You need 3 things to happen:
(1) a syntax change in python that is acceptable
(2) a significant editor to support syntax highlighting for that python change.
(3) someone willing to write and support the feature in the python code base

Will you write and support the code?

If the tags are called as functions then you can do it today with this:

    def html(s):
        return s

    HEAD = html('<head>')

Barry
My impression whenever this idea is proposed is like Barry's. The "win" isn't big enough not simply to use named functions.

Balancing out the slight "win" is the much larger loss of adding additional complexity to the Python language. New grammar, new parser, possibly new semantics if tagged strings are more than exclusively decorative. It's not a *huge* complexity, but it's more than zero, and these keep adding up. Python is SO MUCH less simple than it was when I learned it in 1998. While each individual change might have its independent value, it is now hard to describe Python as a "simple language."

Moreover, there is no reason an editor could not have a capability to "colorize any string passed to a function named foo()." Perhaps with some sort of configuration file that indicates which function names correspond to which languages, but also with presets. The details could be worked out, and maybe even an informal lexicon could be developed in a shared way.

But all we save with more syntax is two characters. And the function style is exactly what JavaScript tagged strings do anyway, just as a shorthand for "call a function". Compare:

    header = html`<h1>Hello</h1>`
    header = html("<h1>Hello</h1>")

If we imagine that your favorite editor does the same colorization inside the wrapped string either way, how are these really different?

On Sat, Dec 17, 2022 at 12:01 PM Barry Scott <barry@barrys-emacs.org> wrote:
David Mertz, Ph.D. wrote:
My impression whenever this idea is proposed is like Barry's. The "win" isn't big enough not simply to use named functions.
Named functions solve another problem, so I don't see how this is an alternative? More on this below.
This is an argument against _any_ change to the language. I recognize this sentiment, but stopping all change in the hopes of python being simple again I don't agree with. I don't think the general python developer is there either.
This is an interesting idea. Some counter-arguments:

* Anything that's hidden behind a config file won't be used except by very few. So, as you say, you need presets somehow.
* Using presets for something simple like html() would render a lot of existing code differently than before this change. I don't think this is acceptable.
* The idea that "when a function named X is called, the parameter should be highlighted with language X" seems complicated to implement in a code editor.
* Will it apply for all arguments, just the first one, or all strings?

Due to the above I think it makes more sense to tag _the string_, not the calling function.
The point here is not saving characters typed, it's tagging a string so it's easy for an editor to highlight it. For the reasons I listed above the two versions above are not equivalent.
If we imagine that your favorite editor does the same colorization inside the wrapped string either way, how are these really different?
If there was a chance this could happen, it would solve my problem nicely. For the reasons above, I don't think this will be acceptable to editors.
On Sat, Dec 17, 2022, 1:03 PM <emil@emilstenstrom.se> wrote:
I've been using vim long enough that I probably only edit .vimrc (or correspondingly for neovim) every week or two. I use VS Code much less, so when I do, I probably edit settings.json more like once a day (when I'm using it). But many editors, in any case, have friendly custom editors for some elements of their configs. Of course, if presets are fine, indeed users need not change them.

Tagged templates do EXACTLY ZERO to make this less of a concern.

> If there was a chance this could happen, it would solve my problem nicely. For the reasons above, I don't think this will be acceptable to editors.
I could trivially implement this in a few lines within every modern editor I am aware of. I bet you can do it for your editor with less than 2 hours effort.
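As an illustration of how little is needed to locate such calls, here is a sketch using Python's own `ast` module. Real editors typically use regex-based grammars rather than a full parse, so this is an analogy, not an editor plugin; the `LANGUAGE_TAGS` preset and the function name `find_tagged_literals` are hypothetical:

```python
import ast

# Hypothetical preset: function names that mark the language of their argument.
LANGUAGE_TAGS = {"html", "css", "js"}


def find_tagged_literals(source):
    """Return (tag, lineno, col_offset, text) for calls like html("...")."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in LANGUAGE_TAGS
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            literal = node.args[0]  # only the first positional argument
            found.append((node.func.id, literal.lineno, literal.col_offset, literal.value))
    return found


sample = (
    'HEAD = html("<head>")\n'
    'STYLE = css(".calendar { background: pink }")\n'
    'n = len("abc")\n'
)
results = find_tagged_literals(sample)
```

The sketch answers one of the open questions by fiat (only the first positional argument is treated as tagged); an editor would have to pick its own rule.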
Hi Barry, Your reply could easily be read as "this is a bad idea, and you shouldn't have bothered writing it down". I hope that was not your intention, and instead it comes from handling self-indulgent people expecting things from you all day. I know, I get those requests too. I'll assume that was not your intention in my answers below. Barry Scott wrote:
I think this has been discussed before and rejected.
Do you have a link to that discussion, or is this just from memory? What should I search for to find this discussion? Why was it rejected?
I understand all these 3 things are needed. I'm saying that I think this feature is worth it. Do you mean I should do things in a separate order? We are in the idea stage, before a (1) strict syntax can be suggested.
Will you write and support the code?
Is committing to write the code a requirement to suggest an idea? Of course this is required down the line, but let's see if this is a good idea first?
If I'm not missing anything, this doesn't help with syntax highlighting? Highlighting is the problem I'm talking about in my post above. Regards, Emil
On Sat, Dec 17, 2022 at 9:43 AM <emil@emilstenstrom.se> wrote:
Barry Scott wrote:
Try googling "python-ideas string prefixes". Doing minimal diligence is a reasonable expectation before writing up an idea.
Not true. A syntax highlighter can certainly recognize html('...') just as it can recognize html'...'. --- Bruce
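To illustrate Bruce's point, here is a minimal sketch of how a tool could locate html('...')-style calls using Python's standard ast module. The function name find_tagged_calls and the TAGS set are hypothetical, chosen only for this example; no existing editor is known to work this way. It also shows Emil's concern: any unrelated function that happens to be named html would match too.

```python
import ast

# Hypothetical tag names; a real tool would make this configurable.
TAGS = {"html", "css", "js"}

def find_tagged_calls(source):
    """Return (tag, text, line, column) for calls like html('...') in source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func, args = node.func, node.args
        # Only match a bare name called with exactly one string literal.
        if (isinstance(func, ast.Name) and func.id in TAGS
                and len(args) == 1 and not node.keywords
                and isinstance(args[0], ast.Constant)
                and isinstance(args[0].value, str)):
            literal = args[0]
            found.append((func.id, literal.value, literal.lineno, literal.col_offset))
    # ast.walk does not promise source order, so sort by position.
    return sorted(found, key=lambda item: item[2:])

sample = (
    "template = html('<span class=\"calendar\"></span>')\n"
    "style = css('.calendar { background: pink }')\n"
    "other = html(some_variable)\n"  # not a literal, so it is skipped
)
regions = find_tagged_calls(sample)
```

Each entry in regions gives the tag plus the position of the string literal, which is the information a highlighter would need to hand that span to another grammar.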
Bruce Leban wrote:
Try googling "python-ideas string prefixes". Doing minimal diligence is a reasonable expectation before writing up an idea.
Thanks for the query "string prefixes". I tried other queries but not that one. I ended my first message with "I hope I didn't break any unspoken rules" and it seems I have.
I replied to this in a separate post, but html() is likely a function name that is used in millions of existing code bases. Applying this rule to all of them will lead to too many errors to be acceptable to editors I think. And if this has to be explicitly configured in an editor very few will use it.
For reference: This thread has a much deeper discussion of this idea: https://discuss.python.org/t/allow-for-arbitrary-string-prefix-of-strings/19... I'll continue the discussion there instead.
On Sat, Dec 17, 2022 at 10:10 AM <emil@emilstenstrom.se> wrote:
Understood. This string suffix syntax is supported by Python today and syntax highlighters could be modified to support this without requiring changes to any other component.

    class Calendar(component.Component):
        template_string = '<span class="calendar"></span>' ##html
        css_string = '.calendar { background: pink }' ##css
        js_string = 'document.getElementsByClassName("calendar")[0].onclick = function() { alert("click!") }' ##javascript

--- Bruce
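As a rough sketch of how a tool might pick up the ##tag suffix convention, the standard tokenize module can pair each string literal with a comment that follows it on the same line. The helper name find_suffix_tagged_strings is made up for this illustration, and the "##" marker is simply the convention from Bruce's example, not an established standard.

```python
import io
import tokenize

def find_suffix_tagged_strings(source):
    """Yield (tag, literal_source) for string literals followed by '##tag' on the same line."""
    previous = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if (tok.type == tokenize.COMMENT
                and tok.string.startswith("##")
                and previous is not None
                and previous.type == tokenize.STRING
                and previous.end[0] == tok.start[0]):  # same line as the string
            yield tok.string[2:].strip(), previous.string
        previous = tok

sample = (
    "template_string = '<span class=\"calendar\"></span>' ##html\n"
    "css_string = '.calendar { background: pink }' ##css\n"
    "plain = 'no tag here'  # ordinary comment\n"
)
tagged = list(find_suffix_tagged_strings(sample))
```

The third line is ignored because its comment does not start with the "##" marker. Multi-line strings and implicit string concatenation would need more care than this sketch takes.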
On Sat, Dec 17, 2022, at 19:20, Bruce Leban wrote:
PyCharm supports syntax similar to this. They put a # language=html on the line in front of the string. I think this is messy for the reasons in my original post, but maybe this is the only reasonable way forward. I'll see if I can ask the vscode python language extension team what they think. Nice to see you fixed the syntax error in the js too! :)
My two cents (speaking as long-term observer, not as the moderator, or perhaps in addition to the moderator ;) - I think your ask was appropriate, and I think the response of “here’s the search you should do!” was great. Personally I think we could do without the implication that you should have done more due diligence. python-ideas is PRECISELY for this kind of question. Other forums should have a higher barrier to entry (like python-dev), but not python-ideas. best, —titus
Jim Baker has been working on tagged strings, and Guido has a working implementation. See https://github.com/jimbaker/tagstr/issues/1 I thought Jim had a draft PEP on this somewhere, but I can’t find it. -- Eric
Just to be clear on my opinion. I think Emil's idea was 100% appropriate to share on python-ideas, and he does a good job of showing where it would be useful. Sure, a background search is nice, but not required. That doesn't mean I *support* the idea. I take a very conservative attitude towards language changes. I hope I've provided an okay explanation of my non-support, but it's NOT a criticism of Emil in any way. That said, Jim Baker pitched his similar idea to me at the last PyCon, and I remember coming closer to feeling supportive. Maybe partially just because I've known and liked Jim for a long time. But I think he was also suggesting some extra semantics that seemed to move the needle in my mind. On Sat, Dec 17, 2022, 2:27 PM Eric V. Smith via Python-ideas < python-ideas@python.org> wrote:
On 18/12/2022 05.07, emil@emilstenstrom.se wrote:
Seems simple enough, right? The problem is: There's no syntax highlighting in my code editor for the three other languages. This makes for a horrible developer experience, where you constantly have to hunt for characters inside of strings. You saw the missing quote in js_string right? :)
Is this a problem with Python, or with the tool? « Language injections Last modified: 14 December 2022 Language injections let you work with pieces of code in other languages embedded in your code. When you inject a language (such as HTML, CSS, XML, RegExp, and so on) into a string literal, you get comprehensive code assistance for editing that literal. ... » https://www.jetbrains.com/help/pycharm/using-language-injections.html Contains a specific example for Django scripters. (sadly as an image - probably wouldn't be handled by this ListServer)
If I instead use separate files, I get syntax highlighting and auto-completion for each file, because editors set language based on file type. But should I really have to choose?
In other situations where files need to be collected together, a data-archive may be used (not to be confused with any historical context, nor indeed with data-compression). Might a wrapper around such of PSL's services help to both keep everything together, and yet enable separate editing format-recognition? « Data Compression and Archiving The modules described in this chapter support data compression with the zlib, gzip, bzip2 and lzma algorithms, and the creation of ZIP- and tar-format archives. ... » https://docs.python.org/3/library/archiving.html Disclaimer: JetBrains sponsors our PUG with monthly prizes, eg PyCharm. -- Regards, =dn
dn wrote:
I touched upon this solution in the original post. If all editors could agree to use # language=html it would be an ok solution. That API creates lots of ambiguity around to what the comment should be applied. Some examples which are non-obvious imho:

    ------------
    "<div>" # language=html
    "<span>
    ------------
    # language=html
    "<div>"
    ------------
    # language=html
    process_html("<html>")
    ------------
    # language=html
    concat_html("<html>", "<span>")
    ------------
The point here is to have everything in one file, editable and syntax highlighted in that same file. I don't think this tip applies to that?
emil@emilstenstrom.se writes:
Seems simple enough, right? The problem is: There's no syntax highlighting in my code editor for the three other languages.
Then you're not using Emacs's mmm-mode, which has been available for a couple of decades. Now, mmm-mode doesn't solve the whole problem -- it doesn't know anything about how the languages are tagged. But this isn't a problem for an Emacs shop: the team decides on a convention (or recognizes a third party's convention), and somebody will code up the 5-line function that font-lock (syntax highlighter in Emacs) uses to dispatch to the appropriate syntax highlighting mode. AFAICS this requires either all editors become Emacs ;-) or all editor maintainers get together and agree on the tags (this will need to be extensible, there are a lot of languages out there, and some editors will want to distinguish languages by version to flag syntax invalid in older versions). Is this really going to happen? Just for Python? When the traditional solution of separating different languages into different files is almost always acceptable? There are other uses proposed for tagged strings. In combination, perhaps this feature is worthwhile. But I think that on its own the multiple language highlighting application is pretty dubious given the limited benefit vs. the amount of complexity it will introduce not only in Python, but in editors as well.
This makes for a horrible developer experience, where you constantly have to hunt for characters inside of strings.
If this were a feature anyway, it would be very useful in certain situations (for example dynamic web pages), no question about it. But mixed-language files are not something I want to see in projects I work on -- and remember, I use Emacs, I have mmm-mode already.
This is problematic for your case. This means that the editor needs to change how it dispatches to syntax highlighting. Emacs, no problem, it already dispatches highlighting based on tagged regions of text. But are other editors going to *change* to do that?
But should I really have to choose?
Most of the time, I'd say "yes", and you should choose multiple files. ;-) YMMV of course, but I really appreciate the separation of concerns that is provided by separate files for Python code, HTML templates, and (S)CSS presentation.
Makes sense, yes -- that's how Emacs does it, but Emacs is *already* fundamentally designed on a model of implicitly tagged text. Parsing strings is already relatively hard because the begin marker is the same as the end marker. Now you need to tie it to the syntax highlighting mode, which may change over large regions of text every time you insert or delete a quotation mark or comment delimiter. You *can't* just hand it off to the Python highlighter, *every* syntax highlighter that might be used inside a Python string at least needs to know how to hand control back to Python. For one thing, they all need to learn about all four of Python's string delimiters. And it gets worse. I wonder how you end up with CSS and HTML inside Python strings? Yup, the CSS is inside a <style> element inside the HTML inside the Python string which may end in any of four different ways. It's not good enough to add this to the Python highlighter.... Even if Python gets tagged strings, my bet is that the odds are quite bad that any given editor ever supports this application of them. I wouldn't wish this on the devs of any editor except Emacs, which has had it since the late 1990s. Isn't it easier for you to just use Emacs? ;-) Steve
Well, obviously I have to come to the defense of vim as well :-). I'm not sure what year vim got the capability, but I suspect around as long as emacs. This isn't for exactly the same language use case, but finding a quick example on the internet:

    unlet b:current_syntax
    syntax include @srcBash syntax/bash.vim
    syntax region srcBashHi start="..." end="..." keepend contains=@srcBash

    unlet b:current_syntax
    syntax include @srcHTML syntax/html.vim
    syntax region srcHTMLHi start="^...$" end="^...$" keepend contains=@srcHTML

This is easy to adapt to either the named function convention: `html('<h1>Hello</h1>')` or to the standardized-comment convention. In general, I find any proposal to change Python "because then my text editor would need to change to accommodate the language" to be unconvincing.
On Sun, Dec 18, 2022 at 9:48 AM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
Personally, I’m skeptical of any proposal to change Python to make it easier for IDEs. But there *may* be other good reasons to do something like this. I’m not a static typing guy, but it seems to me that it could be useful to subtype strings: This function expects an SQL string. This function returns an SQL string. Maybe not worth the overhead, but worth more than giving IDEs hints as to what to do. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
Using a typing approach sounds like a fantastic idea. Moreover, as Stephen showed, it's easy to make Emacs utilize that, and as I showed, it's easy to make vim follow that. I've only written one tiny VS Code extension, but it wouldn't be hard there either. I'm not sure how one adds stuff to PyCharm and other editors, but I have to believe it's possible. So I see two obvious approaches, both of which 100% fulfill Emil's hope without new syntax:

#1

    from typing import NewType

    html = NewType("html", str)
    css = NewType("css", str)

    a: html = html("<h1>Hello world</h1>")
    b: css = css("h1 { color: #999999; }")

    def combine(h: html, c: css):
        print(f"Combined page elements: {h} | {c}")

    combine(a, b)  # <- good
    combine(b, a)  # <- bad

However, if you want to allow these types to possibly *do* something with the strings inside (validate them, canonicalize them, do a security check, etc), I think I like the other way:

#2

    class html(str): pass
    class css(str): pass

    a: html = html("<h1>Hello world</h1>")
    b: css = css("h1 { color: #999999; }")

    def combine(h: html, c: css):
        print(f"Combined page elements: {h} | {c}")

    combine(a, b)
    combine(b, a)

The type annotations in the assignment lines are optional, but if you're doing something other than just creating an instance of the (pseudo-)type, they might add something. They might also be what your text editor decides to use as its marker. For either version, type analysis will find a problem. If I hadn't matched the types in the assignment, it would detect extra problems:

    (py3.11) 1310-scratch % mypy tagged_types1.py
    tagged_types1.py:13: error: Argument 1 to "combine" has incompatible type "css"; expected "html"  [arg-type]
    tagged_types1.py:13: error: Argument 2 to "combine" has incompatible type "html"; expected "css"  [arg-type]
    Found 2 errors in 1 file (checked 1 source file)

typing.Annotated can also be used, but it solves a slightly different problem. On Sun, Dec 18, 2022 at 5:24 PM Paul Moore <p.f.moore@gmail.com> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
On Sun, Dec 18, 2022 at 07:38:06PM -0500, David Mertz, Ph.D. wrote:
The problem with this is that the builtins are positively hostile to subclassing. The issue is demonstrated with this toy example:

    class mystr(str):
        def method(self):
            return 1234

    s = mystr("hello")
    print(s.method())  # This is fine.
    print(s.upper().method())  # This is not.

To be usable, we have to override every string method that returns a string. Including dunders. So your class becomes full of tedious boilerplate:

        def upper(self):
            return type(self)(super().upper())
        def lower(self):
            return type(self)(super().lower())
        def casefold(self):
            return type(self)(super().casefold())
        # Plus another 29 or so methods

This is not just tedious and error-prone, but it is inefficient: calling super returns a regular string, which then has to be copied as a subclassed string and the original garbage collected. -- Steve
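One way to cut down on the boilerplate described above is to generate the overrides. The sketch below uses a hypothetical class decorator, preserve_type, that wraps a chosen list of str methods so their results are re-wrapped in the subclass. It is only an illustration: it does nothing about the copying cost, it only suits methods that return a single string, and any method left off the list still returns a plain str.

```python
import functools

def preserve_type(*method_names):
    """Class decorator (hypothetical helper): re-wrap results of the named str methods."""
    def decorate(cls):
        for name in method_names:
            base = getattr(str, name)

            def make_wrapper(base):
                @functools.wraps(base)
                def wrapper(self, *args, **kwargs):
                    # Call the plain str method, then copy into the subclass.
                    return type(self)(base(self, *args, **kwargs))
                return wrapper

            setattr(cls, name, make_wrapper(base))
        return cls
    return decorate

@preserve_type("upper", "lower", "casefold", "strip")
class mystr(str):
    def method(self):
        return 1234

s = mystr("hello")
shouted = s.upper()    # still a mystr, so .method() is available
titled = s.title()     # not in the list above, so this is a plain str
```

Operators such as + and slicing go through dunder methods (__add__, __getitem__) and would need the same treatment, which is part of why the list gets long.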
On Mon, 19 Dec 2022 at 12:29, Steven D'Aprano <steve@pearwood.info> wrote:
"Hostile"? I dispute that. Are you saying that every method on a string has to return something of the same type as self, rather than a vanilla string? Because that would be far MORE hostile to other types of string subclass:
Demo.x is a string. Which means that, unless there's good reason to do otherwise, it should behave as a string. So it should be possible to use it as if it were the string "eggs", including appending it to something, appending something to it, uppercasing it, etc, etc, etc. So what should happen if you do these kinds of manipulations? Should attempting to use a string in a normal string context raise ValueError?
I would say that *that* would count as "positively hostile to subclassing". ChrisA
On Sun, Dec 18, 2022 at 8:29 PM Steven D'Aprano <steve@pearwood.info> wrote:
I'd agree to "limited", but not "hostile." Look at the suggestions I mentioned: validate, canonicalize, security check. All of those are perfectly fine in `.__new__()`. E.g.:

    In [1]: class html(str):
       ...:     def __new__(cls, s):
       ...:         if not "<" in s:
       ...:             raise ValueError("That doesn't look like HTML")
       ...:         return str.__new__(cls, s)

    In [2]: html("<h1>Hello</h1>")

    In [3]: html("Hello")
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-3-71d16160c9ad> in <module>
    ----> 1 html("Hello")

    <ipython-input-1-e9d5da1202f3> in __new__(cls, s)
          2     def __new__(cls, s):
          3         if not "<" in s:
    ----> 4             raise ValueError("That doesn't look like HTML")
          5

    ValueError: That doesn't look like HTML

I readily acknowledge that's not a very thorough validator :-). But this much (say with a better validator) gets you static type checking, syntax highlighting, and inherent documentation of intent. I know that lots of things one can do with a str subclass wind up producing a str instead. But if the thing you do is just "make sure it is created as the right kind of thing for static checking and editor assistance", I don't care about any of that falling back.
On Sun, Dec 18, 2022 at 10:23:18PM -0500, David Mertz, Ph.D. wrote:
No, they aren't perfectly fine, because as soon as you apply any operation to your string subclass, you get back a plain vanilla string which bypasses your custom `__new__` and so does not perform the validation or security check.
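A small sketch of the bypass being described, reusing the toy html validator from earlier in the thread (rewritten with super() for brevity): the result of an ordinary string method is a plain str that has never been through __new__, so it can hold content the validator would reject.

```python
class html(str):
    def __new__(cls, s):
        # Toy validation, as in the earlier message: must contain a "<".
        if "<" not in s:
            raise ValueError("That doesn't look like HTML")
        return super().__new__(cls, s)

page = html("<h1>Hello</h1>")

# str.replace returns a plain str, so html.__new__ is never called here.
stripped = page.replace("<h1>", "").replace("</h1>", "")

# Passing the result back through the constructor shows it would not validate.
try:
    html(stripped)
except ValueError:
    revalidated = False
else:
    revalidated = True
```

Whether that matters depends on the goal: for validation it is a hole, while for "tag the literal at creation time" it may be acceptable, which is the disagreement in this part of the thread.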
But this much (say with a better validator) gets you static type checking, syntax highlighting, and inherent documentation of intent.
Any half-way decent static type-checker will immediately fail as soon as you call a method on this html string, because it will know that the method returns a vanilla string, not a html string. And that's exactly what mypy does:

    [steve ~]$ cat static_check_test.py
    class html(str):
        pass

    def func(s:html) -> None:
        pass

    func(html('').lower())

    [steve ~]$ mypy static_check_test.py
    static_check_test.py:7: error: Argument 1 to "func" has incompatible type "str"; expected "html"
    Found 1 error in 1 file (checked 1 source file)

Same with auto-completion. Either auto-complete will correctly show you that what you thought was a html object isn't, and fail to show any additional methods you added; or worse, it will wrongly think it is a html object when it isn't, and allow you to autocomplete methods that don't exist.
On Mon, 19 Dec 2022 at 22:37, Steven D'Aprano <steve@pearwood.info> wrote:
But what does it even mean to uppercase an HTML string? Unless you define that operation specifically, the most logical meaning is "convert it into a plain string, and uppercase that". Or, similarly, slicing an HTML string. You could give that a completely different meaning (maybe defining its children to be tags, and slicing is taking a selection of those), but if you don't, slicing isn't really a meaningful operation. So it should be correct: you cannot simply uppercase an HTML string and expect sane HTML. I might be more sympathetic if you were talking about "tainted" strings (ie those which contain data from an end user), on the basis that most operations on those should yield tainted strings, but given that systems of taint tracking seem to have managed just fine with the existing way of doing things, still not particularly persuasive. ChrisA
On 19Dec2022 22:45, Chris Angelico <rosuav@gmail.com> wrote:
Yes, this was my thought. I've got a few subclasses of builtin types. They are not painless. For HTML "uppercase" is a kind of ok notion because the tags are case insensitive. Not the case with, say, XML - my personal nagging example is from KML (Google map markup dialect) where IIRC a "ScreenOverlay" and a "screenoverlay" both exist with different semantics. Ugh. So indeed, I'd probably _want_ .upper to return a plain string and have special methods to do more targeted things as appropriate. Cheers, Cameron Simpson <cs@cskk.id.au>
On Wed, 21 Dec 2022 at 09:30, Cameron Simpson <cs@cskk.id.au> wrote:
Tag names are, but their attributes might not be, so even that might not be safe.
Ugh indeed. Why? Why? Why?
So indeed, I'd probably _want_ .upper to return a plain string and have special methods to do more targeted things as appropriate.
Agreed. ChrisA
As has been said, a builtin *could* be written that would be "friendly to subclassing", by the definition in this thread. (I'll stay out of the argument for the moment as to whether that would be better.) I suspect that the reason str acts like it does is that it was originally written a LONG time ago, when you couldn't subclass basic built-in types at all. Secondarily, it could be a performance tweak -- minimal memory and peak performance are pretty critical for strings. But collections.UserString does exist -- so if you want to subclass, and performance isn't critical, then use that. Steven A pointed out that UserStrings are not instances of str though. I think THAT is a bug. And it's probably that way because with the magic of duck typing, no one cared -- but with all the static type hinting going on now, that is a bigger liability than it used to be. Also because, when it was written, you couldn't subclass str. Though I will note that run-time type checking of strings is relatively common compared to other types, because the whole a-str-is-a-sequence-of-str issue means that making the distinction between a sequence of strings and a string itself is sometimes needed. And str is rarely duck typed. If anyone actually has a real need for this I'd post an issue -- it'd be interesting to see if the core devs see this as a bug or a feature (well, probably not feature, but maybe missing feature). OK -- I got distracted and tried it out -- it was pretty easy to update UserString to be a subclass of str. I suspect it isn't done that way now because it was originally written because you could not subclass str -- so it stored an internal str instead. The really hacky part of my prototype is this:

    # self.data is the original attribute for storing the string internally.
    # Partly to prevent my having to re-write all the other methods, and partly
    # because you get recursion if you try to use the methods on self when
    # overriding them ...
    @property
    def data(self):
        return "".join(self)

The "".join is because it was the only way I quickly thought of to make a native string without invoking the __str__ method and other initialization machinery. I wonder if there is another way? Certainly there is in C, but in pure Python? Anyway, after I did that and wrote a __new__ -- the rest of it "just worked".

    def __new__(cls, s):
        return super().__new__(cls, s)

UserString and its subclasses return instances of themselves, and instances are instances of str. Code with a couple asserts in the __main__ block enclosed. Enjoy! -CHB NOTE: VERY minimally tested :-) On Tue, Dec 20, 2022 at 4:17 PM Chris Angelico <rosuav@gmail.com> wrote:
On Tue, Dec 20, 2022 at 5:38 PM Christopher Barker <pythonchb@gmail.com> wrote:
Note that UserString does break some built-in functionality, like you can't apply regular expressions to a UserString:
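A minimal reproduction of the kind of breakage meant here, using the stock collections.UserString (not the str-subclass prototype discussed above). The variable names are only for illustration.

```python
import re
from collections import UserString

wrapped = UserString("hello world")

# The re module requires a real str (or bytes-like object), so this fails.
failure = None
try:
    re.match(r"hello", wrapped)
except TypeError as exc:
    failure = exc

# Converting explicitly to str first works as usual.
match = re.match(r"hello", str(wrapped))
```

The explicit str() call is the usual workaround, at the cost of losing the wrapper type at that point.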
There is more discussion in this thread ( https://stackoverflow.com/questions/59756050/python3-when-userstring-does-no...), including a link to a very old bug (https://bugs.python.org/issue232493). There is a related issue with json.dump etc, though it can be worked around since there is a python-only json implementation. I have run into this in practice at a previous job, with a runtime "taint" tracker for logging access to certain database fields in a Django application. Many views would select all fields from a table, then not actually use the fields I needed to log access to, which generated false positives. (Obviously the "correct" design is to only select data that is relevant for the given code, but I was instrumenting a legacy codebase with updated compliance requirements.) So I think there is some legitimate use for this, though object proxies can be made to work around most of the issues. - Lucas
On Tue, Dec 20, 2022 at 6:20 PM Lucas Wiman <lucas.wiman@gmail.com> wrote:
I wonder how many of these issues would go away if UserString subclassed str. Maybe some? But at the C level, duck typing simply doesn't work -- you need access to an actual C string struct. Code that worked with strings *could* have a little bit of wrapper for subclasses that would dig into it to find the actual str underneath -- but if that code had to be written everywhere strings are used in C -- that could be a pretty big project -- probably what Guido meant by: "Fixing this will be a major project, probably for Python 3000k" I don't suppose it has been addressed at all? Note: at least for string paths, the builtins all use fspath() (or something) so that should be easy to make work. (and seems to with my prototype already) There is a related issue with json.dump etc, json.dump works with my prototype as well. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Tue, Dec 20, 2022 at 8:21 PM Stephen J. Turnbull <stephenjturnbull
UserStrings are not instances of str though. I think THAT is a bug.
I guess, although surely the authors of that class thought about it.
Well, kind of — the entire reason for UserString was that at the time, str itself could not be subclassed. So it was certainly a feature at the time ;-) The question is whether anyone thought about it again later, and the docs seem to indicate not: UserString <https://docs.python.org/3/library/collections.html?highlight=userstring#coll...> objects The class, UserString <https://docs.python.org/3/library/collections.html?highlight=userstring#coll...> acts as a wrapper around string objects. The need for this class has been partially supplanted by the ability to subclass directly from str <https://docs.python.org/3/library/stdtypes.html#str>; however, this class can be easier to work with because the underlying string is accessible as an attribute. And it has no docstrings at all -- it doesn't strike me that anyone is putting any thought into carefully maintaining it. Anyway, this could probably be improved with a StringLike ABC I'm not so sure -- in many cases, the underlying C implementation is critical -- and strings are one of those things that generally aren't duck-typed -- subclassing is a special case of that. Anyway -- I've only gotten this far 'cause it caught my interest -- but I have no need for subclassing strings -- but if someone does, I think it would be worth at least bringing up with the core devs. -CHB
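For reference, a short sketch of how the stock collections.UserString behaves today, which is the trade-off under discussion: its methods hand back instances of the subclass, the underlying string is available as the data attribute, but instances are not str instances. The Html class name is only an example.

```python
from collections import UserString

class Html(UserString):
    """Example subclass; adds nothing of its own."""

page = Html("<h1>Hello</h1>")

# UserString methods re-wrap their results in the subclass ...
upper = page.upper()
keeps_type = type(upper) is Html

# ... the wrapped plain string is exposed as .data ...
inner = page.data

# ... but the object itself does not pass isinstance(..., str).
is_real_str = isinstance(page, str)
```

That last point is what breaks C-level consumers and isinstance-based checks, and it is the behaviour the str-subclass prototype above tries to change.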
That's interesting, for me both 3.9 and 3.10 show the f-string more than 5x faster. This is just timeit on f'{myvar}' vs ''.join((myvar,)) so it may not be the most nuanced comparison for a class property. Probably unsurprisingly having myvar be precomputed as the single tuple also gives speedups, around 45% for me. So if just speed is wanted maybe inject the tuple pre-constructed. ~ Jeremiah On Wed, Dec 21, 2022 at 1:19 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Dec 21, 2022 at 8:34 AM Jeremiah Paige <ucodery@gmail.com> wrote:
That may be the optimization that 3.11 is doing for you :-) Now that I think about it, if this is immutable, which it should be, as it's a str subclass, then perhaps the data string can be pre-computed, as it was in the original. I liked the property, as philosophically, you don't want to store the same data twice, but with an immutable, there should be no danger of it getting out of sync, and it would be faster. (though memory intensive for large strings). -CHB
On Wed, Dec 21, 2022 at 1:18 AM Steven D'Aprano <steve@pearwood.info> wrote:
I think both of those will call self.__str__, which creates a recursion -- that's what I'm trying to avoid. I'm sure there are ways to optimize this -- but only worth doing if it's worth doing at all :-) - CHB That's about 14% faster than the f-string version.
On Thu, 22 Dec 2022 at 03:41, Christopher Barker <pythonchb@gmail.com> wrote:
Second one doesn't seem to.
Interestingly, neither does the f-string, *if* you include a format code with lots of room. I guess str.__format__ doesn't always call __str__().
Curiouser and curiouser. Especially since the returned strings aren't enclosed in quotes. Let's try something.
Huh. How about that. ChrisA
On Wed, Dec 21, 2022 at 8:54 AM Chris Angelico <rosuav@gmail.com> wrote:
hmm -- interesting trick -- I had jumped to that conclusion -- I wonder what it IS using under the hood?
Interestingly, neither does the f-string, *if* you include a format code with lots of room. I guess str.__format__ doesn't always call __str__().
Now that you mention that, UserString should perhaps have a __format__. More evidence that it's not really being maintained. Though maybe not -- perhaps the inherited one will be fine. Now that I think about it, perhaps the inherited __str__ would be fine as well. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Thu, 22 Dec 2022 at 04:14, Christopher Barker <pythonchb@gmail.com> wrote:
From the look of things, PyUnicode_Join (the internal function that handles str.join()) uses a lot of "reaching into the data structure" operations for efficiency. It uses PyUnicode_Check (aka "isinstance(x, str)") rather than PyUnicode_CheckExact (aka "type(x) is str") and then proceeds to cast the pointer and directly inspect its members. As such, I don't think UserString can ever truly be a str, and it'll never work with str.join(). The best you'd ever get would be explicitly mapping str over everything first:
And we don't want that to be the default, since we're not writing JavaScript code here.

ChrisA
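The join behaviour described above can be sketched as follows. `Loud` is again a hypothetical `str` subclass used only to make `__str__` visible. The exact error message is left unchecked because it may vary between versions.

```python
from collections import UserString


class Loud(str):
    """Hypothetical str subclass whose __str__ is deliberately different."""

    def __str__(self):
        return "<via __str__>"


# A real str subclass is accepted by join, which reads the underlying
# characters directly instead of calling __str__.
joined_subclass = "-".join([Loud("a"), Loud("b")])

# UserString is not a str subclass, so join rejects it outright.
items = [UserString("abc"), "def"]
try:
    "".join(items)
    join_error = None
except TypeError as exc:
    join_error = exc

# The workaround: explicitly map str over everything first.
joined_mapped = "".join(map(str, items))

print(joined_subclass)
print(type(join_error).__name__)
print(joined_mapped)
```

The `map(str, ...)` step has to be written out by the caller; join itself never coerces its items.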
On Wed, Dec 21, 2022 at 9:35 AM Chris Angelico <rosuav@gmail.com> wrote:
I had figured subclasses of str wouldn't be full players in the C code -- but join() is pretty fundamental :-(

-CHB
(Replying to Christopher Barker's message of Thu, Dec 22, 2022 at 1:42 AM.)

I am not enthusiastic about this idea at all: as I perceive it, this is an IDE problem, external to the language, and should be resolved there, maybe with a recommendation PEP.

On the other hand, I have seen tens of e-mails discussing string subclassing so that annotations could serve as a hint for inner-string highlighting, and subclassing is not really needed at all. Maybe we can allow string tagging in annotations by using `str['html']`, `str['css']` and so on. (The typing module could even provide no-op names such as `html`, `css`, etc. to mean the same thing without any other signs, so stuff could be annotated like `template: html = "xxxx"`. The same typing machinery that makes things like `TypedDict`, `Required`, etc. work would present these as plain `str` to the runtime, while allowing any tooling to perceive them as a specialized class.)

In other words, one could then write either:

    mytemplate: str['html'] = "<html> ....</html>"

or:

    from typing import html
    mytemplate: html = ...

The former way could be used for arbitrary tagging as proposed by the O.P., and it would be trivial to add a "register" function to declaratively create new tags at static-analysis time.

This syntax has the benefit that static type checkers can take full advantage of the string subtypes, correctly pointing out when a "CSS" string is passed as an argument that should contain "HTML", with no drawbacks, no syntax changes, and no backwards compatibility breaks.
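As a rough approximation of this annotation idea using what the typing module already offers, here is a sketch built on `typing.Annotated` and `typing.NewType`. Neither `str['html']` nor `from typing import html` exists today, so the names `Html`, `Css`, `HtmlStr` and `language_tags` below are hypothetical stand-ins, and whether any editor would act on such tags is an assumption.

```python
from typing import Annotated, NewType, get_type_hints

# Option 1: attach a tag as metadata. At runtime the value is a plain str.
Html = Annotated[str, "html"]
Css = Annotated[str, "css"]

# Option 2: NewType gives static checkers a distinct type; at runtime the
# "constructor" simply returns its argument unchanged.
HtmlStr = NewType("HtmlStr", str)


class Calendar:
    template: Html = '<span class="calendar"></span>'
    css: Css = ".calendar { background: pink }"


def language_tags(cls):
    """Return {attribute: tag} for attributes annotated with a tagged str."""
    hints = get_type_hints(cls, include_extras=True)
    tags = {}
    for name, hint in hints.items():
        # Annotated[...] aliases expose their extra arguments here.
        metadata = getattr(hint, "__metadata__", ())
        if metadata:
            tags[name] = metadata[0]
    return tags


tags = language_tags(Calendar)
snippet = HtmlStr("<b>hi</b>")

print(tags)
print(type(Calendar.template).__name__, type(snippet).__name__)
```

With `Annotated`, the tag is available to tools but invisible to type checkers as a distinct type; `NewType` is the reverse. The proposal in this message would effectively combine both properties.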
participants (19)
- Barry Scott
- Bruce Leban
- C. Titus Brown
- Cameron Simpson
- Chris Angelico
- Christopher Barker
- David Mertz, Ph.D.
- dn
- Emil Stenström
- emil@emilstenstrom.se
- Eric V. Smith
- Jeremiah Paige
- Joao S. O. Bueno
- Lucas Wiman
- Paul Moore
- Rob Cliffe
- Shantanu Jain
- Stephen J. Turnbull
- Steven D'Aprano