[Bug 1779445] [NEW] edithtml.py saves en templates using html entity reference with raw iso-8859-1 character
Public bug reported:
In Mailman's web administrative interface, the edithtml page saves en language templates with raw iso-8859-1 characters if the template uses HTML entity references such as "&nbsp;".
For example, if the "General list information page" (templates/en/listinfo.html), which contains "&nbsp;", is saved without modification from the web UI, the list's template en/listinfo.html will contain a raw '\xa0' character. If you add "<!-- &copy;&reg; -->" in the text area and submit the changes twice, it turns into "<!-- \xa9\xae -->".
I'm not sure the attached patch is a good way to fix this, because I don't know whether these entity reference characters are always ISO-8859-1 characters, but I attach it for reference.
** Affects: mailman Importance: Undecided Status: New
** Patch added: "edithtml-save-as-ascii-patch.txt" https://bugs.launchpad.net/bugs/1779445/+attachment/5158043/+files/edithtml-...
** Branch linked: lp:mailman/2.1
-- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1779445 Title: edithtml.py saves en templates using html entity reference with raw iso-8859-1 character To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1779445/+subscriptions
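The round trip described in the report can be reproduced outside Mailman. The following is only a minimal sketch in modern Python, assuming the template text is placed in the textarea without escaping its ampersands and that the browser submits the form in iso-8859-1. The function names are hypothetical; this is neither Mailman's code nor the attached patch.

```python
import html


def roundtrip_through_textarea(template_text: str,
                               charset: str = "iso-8859-1") -> bytes:
    """Model what comes back when unescaped text is shown in a <textarea>."""
    # The browser resolves entity references inside the textarea into
    # the characters they name...
    displayed = html.unescape(template_text)
    # ...and submits those characters encoded in the form's charset.
    return displayed.encode(charset)


def save_as_ascii(submitted: bytes, charset: str = "iso-8859-1") -> bytes:
    """One possible mitigation: store non-ASCII characters as numeric
    character references so the saved template stays pure ASCII."""
    return submitted.decode(charset).encode("ascii", "xmlcharrefreplace")


raw = roundtrip_through_textarea("a&nbsp;b <!-- &copy;&reg; -->")
print(raw)                 # raw iso-8859-1 bytes, as seen in the saved template
print(save_as_ascii(raw))  # the same text with numeric character references
```

Note that the second function yields numeric references such as `&#160;` rather than the original named entities, so it keeps the file ASCII-only but does not restore the template byte for byte.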
Actually, this behavior was caused by rev. 1188. Unfortunately, I don't recall specifically why I made that change. I will attach a patch of what I have so far. Because the call to websafe comes from htmlformat.TextArea(), I need more testing to see if the other uses of TextArea are adversely impacted. ** Changed in: mailman Importance: Undecided => Medium ** Changed in: mailman Status: New => In Progress ** Changed in: mailman Milestone: None => 2.1.28 ** Changed in: mailman Assignee: (unassigned) => Mark Sapiro (msapiro)
** Patch added: "Possible fix." https://bugs.launchpad.net/mailman/+bug/1779445/+attachment/5159349/+files/1...
** Patch removed: "Possible fix." https://bugs.launchpad.net/mailman/+bug/1779445/+attachment/5159349/+files/1...
Revised possible fix patch. I think the main reason for not double escaping HTML entities was to make HTML text displayed in the admindb interface more readable. This patch will avoid double escaping only in readonly TextArea. ** Patch added: "Possible fix." https://bugs.launchpad.net/mailman/+bug/1779445/+attachment/5159363/+files/1...
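The distinction between the two escaping modes can be illustrated as follows. This is a rough sketch in modern Python and not the patch itself: `escape_for_textarea` and its `readonly` parameter are hypothetical names, and the regular expression is only one plausible way to recognize an already escaped entity.

```python
import html
import re

# Matches the "&amp;" that html.escape() puts in front of something that
# was already an entity or character reference, e.g. "&amp;nbsp;".
_ESCAPED_ENTITY = re.compile(
    r"&amp;(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def escape_for_textarea(text: str, readonly: bool) -> str:
    """Escape text for use as the content of a <textarea>.

    Editable areas are fully (double) escaped, so the browser displays
    and resubmits the entity reference itself. Read-only areas undo the
    double escaping so the rendered character is shown for readability.
    """
    escaped = html.escape(text, quote=True)
    if readonly:
        return _ESCAPED_ENTITY.sub(r"&\1;", escaped)
    return escaped


print(escape_for_textarea("&nbsp;<b>", readonly=False))
print(escape_for_textarea("&nbsp;<b>", readonly=True))
```

With full escaping, the browser shows the literal text `&nbsp;` in the editable area and posts it back unchanged, which is what keeps the entity reference intact in the saved template.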
I understand that your fix preserves character entity references in the text of a TextArea through the POST method, and I have confirmed that it is fixed in Rev 1788. Thank you.
I think there is one more problem, concerning the charset of query strings from Text or TextArea fields, which are not restricted to ASCII text for any language. If the text contains a raw non-ASCII character, its charset depends on the browser's implementation, even though the HTML 4.01 specification says the default is "UNKNOWN", which means "User agents may interpret this value as the character encoding that was used to transmit the document containing this FORM element." (https://www.w3.org/TR/html401/interact/forms.html)
This does not seem to be a problem in most cases with current browsers that respect the specification, but it is still a problem in some cases. For instance, when I put a non-breaking space ('\xa0' in iso-8859-1) into a Text field in a us-ascii form using Firefox 61 on FreeBSD, it was encoded as '%A0' in the query string, although other Unicode characters are encoded as numeric character references.
The code that handles this special case for 'us-ascii' is found in Utils.canonstr(), so it may be necessary to use it in some places, including the TextArea in edithtml.py. (Using non-ASCII characters in a us-ascii form is irregular, of course.)
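To make the '%A0' case concrete, here is a small sketch in modern Python of how such a query string value decodes and how it could be normalized back to ASCII. It assumes the browser used iso-8859-1 for the byte it sent; `to_ascii_with_charrefs` is a hypothetical helper for illustration and is not Utils.canonstr().

```python
from urllib.parse import unquote_to_bytes


def to_ascii_with_charrefs(raw: bytes,
                           assumed_charset: str = "iso-8859-1") -> str:
    """Decode submitted bytes with an assumed charset, then replace every
    non-ASCII character with a numeric character reference."""
    text = raw.decode(assumed_charset)
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# What the server receives for a field value containing a non-breaking space.
submitted = unquote_to_bytes("a%A0b")
print(submitted)                          # the single raw byte 0xA0 between a and b
print(to_ascii_with_charrefs(submitted))  # ASCII-only text with a numeric reference
```

The weak point is the assumed charset: if the browser had picked a different encoding for the form, the same bytes would decode to different characters, which is the ambiguity described above.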
I think the issue in the original description is fixed and that described in comment #5 is a different issue. If you think this is a significant issue that needs to be fixed, please open a new bug for it.
I don't think it is significant, as I noted in the parenthetical last sentence of comment #5, so I won't open a bug for it. I'm sorry to have bothered you.
** Changed in: mailman Status: In Progress => Fix Released
participants (2)
- Mark Sapiro
- Yasuhito FUTATSUKI@POEM