Mailman 3 [Twisted-Python] IDNA problem in twisted - Twisted


Barry Scott

April 8, 2021

3:43 p.m.

We just added a patch to our Twisted to prevent Twisted from doing IDNA validation. _idnaBytes and _idnaText now just convert between bytes and unicode based on the type of the provided argument.

We had to do this because there are domain names that youtube.com uses that are not valid under IDNA-2008: https://tools.ietf.org/html/rfc5891#section-4.2.3.1

For example, this URL: https://r2---sn-aigzrn7e.googlevideo.com/generate_204

Firefox is happy to visit this URL and does not change it when it is entered in the address bar.

The comments in the _idna.py code say this: "Convert some text typed by a human into some ASCII bytes." and "Convert some IDNA-encoded octets into some human-readable text".

The key idea here is that it's human input that will be converted. But the code is used deep in _sslverify.py, where no human input is entered.

I can see why a UI would need to do IDNA-2008 conversions and validation, but I'm not clear why it's of value deep in the guts of Twisted.

Why is this code needed at all in Twisted? If it's for a high-level API, then why isn't it being called at the edge of the high-level API calls?

Barry


Wim Lewis

April 2021

3:58 a.m.

On Thursday, April 8, 2021 8:43:35 AM PDT, Barry Scott wrote:

...

> We just added a patch to our Twisted to prevent Twisted from doing IDNA validation. _idnaBytes and _idnaText now just convert between bytes and unicode based on the type of the provided argument.
>
> We had to do this because there are domain names that youtube.com uses that are not valid under IDNA-2008: https://tools.ietf.org/html/rfc5891#section-4.2.3.1

My reading of the RFC is that the YouTube domain you mention (r2---sn-aigzrn7e.googlevideo.com) is an invalid "U-Label", but that doesn't mean it's an entirely invalid domain label. It just means you can't legally run it through IDNA and turn it into "xn--r2---sn-aigzrn7e-". The intent, as I understand it, is to forbid any possibility of double-encoding or double-decoding a label, not to forbid the possibility of using labels like the one you mention.
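The rule in question, RFC 5891 section 4.2.3.1, says a U-label must not contain "--" in its third and fourth character positions, the same shape used by the "xn--" prefix of A-labels. The sketch below (helper names are hypothetical) checks that rule alongside the traditional letters-digits-hyphen (LDH) host label syntax, to show that the YouTube label fails the U-label check yet is an ordinary LDH label.

```python
import re

# Traditional host-name label syntax (RFC 952 / RFC 1123): letters, digits and
# hyphens, 1-63 characters, not starting or ending with a hyphen.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """True if the label is a plain ASCII letters-digits-hyphen label."""
    return _LDH_LABEL.fullmatch(label) is not None


def has_hyphens_3_4(label: str) -> bool:
    """RFC 5891 section 4.2.3.1: a U-label must not have "--" at positions 3-4."""
    return label[2:4] == "--"


for label in ["r2---sn-aigzrn7e", "googlevideo", "xn--bcher-kva"]:
    print(label, is_ldh_label(label), has_hyphens_3_4(label))
```

Both the YouTube label and a genuine A-label trip the hyphen check, which is why IDNA must refuse to treat them as U-labels; neither fails as an LDH label, which is the sense in which the name is still perfectly usable.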

...

> I can see why a UI would need to do IDNA-2008 conversions and validation, but I'm not clear why it's of value deep in the guts of Twisted.

My guess is that this is just an accident of the way that the bytes/characters distinction and the IDNA features were added to Twisted, and is probably a bug.

...

> Why is this code needed at all in Twisted? If it's for a high-level API, then why isn't it being called at the edge of the high-level API calls?

I'd argue that resolving URLs is in fact a high-level API (from the point of view of the name resolution system), but even so, it seems to me that Twisted is doing the wrong thing here. The format of that label should prevent it from ever being transformed by IDNA, but shouldn't prevent it from being passed through unchanged, since it doesn't contain any codepoints outside of the usual ASCII range.
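One way to express this pass-through policy: leave a hostname that is already pure ASCII untouched, and apply IDNA only to names that actually contain non-ASCII characters. The sketch below uses Python's built-in "idna" codec for the fallback purely because it is in the standard library; that codec implements IDNA 2003 (RFC 3490), not the IDNA 2008 rules Twisted gets from the third-party idna package, so it is a stand-in rather than a drop-in replacement. The function name is hypothetical.

```python
def hostname_to_ascii(hostname: str) -> bytes:
    """Return the ASCII (wire) form of a hostname, converting only when needed."""
    if hostname.isascii():
        # Already ASCII: nothing for IDNA to do, so pass it through unchanged
        # rather than validating it as if it were a U-label.
        return hostname.encode("ascii")
    # Genuinely non-ASCII input: convert labels to A-labels ("xn--...").
    # Note: the stdlib codec follows IDNA 2003, not IDNA 2008.
    return hostname.encode("idna")


print(hostname_to_ascii("r2---sn-aigzrn7e.googlevideo.com"))
print(hostname_to_ascii("bücher.example"))  # b'xn--bcher-kva.example'
```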

...

> The key idea here is that it's human input that will be converted. But the code is used deep in _sslverify.py, where no human input is entered.

_sslverify has to check whether the information in the server's certificate matches the URL that the user supplied. Certificates can contain Unicode text, at least in the (completely obsolete) CN-as-domain-name situation, so _sslverify probably picked up the requirement for IDNA transformations from that. (I don't remember whether dNSName SANs can contain unicode.)

What is the patch you decided to add to your version? Where in _sslverify did the problem surface?
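For the certificate-matching step, note that RFC 5280 defines dNSName subjectAltName entries as IA5String, i.e. ASCII, so an internationalized name appears there in its "xn--" A-label form and the comparison can be done entirely on ASCII. The sketch below is a deliberately simplified matcher (case-insensitive exact match plus a single whole left-most wildcard label); its name and rules are assumptions for illustration, not Twisted's code. Twisted appears to delegate the real check to the service_identity package.

```python
def dns_name_matches(hostname: str, pattern: str) -> bool:
    """Simplified check of an ASCII hostname against one dNSName entry.

    Comparison is case-insensitive. A wildcard is honoured only as the whole
    left-most label ("*.example.com") and only when at least two labels
    follow it.
    """
    host = hostname.rstrip(".").lower().split(".")
    pat = pattern.rstrip(".").lower().split(".")
    if len(host) != len(pat) or host[1:] != pat[1:]:
        return False
    if pat[0] == "*":
        return len(pat) >= 3
    return host[0] == pat[0]


print(dns_name_matches("r2---sn-aigzrn7e.googlevideo.com", "*.googlevideo.com"))  # True
print(dns_name_matches("googlevideo.com", "*.googlevideo.com"))                   # False
```

Because both sides are ASCII in this sketch, an ASCII hostname never needs IDNA validation for this step; only a non-ASCII name would need converting to its A-label form first.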

Glyph

5:52 a.m.

...

On Apr 27, 2021, at 8:58 PM, Wim Lewis <wiml@hhhh.org> wrote:

> On Thursday, April 8, 2021 8:43:35 AM PDT, Barry Scott wrote:
>
> ...
>> We just added a patch to our Twisted to prevent Twisted from doing IDNA validation. _idnaBytes and _idnaText now just convert between bytes and unicode based on the type of the provided argument.
>>
>> We had to do this because there are domain names that youtube.com uses that are not valid under IDNA-2008: https://tools.ietf.org/html/rfc5891#section-4.2.3.1
>
> My reading of the RFC is that the YouTube domain you mention (r2---sn-aigzrn7e.googlevideo.com) is an invalid "U-Label", but that doesn't mean it's an entirely invalid domain label. It just means you can't legally run it through IDNA and turn it into "xn--r2---sn-aigzrn7e-". The intent, as I understand it, is to forbid any possibility of double-encoding or double-decoding a label, not to forbid the possibility of using labels like the one you mention.

I agree with this reading.

...

...
>> I can see why a UI would need to do IDNA-2008 conversions and validation, but I'm not clear why it's of value deep in the guts of Twisted.
>
> My guess is that this is just an accident of the way that the bytes/characters distinction and the IDNA features were added to Twisted, and is probably a bug.

+1. We also have other issues with the Python IDNA library (https://github.com/kjd/idna/issues/18) and would generally like to reduce our strictness via whatever mechanisms we can, even for things that genuinely require it (which this does not).

...

...
>> Why is this code needed at all in Twisted? If it's for a high-level API, then why isn't it being called at the edge of the high-level API calls?
>
> I'd argue that resolving URLs is in fact a high-level API (from the point of view of the name resolution system), but even so, it seems to me that Twisted is doing the wrong thing here. The format of that label should prevent it from ever being transformed by IDNA, but shouldn't prevent it from being passed through unchanged, since it doesn't contain any codepoints outside of the usual ASCII range.

Also agreed with all of this.

...

...
>> The key idea here is that it's human input that will be converted. But the code is used deep in _sslverify.py, where no human input is entered.
>
> _sslverify has to check whether the information in the server's certificate matches the URL that the user supplied. Certificates can contain Unicode text, at least in the (completely obsolete) CN-as-domain-name situation, so _sslverify probably picked up the requirement for IDNA transformations from that. (I don't remember whether dNSName SANs can contain unicode.)

Yep.

...

> What is the patch you decided to add to your version? Where in _sslverify did the problem surface?

I am also very curious about this :).

Barry Scott

9:22 a.m.

On Wednesday, 28 April 2021 06:52:30 BST Glyph wrote:

...

...
> On Apr 27, 2021, at 8:58 PM, Wim Lewis <wiml@hhhh.org> wrote:
>
>> On Thursday, April 8, 2021 8:43:35 AM PDT, Barry Scott wrote:
>>
>> ...
>>> We just added a patch to our Twisted to prevent Twisted from doing IDNA validation. _idnaBytes and _idnaText now just convert between bytes and unicode based on the type of the provided argument.
>>>
>>> We had to do this because there are domain names that youtube.com uses that are not valid under IDNA-2008: https://tools.ietf.org/html/rfc5891#section-4.2.3.1
>>
>> My reading of the RFC is that the YouTube domain you mention (r2---sn-aigzrn7e.googlevideo.com) is an invalid "U-Label", but that doesn't mean it's an entirely invalid domain label. It just means you can't legally run it through IDNA and turn it into "xn--r2---sn-aigzrn7e-". The intent, as I understand it, is to forbid any possibility of double-encoding or double-decoding a label, not to forbid the possibility of using labels like the one you mention.
>
> I agree with this reading.

...
...
>>> I can see why a UI would need to do IDNA-2008 conversions and validation, but I'm not clear why it's of value deep in the guts of Twisted.
>>
>> My guess is that this is just an accident of the way that the bytes/characters distinction and the IDNA features were added to Twisted, and is probably a bug.
>
> +1.
>
> We also have other issues with the Python IDNA library (https://github.com/kjd/idna/issues/18) and would generally like to reduce our strictness via whatever mechanisms we can, even for things that genuinely require it (which this does not).

...
...
>>> Why is this code needed at all in Twisted? If it's for a high-level API, then why isn't it being called at the edge of the high-level API calls?
>>
>> I'd argue that resolving URLs is in fact a high-level API (from the point of view of the name resolution system), but even so, it seems to me that Twisted is doing the wrong thing here. The format of that label should prevent it from ever being transformed by IDNA, but shouldn't prevent it from being passed through unchanged, since it doesn't contain any codepoints outside of the usual ASCII range.
>
> Also agreed with all of this.

...
...
>>> The key idea here is that it's human input that will be converted. But the code is used deep in _sslverify.py, where no human input is entered.
>>
>> _sslverify has to check whether the information in the server's certificate matches the URL that the user supplied. Certificates can contain Unicode text, at least in the (completely obsolete) CN-as-domain-name situation, so _sslverify probably picked up the requirement for IDNA transformations from that. (I don't remember whether dNSName SANs can contain unicode.)
>
> Yep.

...
>> What is the patch you decided to add to your version? Where in _sslverify did the problem surface?

When _idnaBytes was called in ClientTLSOptions.__init__, it raised an exception.

...

> I am also very curious about this :).

Attached is the patch we are using. We are using 19.07 for sad reasons.

Barry
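Barry's answer places the failure in ClientTLSOptions.__init__. In the Twisted versions discussed here, that constructor appears to convert the hostname eagerly (via _idnaBytes) and keep both a bytes and an ASCII-text copy, so a strict IDNA check fails as soon as the options object is created, before any connection is attempted. The toy model below illustrates only that shape: the class, the StrictIDNAError exception and both converter functions are hypothetical, and the strict converter applies just the one RFC 5891 hyphen rule rather than everything the idna package checks.

```python
class StrictIDNAError(ValueError):
    """Hypothetical stand-in for the error a strict IDNA library raises."""


def strict_hostname_bytes(hostname: str) -> bytes:
    """Toy strict converter: rejects labels with "--" at positions 3-4
    (RFC 5891 section 4.2.3.1) unless they are A-labels ("xn--")."""
    for label in hostname.split("."):
        if label[2:4] == "--" and not label.lower().startswith("xn--"):
            raise StrictIDNAError(f"disallowed hyphens in label {label!r}")
    return hostname.encode("ascii")


def lenient_hostname_bytes(hostname: str) -> bytes:
    """Pass-through converter: ASCII in, ASCII bytes out, no IDNA checks."""
    return hostname.encode("ascii")


class ClientTLSOptionsSketch:
    """Hypothetical model of a TLS options object that converts eagerly."""

    def __init__(self, hostname, to_bytes=strict_hostname_bytes):
        # The conversion runs here, so any validation error surfaces at
        # construction time, not later during the TLS handshake.
        self._hostnameBytes = to_bytes(hostname)
        self._hostnameASCII = self._hostnameBytes.decode("ascii")


host = "r2---sn-aigzrn7e.googlevideo.com"
try:
    ClientTLSOptionsSketch(host)
except StrictIDNAError as e:
    print("construction failed:", e)

print(ClientTLSOptionsSketch(host, to_bytes=lenient_hostname_bytes)._hostnameASCII)
```

Swapping the converter, rather than editing the class, mirrors the idea of making the strict conversion happen only where human-typed input enters the system.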


participants (3)

Barry Scott
Glyph
Wim Lewis
