To be clear on that first point, that user agent  was in the Nginx logs, and Nginx is configured to pass all headers on to Devpi, so that should also be the user agent that Devpi gets. 

Also, I just re-confirmed that  if I do something as simple as `pip install -U internal_project`, I get that error logged on the master. Also, if I go to https://devpi.example.com/eventbrite/stable/+simple/another_internal_project/ in my browser, I get that error logged again (for the different project) on the master. So, for whatever reason, the user agent matching is definitely not working.

Thanks,

Nick


On Wed, Jan 8, 2020 at 6:08 AM Nicholas Williams <nicholas@nicholaswilliams.net> wrote:
I see a few options here, which aren't mutually exclusive:

- Fix the user  agent matching, because it appears broken somehow. The user agent header in the logs is as follows:

pip/19.3.1 {\x22ci\x22:null,\x22cpu\x22:\x22x86_64\x22,\x22distro\x22:{\x22name\x22:\x22macOS\x22,\x22version\x22:\x2210.13.6\x22},\x22implementation\x22:{\x22name\x22:\x22CPython\x22,\x22version\x22:\x223.7.4\x22},\x22installer\x22:{\x22name\x22:\x22pip\x22,\x22version\x22:\x2219.3.1\x22},\x22openssl_version\x22:\x22OpenSSL 1.0.2s  28 May 2019\x22,\x22python\x22:\x223.7.4\x22,\x22setuptools_version\x22:\x2244.0.0\x22,\x22system\x22:{\x22name\x22:\x22Darwin\x22,\x22release\x22:\x2217.7.0\x22}}

But Devpi is still exhibiting this behavior, so somehow it's not matching on that user agent header properly.

- Provide a setting to disable this behavior (a strong preference of mine), partly to prevent these errors, but also because (as you said) it exposes internal package names to the mirror provider.

- Provide a way to disable logging an error when a request is made to a project's simple index that does not exist. I would disable it on  the master (because I don't care if the mirrors make this request) but leave it enabled on  the mirrors (because such errors would indicate a client error).

On Wed, Jan 8, 2020 at 4:04 AM Florian Schulze <mail@florian-schulze.net> wrote:

Hi!

I dug a bit deeper.

This happens in devpi_server/views.py simple_list_project. We check the user agent header to do speed ups for installers like pip. When that doesn't match, additional info is added to the page. One of that is generated by get_mirror_whitelist_info. It has to check the existence of the project in the mirror to show whether the project blocks releases from the mirror or not. This is useful info, but triggers a request to the mirror, which is what you see.
You should only see this when going to the simple page in a browser, installers like pip shouldn't trigger it. We only check for these strings in user agent though: distribute setuptools pip pex, so if that doesn't match get_mirror_whitelist_info will be triggered.

I'm thinking about the implications of that, as it exposes internal package names to the mirror provider.

Regards,
Florian Schulze


On 3 Jan 2020, at 14:48, Nicholas Williams wrote:

That request had already rolled off, so I had to search for a new one.  Here's this:

$ kubectl --namespace=devpi-server logs devpi-server-primary-85d6484cb7-hgqpp  | grep req8603
2020-01-03 13:22:19,560    INFO: [req8603] GET /root/pypi/+simple/internal_project_name/
2020-01-03 13:22:19,631   ERROR: [req8603] [Rtx14627] while handling https://primary-devpi-server.example.com/root/pypi/+simple/internal_project_name/:

I'm trying to troubleshoot that further, so I grepped for the URI, which nothing should be requesting:

$ kubectl --namespace=devpi-server logs devpi-server-primary-85d6484cb7-hgqpp  | grep "/root/pypi/+simple/internal_project_name/"
2020-01-02 18:10:53,164    INFO: [req82] GET /root/pypi/+simple/internal_project_name/
2020-01-02 18:10:53,170   ERROR: [req82] [Rtx14604] while handling https://primary-devpi-server.example.com/root/pypi/+simple/internal_project_name/:
"172.27.23.69" (172.27.23.69) -  [02/Jan/2020:18:10:53 +0000] "GET /root/pypi/+simple/internal_project_name/ HTTP/1.1" 404 1037 "-" 0.013 "http/https/https" "primary-devpi-server.example.com/primary-devpi-server.example.com/primary-devpi-server.example.com" "80/443/443"

That IP address from which that request came is the IP address of one of the mirror servers. When I grep the mirror logs  for that internal project name, there is nothing indicating that it is looking in root/pypi; only org_name/stable:

$ kubectl --namespace=devpi-server logs devpi-server-replica-0 | grep internal_project_name
2020-01-02 18:10:53,150    INFO: [req37] GET /org_name/stable/+simple/internal_project_name/
"172.27.36.103" (172.27.36.103) -  [02/Jan/2020:18:10:53 +0000] "GET /org_name/stable/+simple/internal_project_name/ HTTP/1.1" 200 1878 "-" 0.101 "http/https/https" "devpi.example.com/devpi.example.com/devpi.example.com" "80/443/443"

So our builds (locally and on Jenkins) are requesting /org_name/stable/+simple/internal_project_name/ of the mirrors, but then the mirrors are requesting /root/pypi/+simple/internal_project_name/ of the master. But they must at some point be requesting the correct URL, because the mirrors have all of the proper projects in org_name/stable, and you can install them all just fine using the mirrors.

On the master, the indexes are configured as follows (org_name/stable extends org_name/upstream extends root/pypi):

$ devpi index -v org_name/stable
  type=stage
  bases=org_name/upstream
  volatile=False
  acl_upload=org_name
  acl_toxresult_upload=org_name
  mirror_whitelist=

$ devpi index -v org_name/upstream
  type=stage
  bases=root/pypi
  volatile=False
  acl_upload=org_name
  acl_toxresult_upload=org_name
  mirror_whitelist=*

The mirror/replica servers are configured like this:

devpi-server:
  serverdir: /data/devpi-server/server-data
  secretfile: /data/devpi-server/server-data/.secret
  unix-socket: /tmp/devpi-server.sock
  unix-socket-perms: "0666"
  threads: 16
  replica-max-retries: 2
  request-timeout: 5
  root-passwd: ******************
  theme: semantic-ui
  role: replica

Hope that additional information helps.

Thanks,

Nick

On Fri, Jan 3, 2020 at 3:32 AM Florian Schulze <mail@florian-schulze.net> wrote:
On 2 Jan 2020, at 18:19, Nicholas Williams wrote:

> We're starting to roll out our new Devpi installation to projects now
> that
> we've tested it extensively. We have one master and three mirrors. For
> the
> purposes of this email, the mirrors are load-balanced behind the
> domain
> name "devpi.example.com," and the master is available at "
> primary-devpi-server.example.com."
>
> We configured our Pipenv and Pip installations to use this index URL:
> https://devpi.example.com/org_name/stable/+simple/
>
> That URL will use the mirrors. The "org_name/stable" index extends
> root/pypi and also includes our internal projects. Everything appears
> to be
> working on the client. No errors, installs are going off without a
> hitch.
>
> However, the Devpi server on the master machine (not the mirrors) logs
> a
> ton of errors like the following:
>
> [req78237] [Rtx14602] while handling
> https://primary-devpi-server.example.com/root/pypi/+simple/internal_project_name/
> :
> The project internal_project_name does not exist.
>
> What is the cause of these errors? Is this something that should
> concern
> us, or something that we can ignore? "internal_project_name" installs
> just
> fine, but for some reason yields those errors on the master server.

Is there more log output with the same request number ("req78237" in
this case)? Without that it is hard to say. Looking at the code this
should only happen if root/pypi is accessed directly. What is the
setting of ``mirror_whitelist`` in "org_name/stable"? I tried with an
empty whitelist and with * and in both cases I wasn't able to provoke
the error for a private package.

Regards,
Florian Schulze