Dropping the pdf documentation.

Hi All,
This is a proposal to drop the generation of pdf documentation and only generate the html version. This is a one way change due to the difficulty maintaining/fixing the pdf versions. See minimal discussion here <https://github.com/numpy/numpy/issues/21557#issuecomment-1133920412>.
Chuck

+1 let’s drop the PDF docs. They are already very hard to read. On Sun, May 22, 2022 at 1:06 PM Charles R Harris <charlesr.harris@gmail.com> wrote:
Hi All,
This is a proposal to drop the generation of pdf documentation and only generate the html version. This is a one way change due to the difficulty maintaining/fixing the pdf versions. See minimal discussion here <https://github.com/numpy/numpy/issues/21557#issuecomment-1133920412>.
Chuck _______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: shoyer@gmail.com

Being very hard to read should not be reason enough to stop generating them. In places with little to no internet connectivity often the PDF documentation is invaluable.
I personally use the PDF documentation both on my phone and e-reader when I travel simply because it is more accessible and has better search capabilities.
It is true that SciPy has removed them, but that doesn't necessarily mean we need to follow suit. Especially relevant (IMO) is that large parts of the NumPy documentation still make sense when read sequentially (going back to when it was at some point partially kanged from Travis' book).
I'd be happy to spend time (and plan to) working on fixing concrete issues other than straw-man and subjective arguments.
Personally I'd like to see the NumPy documentation have PDFs in a fashion where each page / chapter can be downloaded individually.
-- Rohit
P.S.: If we have CI timeout issues, for the PDF docs we could also have a dedicated repo and only build for releases.
P.P.S: FWIW the Python docs are also still distributed in PDF form.

On Sun, May 22, 2022 at 3:52 PM Rohit Goswami <rgoswami@quansight.com> wrote:
Being very hard to read should not be reason enough to stop generating them. In places with little to no internet connectivity often the PDF documentation is invaluable.
The HTML docs can also be downloaded for offline use. Perhaps someone has access to analytics from numpy.org that can tell us how often the PDF docs are viewed? I believe PDFs could be more convenient for some use-cases, but I don't think it's worth the trouble of the separate rendering pipeline for a relatively niche use-case.

The HTML docs can also be downloaded for offline use.
HTML documentation isn't easy to navigate on anything other than a laptop / tablet. It also makes it confusing when there isn't an internet connection because of the external links. Also it is hard (subjectively) to read in order. Also more subjective points:
- A PDF can be annotated
- A PDF can be bookmarked
- A PDF has a separate reading app / device from browsing
- A PDF can be printed out (yes, sometimes this might still be useful)
Perhaps someone has access to analytics from numpy.org that can tell us how often the PDF docs are viewed?
It would be interesting to see the comparison between the downloads of the offline HTML documentation and the PDF documentation.
I don't think it's worth the trouble of the separate rendering pipeline for a relatively niche use-case.
We might end up going that way, but it would be a shame IMO. Though splitting it out might be useful. Python documentation is packaged into chapters which are self contained and reasonably readable. We should strive for the same. As we move increasingly towards different kinds of content (blogs/notebooks/tutorials), I think we should try to keep all users in mind and PDF generation is hardly on its way out in any language / library.
--- Rohit

On Mon, May 23, 2022 at 1:31 AM Stephan Hoyer <shoyer@gmail.com> wrote:
Perhaps someone has access to analytics from numpy.org that can tell us how often the PDF docs are viewed? I believe PDFs could be more convenient for some use-cases, but I don't think it's worth the trouble of the separate rendering pipeline for a relatively niche use-case.
Unfortunately https://numpy.org/doc/ is a separate site from https://numpy.org/. The latter is built with Hugo and we have analytics for it, the former is built with Sphinx and we don't have analytics for it.
Cheers, Ralf

On Sun, May 22, 2022 at 4:54 PM Rohit Goswami <rgoswami@quansight.com> wrote:
Being very hard to read should not be reason enough to stop generating them. In places with little to no internet connectivity often the PDF documentation is invaluable.
If they were just hard to read, I'd be happy to distribute them. The problem is that they are hard to generate. LaTeX is limited, and we depend on Sphinx to generate it. When it breaks, as it does, it is also hard to debug because the error messages are cryptic and at best refer to the generated LaTeX code, which doesn't help track down the problem. I think it would be worth exploring HTML -> PDF converters; they might be better supported.
Chuck

On 23/5/22 01:51, Rohit Goswami wrote:
I'd be happy to spend time (and plan to) working on fixing concrete issues other than straw-man and subjective arguments.
Thanks Rohit for the offer to take on this project.
I don't think we should block the release on the existence of PDF documentation. It is a "nice to have", not a hard requirement.
One strategy to discover problems with the PDF builds in CI would be to add a weekly build of PDF.
Matti

On Mon, May 23, 2022 at 6:51 AM Matti Picus <matti.picus@gmail.com> wrote:
Thanks Rohit for the offer to take on this project.
I don't think we should block the release on the existence of PDF documentation. It is a "nice to have", not a hard requirement.
One strategy to discover problems with the PDF builds in CI would be to add a weekly build of PDF.
That would just mean more CI maintenance/breakage, that the same folks who always take care of CI issues inevitably are going to have to look at.
I'm +1 for removing pdf builds, they are not worth the maintainer effort - we shouldn't put them in CI, and they break at release time too often. It will remain possible for interested users to rebuild the docs themselves - and we can/will accept patches for docstring issues that trip up the pdf but not the html build. That's the same support level we have for other things that we do not run in CI.
When we removed the SciPy pdf docs, the one concern was that there was no longer an offline option (by Juan, a very knowledgeable user and occasional contributor). So I suspect that most of the pdf downloads are for users who want that offline option, but we don't tell them that html+zip is the preferred one.
Another benefit of removal is to slim down our dev Docker images a lot - right now the numpy-dev image is 300 MB larger than the scipy-dev one because of the inclusion of TeX Live.
Cheers, Ralf

What do you guys think of the chm format ("windows help")? This offline documentation format is shipped with all python releases (e.g. https://www.python.org/downloads/release/python-3913/). It is simple to build from a hierarchy of html files, it is downloadable, searchable, bookmarkable, has an index, supports hyperlinks, and can be opened on linux as well.
One downside of it is that recent Windows versions (=Windows 10) block the "execution" of this file if downloaded from an "untrusted source" (=internet), so it needs a checkbox in file properties to lift this "security block".
AFAIK, NumPy used to ship docs in this format many years ago, but then dropped its support.
Best regards, Lev

On Mon, May 23, 2022 at 10:21 AM Lev Maximov <lev.maximov@gmail.com> wrote:
What do you guys think of the chm format ("windows help")? This offline documentation format is shipped with all python releases (eg https://www.python.org/downloads/release/python-3913/). It is simple to build from a hierarchy of html files, it is downloadable, searchable, bookmarkable, has index, supports hyperlinks, can be opened on linux as well.
One downside of it is that recent Windows versions (=Windows 10) block the "execution" of this file if downloaded from "untrusted source" (=internet), so it needs a checkbox in file properties to lift this "security block".
Afaik, NumPy used to ship docs in this format many years ago, but then dropped its support.
Indeed. It's much more niche than pdf, so I'd prefer to not consider it. You can easily build it locally though if you'd use it personally.
I'm not so interested in the detailed discussion later on in this thread to be honest. Let me propose a simple solution that should make everyone happy:
1. We drop pdf builds in CI, the release process and the Docker image, but keep support in the code base.
2. Rohit volunteered to maintain the pdf build, so if he (or another person we know and trust to receive artifacts from and distribute them) wants to send PRs to fix doc build issues and merge a pdf build into https://github.com/numpy/doc/, we'll review and merge those.
This keeps the pdf docs available for as long as someone does the work, while removing the burden from the release manager and general development. This seems like a decent compromise, similar to what we do for other things with a fairly niche audience.
Cheers, Ralf

Furthermore, the PDF docs of numpy (and maybe scipy) can be stripped to a separate project and put on a separate release cycle, not necessarily tracking the releases.

On Mon, May 23, 2022, at 10:34, Ralf Gommers wrote:
I'm not so interested in the detailed discussion later on in this thread to be honest. Let me propose a simple solution that should make everyone happy: 1. We drop pdf builds in CI, the release process and the Docker image, but keep support in the code base. 2. Rohit volunteered to maintain the pdf build, so if he (or another person we know and trust to receive artifacts from and distribute them) wants to send PRs to fix doc build issues and merge a pdf build into https://github.com/numpy/doc/, we'll review and merge those.
I second this proposal. Maybe with the provision that if, in a year from now, the PDF build is in a broken / unmaintained state, we remove it completely.
Stéfan

Hi all!
Happy to say this:
*1. We drop pdf builds in CI, the release process and the Docker image, but keep support in the code base.*
*2. Rohit volunteered to maintain the pdf build, so if he (or another person we know and trust to receive artifacts from and distribute them) wants to send PRs to fix doc build issues and merge a pdf build into https://github.com/numpy/doc/, we'll review and merge those.*
is exactly the proposal that came up from the documentation team meeting today :)
We also proposed circling back in the next docs team meeting in two weeks if there are any hiccups, but there was consensus at the meeting that this is a good compromise.
- Melissa

As the person who initiated the PDF drop in SciPy, I'd like to give my reasoning for why it bugged me in the first place:
- The typography is \subsubpar (as a TeX person should say) and just an eyesore. This actually matters a lot more than you would assume, and the nonresponsive format makes it unreadable on mobile without constant zooming.
- Almost all links are broken and left as double backticks, since the source was not originally designed for PDF navigation.
- Code copy/pasting is broken (due to how the TeX package for listings is set up), regardless of the PDF viewer.
- It is mostly empty space, which bloats the page count, because it comes from the HTML format and not the other way around as, say, a TeX4ht workflow would follow.
- It is an absolute waste of resources on CI/CD since it fires up per pull request (maybe we can argue to reduce it to per main-branch merge, but that doesn't change the fact that it is just wasteful and burdensome).
- Like Ralf mentioned, the infrastructure for a TeX run is unacceptable by today's standards (but it is the LaTeX maintainers who are to blame for it; I know some of them, and they know this very well and are trying hard to reduce it).
- It is a very unstable workflow and errors out depending on the planets' alignment because, again, it is coming from an awkward Markdown source which is not designed for it. It becomes very annoying for maintainers to see it fail for otherwise perfectly valid code.
The API reference PDF (7.2 MB) is also difficult to find compared to the front page version, which is the User guide (3.x MB). So there is probably no demand for it anyway, because the removal didn't cause too much noise as far as I know.
On Mon, May 23, 2022 at 8:34 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Mon, May 23, 2022 at 6:51 AM Matti Picus <matti.picus@gmail.com> wrote:
On 23/5/22 01:51, Rohit Goswami wrote:
Being very hard to read should not be reason enough to stop generating them. In places with little to no internet connectivity often the PDF documentation is invaluable.
I personally use the PDF documentation both on my phone and e-reader when I travel simply because it is more accessible and has better search capabilities.
It is true that SciPy has removed them, but that doesn't necessarily mean we need to follow suit. Especially relevant (IMO) is that large parts of the NumPy documentation still make sense when read sequentially (going back to when it was at some point partially kanged from Travis' book).
I'd be happy to spend time (and plan to) working on fixing concrete issues other than straw-man and subjective arguments.
Personally I'd like to see the NumPy documentation have PDFs in a fashion where each page / chapter can be downloaded individually.
-- Rohit
P.S.: If we have CI timeout issues, for the PDF docs we could also have a dedicated repo and only build for releases.
P.P.S: FWIW the Python docs are also still distributed in PDF form.
On 22 May 2022, at 21:41, Stephan Hoyer wrote:
+1 let’s drop the PDF docs. They are already very hard to read.
On Sun, May 22, 2022 at 1:06 PM Charles R Harris <charlesr.harris@gmail.com> wrote:
Hi All,
This is a proposal to drop the generation of pdf documentation and only generate the html version. This is a one way change due to the difficulty maintaining/fixing the pdf versions. See minimal discussion here <https://github.com/numpy/numpy/issues/21557#issuecomment-1133920412>.
Chuck
Thanks Rohit for the offer to take on this project.
I don't think we should block the release on the existence of PDF documentation. It is a "nice to have", not a hard requirement.
One strategy to discover problems with the PDF builds in CI would be to add a weekly build of PDF.
That would just mean more CI maintenance/breakage, that the same folks who always take care of CI issues inevitably are going to have to look at.
I'm +1 for removing pdf builds, they are not worth the maintainer effort - we shouldn't put them in CI, and they break at release time too often. It will remain possible for interested users to rebuild the docs themselves - and we can/will accept patches for docstring issues that trip up the pdf but not the html build. That's the same support level we have for other things that we do not run in CI.
When we removed the SciPy pdf docs, the one concern was that there was no longer an offline option (by Juan, a very knowledgeable user and occasional contributor). So I suspect that most of the pdf downloads are for users who want that offline option, but we don't tell them that html+zip is the preferred one.
Another benefit of removal is to slim down our dev Docker images a lot - right now the numpy-dev image is 300 MB larger than the scipy-dev one because of the inclusion of TeX Live.
Cheers, Ralf

I am unaware of the state of the SciPy documentation at the time it was dropped. However, many of these arguments do not seem to apply to the NumPy documentation hosted at https://numpy.org/doc/.
The typography is \subsubpar (as a TeX person should say) and just an eyesore, this actually matters a lot more than you would assume and unreadable in mobile without constant zooming because of nonresponsive format
This is a valid concern, but there are third-party tools to deal with reflow (both at the mobile level and by preprocessing like with k2pdfopt: https://www.willus.com/k2pdfopt/)
Almost all links are broken and left as double backticks since it is not originally designed for PDF navigation
There are no broken links in our User Guide, and even external links (e.g. `PyObject` links to the Python documentation) work. Internal links to different parts of the PDF also work. I have not read our Reference Guide cover to cover in a while (other than the NumPy-C API chapter) but I do not remember any backticks anywhere. Please correct me if this is incorrect.
Code copy/pasting is broken (due to how the TeX package for listings setup) regardless of the PDF viewer
This isn't the case for Firefox's PDF viewer and others I have tried (Adobe, Zathura). Though on Linux most pdf copy-pastes can be a little difficult.
It is mostly empty space hence bloats the page number because it comes from the HTML format and not the other way around as say, TeX4ht workflow would follow.
Untrue, our typography and layout might not be perfect but we do not have a lot of empty space.
It is an absolute waste of resources on CI/CD since it fires up per Pull Request (maybe we can argue to reduce it to per main-branch-merge but doesn't change the fact that it is just wasteful and burdensome)
We have a reasonable 30 minute timeout for the `pdf` build and we have discussed running this less frequently.
Like Ralf mentioned the infrastructure for a TeX run is unacceptable for today's standards (but it is the LaTeX maintainers to blame for it and I know some of them, they know this very well and trying hard to reduce it)
Also can be mitigated, we can shift to `tectonic` or simply use a custom `texlive` install to have fewer packages (for the size issue).
It is a very unstable workflow and errors out depending on the planets alignment because, again, it is coming from an awkward Markdown source which is not designed for. Becomes very annoying for maintainers to see it fail for otherwise a perfectly valid code.
We don't have markdown sources? I understand that perhaps SciPy's documentation was in far worse shape than NumPy, but we shouldn't paint with a broad brush.
-- Rohit

On Mon, May 23, 2022 at 11:12 AM Rohit Goswami <rgoswami@quansight.com> wrote:
I am unaware of the state of the SciPy documentation at the time it was dropped. However, many of these arguments do not seem to apply to the NumPy documentation hosted at https://numpy.org/doc/.
They were almost identical, same machinery (like most things). Well this is subjective but the typography is unfit for any code based format.
The typography is \subsubpar (as a TeX person should say) and just an eyesore, this actually matters a lot more than you would assume and unreadable in mobile without constant zooming because of nonresponsive format
This is a valid concern, but there are third-party tools to deal with reflow (both at the mobile level and by preprocessing like with k2pdfopt: https://www.willus.com/k2pdfopt/)
The contents are nonresponsive. No tool can fix a native responsiveness issue. I am familiar with those tools. The question is why work so hard when you have the HTML already?
There are no broken links in our User Guide, and even external links (e.g. PyObject links to the Python documentation) work. Internal links to different parts of the PDF also work.
I have not read our Reference Guide cover to cover in a while (other than the NumPy-C API chapter) but I do not remember any backticks anywhere. Please correct me if this is incorrect.
OK this one was on me. I've updated the reader and now things went back to normal. Sorry for the noise.
This isn't the case for Firefox's PDF viewer and others I have tried (Adobe, Zathura). Though on Linux most pdf copy-pastes can be a little difficult.
All mentioned viewers fail to retain the format of the code and copy it as plain text. You have to paste it somewhere to see the problem. Unless it is a single line in the examples, the rest does not really work properly. In case you are not familiar, there are quite a number of ways to fix this, but it is definitely not worth the effort.
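To make the copy/paste complaint concrete, here is a minimal sketch of what happens when a multi-line example loses its leading whitespace on the way out of a viewer. It assumes the viewer strips indentation entirely, which is only one possible failure mode; the snippet and the helper names `flatten` and `parses_ok` are made up for illustration.

```python
import ast

# A two-line example as it might appear in the docs (hypothetical snippet).
ORIGINAL = "for i in range(3):\n    print(i)\n"


def flatten(source: str) -> str:
    """Simulate a lossy copy by stripping leading whitespace from every line."""
    return "\n".join(line.lstrip() for line in source.splitlines()) + "\n"


def parses_ok(source: str) -> bool:
    """Return True if the text is still syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


FLATTENED = flatten(ORIGINAL)
```

Under this assumption the original parses and the flattened copy does not, while a single-line example survives the same treatment, which is consistent with the remark that only one-liners copy cleanly.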
Untrue, our typography and layout might not be perfect but we do not have a lot of empty space.
This is demonstrably true; the document margins go against quite a few technical document typography rules (mostly how page setup and typeface choices are done). It is as it is because the documentation is also following an indentation format very much like Python code, which uses whitespace too generously. The document itself is quite an eyesore if you care about those things. np.einsum is a prime example of how it shouldn't render in a PDF document. All choices are coming from the indentation of markdown, because it uses none of the advantages of a PDF, that is, typography and font layout. It cannot use them because the source is not providing context, because it is coming from a function signature. This is also related to the Markdown comment below.
We have a reasonable 30 minute timeout for the pdf build and we have discussed running this less frequently.
This doesn't change the fact that you are downloading way too many complex tools and moving images that are bloated. Just because it is free does not justify its use. It is just a huge waste to repeat that excessive compilation each and every time. I would also say it is a bit on the irresponsible side.
Also can be mitigated, we can shift to tectonic or simply use a custom texlive install to have less packages (for the size issue).
No. This is still a very large payload, mainly due to the typography tools that are used and their dependencies. Maintaining a custom TeX Live is just asking for trouble since the packages are updated very frequently (I know because we tried this many times at work to keep a mobile receipt generator).
It is a very unstable workflow and errors out depending on the planets alignment because, again, it is coming from an awkward Markdown source which is not designed for. Becomes very annoying for maintainers to see it fail for otherwise a perfectly valid code.
We don't have markdown sources?
What I mean is that LaTeX source is text-based with context in it. But we are providing markdown sources. This causes problems in the meantime both in translation and also layout.
I understand that perhaps SciPy's documentation was in far worse shape than NumPy, but we shouldn't paint with a broad brush
That's not true. They are almost identical. These are common issues that still exist in the NumPy version. To be honest, it is very hard to make a case for the PDF in its given condition. You can still compile and use it. We shouldn't continue bothering with it at the CI level just because there is marginal interest in it.

I am not ranting about NumPy because of SciPy. This is a very bad TeX design, and to fix it we would have to get away from auto-doc generation, which I am sure none of us want for now. That is unfortunately how good docs are today, MathWorks constantly being praised for it despite its notoriety. Hence I don't see any case for continuing to generate this PDF. If you want to have a proper doc effort, that is a whole other story and I would love to have that, but this doc-generated PDF is not of any non-negligible value. And if you think it is, then you can generate it yourself. The tools are not going to be removed anyway.

The contents are nonresponsive. No tool can fix a native responsiveness issue. I am familiar with those tools. The question is why work so hard when you have the HTML already?
I'm afraid I don't understand this argument. It is true that PDFs are not responsive without software assistance, but HTML documents when printed (e.g. CTRL+P) do not have any way of generating the Appendix / outlinks to related sections etc. Yet we still have HTML documentation. IMO this simply means they are not mutually exclusive.
This is demonstrably true; the document margins go against quite a few technical document typography rules (mostly how page setup and typeface choices are done). It is as it is because the documentation is also following an indentation format very much like Python code, which uses whitespace too generously. The document itself is quite an eyesore if you care about those things. np.einsum is a prime example of how it shouldn't render in a PDF document. All choices are coming from the indentation of markdown, because it uses none of the advantages of a PDF, that is, typography and font layout. It cannot use them because the source is not providing context, because it is coming from a function signature. This is also related to the Markdown comment below.
About this, the full page layout on the HTML pages has exactly the same amount of whitespace. It can be argued that for a full width layout there is exactly the same whitespace and indentation.

Additionally, even trying to print out, say, `np.einsum` will first have 3 pages of the sidebar when using a naive CTRL+P approach.

The argument that the typography is poor goes beyond the documentation format.

In fact, even the "responsiveness" is rather overrated at the moment. With a mobile device, again the first few screenfuls are simply the sidebar with routines and other things. After which there's still whitespace, and things are still just as indented as in the PDF. Only now I also don't have a global TOC which is easy to see.

The assertion that there is parity between serving ~2000 pages of documentation as HTML and ZIP files as opposed to a PDF seems to be flawed from the get-go.
If you want to have a proper doc effort, that is a whole other story and I would love to have that, but this doc-generated PDF is not of any non-negligible value.
I should add that NumPy does indeed have a dedicated docs team and consolidated effort. As mentioned earlier we meet regularly about these issues and it would be nice if the meetings are not unequivocally sidestepped by the mailing list. We also apply for funding (GSoD / NumFocus SDG) for our docs.

I understand there are frustrations with the PDF, but I am still not convinced at this point that the HTML versions are even at par with the PDF experience.

It is nice that I have the time and ability to generate my documentation locally for my niche needs should I so wish it. It is less nice that we assume that it must be niche and everyone would have the same energy because HTML is theoretically more responsive, even though our docs are not.
-- Rohit

On Mon, May 23, 2022 at 2:00 PM Rohit Goswami <rgoswami@quansight.com> wrote:
The contents are nonresponsive. No tool can fix a native responsiveness issue. I am familiar with those tools. The question is why work so hard when you have the HTML already?
I'm afraid I don't understand this argument. It is true that PDFs are not responsive without software assistance, but HTML documents when printed (e.g. CTRL+P) do not have any way of generating the Appendix / outlinks to related sections etc. Yet we still have HTML documentation. IMO this simply means they are not mutually exclusive.
The argument is about why one should use PDF on a mobile device. I am not even going to bother with the argument. The world moved on. See any app on your device. No one renders PDF. Because this is not what it is designed for. But everybody sends you a custom PDF for archival purposes. You might think they are all wrong but that's your opinion.
About this, the full page layout on the HTML pages has exactly the same amount of whitespace. It can be argued that for a full width layout there is exactly the same whitespace and indentation.
Additionally, even trying to print out say, np.einsum will first have 3 pages of the sidebar when using a naive CTRL+P approach.
The argument that the typography is poor goes beyond the documentation format.
In fact, even the "responsiveness" is rather overrated at the moment. With a mobile device, again the first few screenfuls are simply the sidebar with routines and other things. After which there's still whitespace, and things are still just as indented as in the PDF. Only now I also don't have a global TOC which is easy to see.
It is not overrated. You are basically saying UX people are just doing bloated fancy work. This is not how things are designed. I am not recommending Ctrl+P on the HTML document. We are saying use the HTML files offline. This is again not an issue to argue about. The pages are following a Markdown format. If you want to have this subpar document you can generate it yourself. Having a burden on the CI system because maybe someone uses it is not sufficient reason to keep it. Or at least that's my motivation.
The assertion that there is parity between serving ~2000 pages of documentation as HTML and ZIP files as opposed to a PDF seems to be flawed from the get go.
Again, you are comparing doc formats. The argument is not to distribute PDFs. If you like that document regardless of its current state, then fine, do it. But as I mentioned, the current state of affairs in the documentation world is to not even bother with PDF. When do you get PDFs? Exactly in technical manuals which are custom designed and provided with the products for archiving specific to that product. A documentation that invalidates its version every 6 months is again not a valid argument. PDF is a document format, and you have to generate it properly. Currently it is a very bad copy of the HTML version with no attention to the medium with which the information is presented. And the burden is on you to show significant demand for it; the traffic to the site shows how much the HTML is used.
I should add that NumPy does indeed have a dedicated docs team and consolidated effort. As mentioned earlier we meet regularly about these issues and it would be nice if the meetings are not unequivocally sidestepped by the mailing list. We also apply for funding (GSoD / NumFocus SDG) for our docs.
I understand there are frustrations with the PDF, but I am still not convinced at this point that the HTML versions are even at par with the PDF experience.
You are assuming that everyone is sharing your experience of PDF. I am also not convinced that abusing the PDF format warrants its use in documentation. As I mentioned, the burden is on you to prove its worth. That's why we are proposing to remove it. Otherwise it wouldn't be discussed here and removed in SciPy.
It is nice that I have the time and ability to generate my documentation locally for my niche needs should I so wish it. It is less nice that we assume that it must be niche and everyone would have the same energy because HTML is theoretically more responsive, even though our docs are not.
That's what we started with: it is a long, annoying, nontrivial process with diminishing returns. We shouldn't spend this energy on a document that is not requested in general. If there is a demand for it, we don't see it anywhere. Glad that we are aggressively agreeing.

Also, HTML responsiveness needs no proof. Just look at the page source in your browser and change the size of your window. The docs are responsive, though not perfect (they collapse to the wrong frame at the smallest size, for example, but that is fixable), and definitely much more readable. PDFs are substantially more powerful than HTML, but you need to exploit that with custom documentation, not with auto-generated signatures. Technical writers, UX and documentation teams are doing an extremely important job, and making a static screenshot in a PDF is definitely not a viable replacement for that work. Or you might be missing out on what the contemporary options are if you think it is.

The argument is about why one should use PDF on a mobile device. I am not even going to bother with the argument. The world moved on. See any app on your device.
Let's agree to not talk about the world here a bit, user profiles vary. I have three browser apps, true, but also a bunch of PDF readers. But in any case, let's talk about the UX a bit. We are assuming that instead of downloading one document and using that when there is no internet:

- Go to pg. 50 on numpy-ref.pdf

We would instead be:

- Explaining how to unzip things on a mobile (default file managers are not guaranteed to come with zip support)
- Explaining how to use offline HTML (or even better, have them find the file of interest and open that with the local path)
- Telling someone to either use the search or navigate the rather baroque mobile interface to find something

Because the average user in a low network area is clearly going to be aware of all this. Mostly because we don't believe PDFs are viable.

Also, the Reference Manual is exactly what I'd expect from a technical manual. The argument is on the other side, that trying to actually read or use the HTML document as a replacement for the Reference Manual is rather unlikely. Again, the intents differ; perhaps for something like SciPy, a user-focused library, a reference manual enumerating API design and decisions doesn't make much sense. NumPy has a lot of design documentation which is kanged from an actual book...
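For readers weighing the offline HTML route described above (unpack the archive, then open a local file), here is a minimal desktop-side sketch using only the Python standard library. The archive layout, and the idea that the entry page is the shallowest `index.html`, are assumptions for illustration; `unpack_offline_docs` is a hypothetical helper, not something NumPy ships.

```python
import zipfile
from pathlib import Path


def unpack_offline_docs(archive: Path, target: Path) -> Path:
    """Extract an HTML docs archive and return the entry page to open."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    # Pick the shallowest index.html; the archive layout is an assumption.
    candidates = sorted(target.rglob("index.html"), key=lambda p: len(p.parts))
    if not candidates:
        raise FileNotFoundError(f"no index.html found under {target}")
    return candidates[0]
```

With an archive at a hypothetical path such as `numpy-html.zip`, the returned page could then be opened via `webbrowser.open(entry.resolve().as_uri())`. None of this addresses the mobile case raised above, where a zip-capable file manager cannot be taken for granted.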
Again, you are comparing doc formats.
Nope, it is the information content and ease of access, see usage pattern above for an offline developer.
It is not overrated. You are basically saying UX people are just doing bloated fancy work. This is not how things are designed.
This is not what I meant at all, I have a deep love for frontend work, I was merely pointing out that at this point in time I don't think our "responsive" HTML is any better than our "broken" PDFs. I think we'd be hard pressed to find anyone saying the current HTML documentation follows best responsive practices (the sidebar should be minimized, I shouldn't have to scroll for tens of seconds to get to text I was looking for, the list goes on)
That's why we are proposing to remove it. Otherwise it wouldn't be discussed here and removed in SciPy.
"Removed in SciPy" really needs to stop being part of this discussion. Unless we want to merge the projects back together, we don't *have* to evolve in lock-step. Hence the discussion.
If there is a demand for it we don't see it anywhere. Glad that we are aggressively agreeing.
Why do we expect people to request a document we already provide? This is the first time we have this on the mailing list, so we should simply defer the discussion. We can add analytics to the `/doc` page and check back in a few releases. IMO the argument about "load" is a bit fallacious, we don't actually seem to be generating or serving PDFs per commit (but I might be wrong). Also, definitely not agreeing at all yet :)
Also, HTML responsiveness needs no proof. Just look at the page source in your browser and change the size of your window. The docs are responsive, though not perfect (they collapse to the wrong frame at the smallest size, for example, but that is fixable), and definitely much more readable.
Here are some steps to reproduce the mobile experience. They rely on either using an actual mobile device or "web inspector" or "developer tools":
- Switch to a standard format, and watch the sidebar take up most of the real estate
- Try using the zip as discussed above
Additionally, I really don't intend to bash the HTML; of course we could add more breakpoints and special casing until it looks / behaves better.
Until we do so, the removal of the PDFs seems awkward.
As for CI load and metrics, we don't have hard numbers for a lot of these things and it feels strange to discuss what we feel is "responsible use".
--- Rohit
P.S. By happy coincidence, I see we have an upcoming documentation meeting in around 3 hours. As always, everyone is welcome to come discuss here: https://hackmd.io/oB_boakvRqKR-_2jRV-Qjg
On 23 May 2022, at 12:42, Ilhan Polat wrote:
On Mon, May 23, 2022 at 2:00 PM Rohit Goswami <rgoswami@quansight.com> wrote:
The contents are nonresponsive. No tool can fix native responsiveness issues. I am familiar with those tools. The question is why work so hard when you have the HTML already?
I'm afraid I don't understand this argument. It is true that PDFs are not responsive without software assistance, but HTML documents when printed (e.g. CTRL+P) do not have any way of generating the Appendix / outlinks to related sections etc. Yet we still have HTML documentation. IMO this simply means they are not mutually exclusive.
The argument is about why one should use PDF on a mobile device. I am not even going to bother with the argument. The world moved on. See any app on your device. No one renders PDF. Because this is not what it is designed for. But everybody sends you a custom PDF for archival purposes. You might think they are all wrong but that's your opinion.
About this, the full page layout on the HTML pages has exactly the same amount of whitespace. It can be argued that for a full width layout there is exactly the same whitespace and indentation.
Additionally, even trying to print out say, np.einsum will first have 3 pages of the sidebar when using a naive CTRL+P approach.
The argument that the typography is poor goes beyond the documentation format.
In fact, even the "responsiveness" is rather overrated at the moment. On a mobile device, again, the first few screenfuls are simply the sidebar with routines and other things. After that there's still whitespace, and things are still just as indented as in the PDF. Only now I also don't have a global TOC which is easy to see.
It is not overrated. You are basically saying UX people are just doing bloated fancy work. This is not how things are designed. I am not recommending Ctrl+P on the HTML document. We are saying use the HTML files offline. This is again not an issue to argue about. The pages are following a Markdown format. If you want to have this subpar document you can generate it yourself. Having a burden on the CI system because maybe someone uses it is not sufficient reason to keep it. Or at least that's my motivation.
The assertion that there is parity between serving ~2000 pages of documentation as HTML and ZIP files as opposed to a PDF seems to be flawed from the get go.
Again, you are comparing doc formats. The argument is not to distribute PDFs. If you like that document regardless of its current state, then fine, do it. But as I mentioned, the current state of affairs in the documentation world is not even bothering with PDF. When do you get PDFs? Exactly in technical manuals which are custom designed and provided with the products, for archiving specific to that product. Documentation that invalidates its version every 6 months is again not a valid argument. PDF is a document format, and you have to generate it properly. Currently it is a very bad copy of the HTML version with no attention to the medium with which the information is presented. And the burden is on you to demonstrate significant demand for it; the traffic to the site shows how much the HTML is used.
I should add that NumPy does indeed have a dedicated docs team and consolidated effort. As mentioned earlier we meet regularly about these issues and it would be nice if the meetings are not unequivocally sidestepped by the mailing list. We also apply for funding (GSoD / NumFocus SDG) for our docs.
I understand there are frustrations with the PDF, but I am still not convinced at this point that the HTML versions are even at par with the PDF experience.
You are assuming that everyone is sharing your experience of PDF. I am also not convinced that abusing the PDF format warrants its use in documentation. As I mentioned, the burden is on you to prove its worth. That's why we are proposing to remove it. Otherwise it wouldn't be discussed here and removed in SciPy.
It is nice that I have the time and ability to generate my documentation locally for my niche needs should I so wish it. It is less nice that we assume that it must be niche and everyone would have the same energy because HTML is theoretically more responsive, even though our docs are not.
That's what we started with, it is a long and annoying nontrivial process with diminishing outcome. We shouldn't spend this energy on a document that is not requested in general. If there is a demand for it we don't see it anywhere. Glad that we are aggressively agreeing.
Also, HTML responsiveness needs no proof. Just look at the page source in your browser and change the size of your window. The docs are responsive, though not perfect (they collapse to the wrong frame at the smallest size, for example, but that is fixable), and definitely much more readable. PDFs are substantially more powerful than HTML, but you need to exploit that with custom documentation, not with auto-generated signatures. Technical writers, UX and documentation teams are doing an extremely important job, and making a static screenshot in a PDF is definitely not a viable replacement for that work. Or you might be missing out on what the contemporary options are if you think it is.

The goalposts are moving too much. I am not drawing parallels with SciPy to make a point about NumPy. I am using it as another data point: we did exactly the same thing there with zero backlash, which speaks to how often the PDF was used. You might think the projects are separate, but the user base is surprisingly similar. I think you are underestimating SciPy's complexity and its affinity to NumPy.
I also gave real-life cases where keeping a TeX infrastructure running is a mess. This is not hypothetical; these are commercial cases where results are far more important than for NumPy. Nobody except a set of measure zero is using it. So again, another data point, and again it seems to have no effect.
The mobile experience is again a weak argument. If you have your mobile device, use the online docs. I think I can guess the number of people who would read the NumPy documentation on offline mobile devices, so that is not a valid argument. I'd be happy to see the demand, though. I cannot take seriously the claim that PDF reading on a mobile device would surpass HTML; even desktop apps are Electron-based HTML. You can argue that our HTML experience needs work, and that's a valid argument, but "worse than PDF" is not going to fly.
If you want to keep your stance, you are welcome to. But run it by other people who do this for a living. I think I have made my case enough times by now. You are arguing your point from your personal perspective, so I don't have to entertain it. If you think this technical manual is good, then we have quite different opinions about the quality of a technical document. But you still carry the burden of proof to demonstrate the demand that justifies the load on the CI system, which you have not provided yet. And that is the point here: no obstacle is being put in front of users who prefer PDF manuals, since they can still generate them.
I'll leave the decision to others so as not to generate more noise.

If it's any help, I would suggest looking at how SymPy does its PDF documentation. We have a few adjustments to the Sphinx defaults to make things work (although it's honestly not that much, mainly just using XeTeX for Unicode support). https://github.com/sympy/sympy/blob/master/doc/Makefile
One of the things that was mentioned in the issue is SVG images. We found in SymPy that the best way to convert SVG images to a format that can be used for PDFs is to use Google Chrome (all the other SVG to PDF converters are flawed in some way, see https://github.com/sympy/sympy/pull/22468 for details). See this script https://github.com/sympy/sympy/blob/master/doc/convert-svg-to-pdf.sh and also this PR https://github.com/sympy/sympy/pull/23035. Perhaps the code in that PR should be factored out into a separate Sphinx extension that can be reused by other projects.
I personally don't use PDF documentation, but I know that for SymPy many people do, which is why we have continued to support it.
Also, one final thing I will note is that if you have any LaTeX math in your documentation, the PDF build is the only way you can be sure that the LaTeX is valid. MathJax only renders the LaTeX in the browser when the page is loaded, so you will only notice invalid math once you look at it and see it gives an error. In contrast, the PDF docs fail at the build stage if the LaTeX is invalid. Obviously we have a lot of math in the SymPy docs, but I don't know if there is any in the NumPy or SciPy docs.
This does also mean that to be useful you actually need to build it on CI. If you don't, you will end up with docs that don't actually build any more by the release because of unseen errors (even without LaTeX math this will likely end up happening, so I would recommend against skipping the CI build).
Aaron Meurer
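As a rough illustration of the CI check described above (a sketch only, not SymPy's or NumPy's actual setup): Sphinx's `sphinx-build -M latexpdf` generates the LaTeX sources and then runs LaTeX on them, so a CI step could call it and fail the job on a non-zero exit. The directory names and both function names below are assumptions. XeTeX would be selected separately in `conf.py` with `latex_engine = "xelatex"`.

```python
import subprocess
from typing import List


def latexpdf_command(source_dir: str, build_dir: str) -> List[str]:
    """Return the sphinx-build invocation for a full LaTeX -> PDF build."""
    # "-M latexpdf" writes the .tex files and then compiles them, so invalid
    # LaTeX is expected to surface here rather than at page load in a browser.
    return ["sphinx-build", "-M", "latexpdf", source_dir, build_dir]


def check_pdf_build(source_dir: str = "doc/source",
                    build_dir: str = "doc/build") -> None:
    """Run the PDF build; raises CalledProcessError if the build fails."""
    # The default paths are placeholders, not any project's real layout.
    subprocess.run(latexpdf_command(source_dir, build_dir), check=True)
```

Running this only for releases, as suggested earlier in the thread, would save CI time but lose the early warning described above.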
participants (11)
- Aaron Meurer
- Charles R Harris
- Feng Yu
- Ilhan Polat
- Lev Maximov
- Matti Picus
- Melissa Mendonça
- Ralf Gommers
- Rohit Goswami
- Stefan van der Walt
- Stephan Hoyer