Internationalization of numpy/scipy docstrings...
I have thought for a long time that it would be nice to have numpy/scipy docs in multiple languages. I didn't have any idea how to do it until I saw http://sphinx.pocoo.org/intl.html. The gettext builder, which is a requirement to make this happen, is relatively new to sphinx.

Outline of the above applied to numpy/scipy:

1. pydocweb would use the new gettext builder to convert *.rst files to *.pot files.
2. Translators would use pootle to edit the *.pot files into *.po files. pydocweb or pootle would then use msgfmt to create *.mo files.
3. From here we can choose either:
   a. Have pydocweb use sphinx-build to create new, translated *.rst files from the *.mo files (my favorite, since we would have *.rst files).
   b. OR use gettext in Python to translate docstrings on the fly from the *.mo files.

A user would then install a language kit, maybe something like scikits, and access the translated docstrings with a new 'np.info'. As near as I can figure, Python's 'help' command can't be replaced by something else, so 'help' would always display the English docstring.

I have pydocweb and pootle set up locally and working. I ran into a problem, though, with sphinx-build creating the initial *.pot files. It seems to be a problem with numpydoc: it fails on 'function' and 'auto*' directives. I tried to look at numpydoc, but it is a bit of very intense coding and I frankly have not been able to find my way around.

I am willing to put in some work for this to happen. My block right now is getting the initial *.pot files.

Any interest?

You can see the problem directly by changing into the numpy/doc directory and using the following command:

    sphinx-build -b gettext -P source/ gettext/

Once sphinx-build is working, the target build directory (which I called 'gettext' above) would be in a location accessible to pootle.

Kindest regards,
Tim
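Option 3b above can be sketched with Python's stdlib gettext module. Everything here is illustrative: the `make_mo` helper is a stand-in for running msgfmt, and the French msgstr is invented for the demo; none of it is existing numpy or pydocweb machinery.

```python
import gettext
import io
import struct

def make_mo(catalog):
    """Build a minimal little-endian GNU .mo catalog in memory.

    A stand-in for running msgfmt, so this example needs no external tools.
    """
    keys = sorted(catalog)                   # "" (the metadata entry) sorts first
    ids = strs = b""
    offsets = []
    for k in keys:
        kb, vb = k.encode("utf-8"), catalog[k].encode("utf-8")
        offsets.append((len(kb), len(ids), len(vb), len(strs)))
        ids += kb + b"\x00"
        strs += vb + b"\x00"
    n = len(keys)
    keystart = 28 + 16 * n                   # header + both index tables
    valuestart = keystart + len(ids)
    kindex = b"".join(struct.pack("<II", l, keystart + o)
                      for l, o, _, _ in offsets)
    vindex = b"".join(struct.pack("<II", l, valuestart + o)
                      for _, _, l, o in offsets)
    header = struct.pack("<7I", 0x950412DE, 0, n, 28, 28 + 8 * n, 0, 0)
    return header + kindex + vindex + ids + strs

# one translated msgid plus the usual metadata entry; the French text is
# purely illustrative
mo = make_mo({
    "": "Content-Type: text/plain; charset=UTF-8\n",
    "Return a new array of given shape.":
        "Renvoie un nouveau tableau de la forme donnée.",
})
tr = gettext.GNUTranslations(io.BytesIO(mo))
print(tr.gettext("Return a new array of given shape."))   # French msgstr
print(tr.gettext("Some untranslated msgid"))              # falls back to English
```

The on-the-fly option would amount to loading the installed language kit's .mo catalog like this and passing each docstring through `gettext()`, with untranslated strings falling back to English automatically.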
On May 19, 2012 11:04 PM, "Tim Cera"
On Sat, May 19, 2012 at 8:16 PM, Nathaniel Smith
help() just returns the __doc__ attribute, but a large number of numpy's __doc__ attributes are set up by code at import time, so in principle even these could be run through gettext pretty easily.
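A minimal sketch of that idea, assuming a hypothetical `add_newdoc_i18n` helper (numpy's real helper is `add_newdoc`; routing it through gettext is the speculative part):

```python
import gettext

# NullTranslations passes msgids through unchanged; a real setup would
# load the active locale's .mo catalog here instead
_translations = gettext.NullTranslations()

def add_newdoc_i18n(obj, doc):
    # hypothetical i18n-aware docstring setter: translate the English
    # docstring before attaching it, so help(obj) -- which just reads
    # __doc__ -- shows the translated text without touching help() itself
    obj.__doc__ = _translations.gettext(doc)

def example():
    pass

add_newdoc_i18n(example, "Compute the sum of array elements.")
print(example.__doc__)   # English, since the null catalog translates nothing
```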
I didn't know that. I suggested modifying np.info because I suspected a new np.info would be easier: the changes to support i18n would be contained to one command. Of course, if there is something easier/better, let's go with that.

Kindest regards,
Tim
On Sun, May 20, 2012 at 12:04 AM, Tim Cera
Are you thinking only about documentation in .rst files (like the tutorials), or also the docstrings themselves? The former may be feasible; the latter I think will be difficult.

Ralf
Everything. Within the documentation editor the RST docstrings are parsed from the functions, so instead of only storing them in the database for Django/doceditor to work with, we can save them to *.rst files.

I don't know how integrated we could/would make the documentation editor/sphinx/pootle combination, so I think the easiest would be integration through files. Your question points out a detail (and some small refinements) that I should have put in the outline from my first message:

0.5. As the pydocweb editor works on docstrings, up-to-date RST files are also saved to the file system, which triggers...
1. The new gettext builder to convert *.rst files to *.pot files.
1.5. (OPTIONAL) Make a preliminary, automatic translation. Pootle currently supports Google Translate (now costs $) or Apertium.
2. Translators would use pootle to edit the *.pot files into *.po files.
2.5. Use msgfmt to create *.mo files.
3. From here we can choose either:
   a. Use sphinx-build to create new, translated *.rst files from the *.mo files (my favorite, since we would have *.rst files).
   b. OR use gettext in Python to translate docstrings on the fly from the *.mo files.

At this point we would need an environment variable or other configuration mechanism to set the desired locale, which np.info would use to find the correct directory/rst file. Let's just say, for the sake of my example, that the configuration is handled by an np.locale function:

    np.info(np.array)   # displays the English docstring, as it currently does
    np.locale('fr')
    np.info(np.array)   # displays the French docstring

Reference links:

Sphinx-based translation:
http://sphinx.pocoo.org/latest/intl.html
http://www.slideshare.net/lehmannro/sphinxi18n-the-true-story

Pootle:
http://translate.sourceforge.net/wiki/pootle/index
(You have to get the development versions of translate and pootle to work with Django 1.4.)

Kindest regards,
Tim
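The proposed np.locale / np.info pair could be sketched over a per-locale directory of translated .rst files. Everything in this sketch (`set_locale`, `info`, the directory layout) is illustrative, not an existing numpy API:

```python
import pathlib
import tempfile

# hypothetical layout for installed "language kits": <docroot>/<locale>/<name>.rst
docroot = pathlib.Path(tempfile.mkdtemp())
for loc, text in [("en", "Create an array."), ("fr", "Créer un tableau.")]:
    (docroot / loc).mkdir()
    (docroot / loc / "array.rst").write_text(text, encoding="utf-8")

_locale = "en"

def set_locale(loc):
    """Stands in for the proposed np.locale()."""
    global _locale
    _locale = loc

def info(name):
    """Stands in for a locale-aware np.info(): look up the translated
    .rst file, falling back to English when no translation is installed."""
    path = docroot / _locale / f"{name}.rst"
    if not path.exists():
        path = docroot / "en" / f"{name}.rst"
    return path.read_text(encoding="utf-8")

print(info("array"))     # English docstring, as now
set_locale("fr")
print(info("array"))     # French docstring
set_locale("de")
print(info("array"))     # no German kit installed -> English fallback
```

The English fallback matters for the "partial translations" concern raised later in the thread: an incomplete language kit degrades gracefully instead of failing.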
On Sun, May 20, 2012 at 11:59 PM, Tim Cera
Docstrings are not stored in .rst files but in the numpy sources, so there are some non-trivial technical and workflow details missing here. But besides that, I think translating everything (even into a single language) is a massive amount of work, and it's not at all clear whether there are enough people willing to help out with this. So I'd think it would be better to start with just the high-level docs (numpy user guide, scipy tutorial) to see how it goes.

Thinking about which languages to translate into would also make sense, since having a bunch of partial translations lying around doesn't help anyone. First thought: Spanish, Chinese.

Ralf
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Mon, May 21, 2012 at 10:44 PM, Ralf Gommers
Thinking about what languages to translate into would also make sense, since having a bunch of partial translations lying around doesn't help anyone. First thought: Spanish, Chinese.
It's not like one can tell two translator volunteers to start speaking the same language so as to better pool their efforts... they kind of speak whatever they speak. But there is quite a bit of translator volunteer person-power available out there across many languages. If Tim gets the infrastructure worked out, then advertising on some of the big translation project mailing lists will probably get a lot of eyeballs.

- N
Docstrings are not stored in .rst files but in the numpy sources, so there are some non-trivial technical and workflow details missing here. But besides that, I think translating everything (even into a single language) is a massive amount of work, and it's not at all clear if there's enough people willing to help out with this. So I'd think it would be better to start with just the high-level docs (numpy user guide, scipy tutorial) to see how it goes.
I understand that this is non-trivial, for me anyway, because I can't figure out how to make my way around numpydoc and the documentation editor code (not quite true, as Pauli accepted a couple of my pull requests, but I definitely can't make it dance). This is why I asked for interest and help on the mailing list. I think for the people who worked on the documentation editor, or know Django, or are cleverer than I, the required changes to the documentation editor might be semi-trivial. That is my hope anyway.

We would probably have the high-level docs separate from the docstring processing anyway, since the high-level docs are already in a sphinx source directory. So I agree that the high-level docs would be the best place to start; in fact that is what I was working with when I found the problem with the sphinx gettext builder mentioned in the original post.

I do want to defend and clarify the docstring processing, though. Docstrings, in the code, will always be English. The documentation editor is the fulcrum. The documentation editor will work with the in-the-code docstrings exactly as it does now. The documentation editor would be changed so that when it writes the ReST-formatted docstring back into the code, it also writes a *.rst file to a separate sphinx source directory. These *.rst files would not be part of the numpy source code directory, but interim files: the documentation editor and sphinx extract strings to make *.pot files, pootle + hordes of translators :-) gives *.po files, and then *.po -> *.mo -> *.rst (translated). The English *.rst, *.pot, *.po, and *.mo files are all interim products behind the scenes. The translated *.rst files would NOT be part of the numpy source code, but packaged separately.

I must admit that I had hoped there would be more interest. Maybe I should have figured out how to put 'maskna' or '1.7' in the subject?

In defense of there not being much interest: the people who would possibly benefit aren't reading the English mailing lists.

Kindest regards,
Tim
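The *.rst-to-*.pot extraction step that this workflow hinges on can be sketched as a toy. The real work is done by sphinx's gettext builder; this just shows the shape of a .pot entry, with one msgid per paragraph and an empty msgstr slot for translators:

```python
def rst_to_pot(text):
    """Toy stand-in for sphinx's gettext builder: one msgid per
    blank-line-separated paragraph, with an empty msgstr for translators."""
    entries = []
    for para in text.split("\n\n"):
        para = " ".join(para.split())        # collapse internal line breaks
        if not para:
            continue
        quoted = para.replace("\\", "\\\\").replace('"', '\\"')
        entries.append(f'msgid "{quoted}"\nmsgstr ""')
    return "\n\n".join(entries)

sample = """Parameters
----------

shape : tuple of int
    Shape of the new array.
"""
print(rst_to_pot(sample))
```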
On Tue, May 22, 2012 at 10:51 AM, Tim Cera
One advantage of getting this done would be that other packages could follow the same approach. Just as numpy.testing and numpy's doc standard have spread to related packages, being able to generate translations might be even more interesting to downstream packages. There the fraction of end users not used to working in English anyway might be larger than for numpy itself.

The numpy mailing list may be too narrow to catch the attention of developers with enough interest and expertise in the area.

Josef
participants (4)
- josef.pktd@gmail.com
- Nathaniel Smith
- Ralf Gommers
- Tim Cera