implementing IDL, Matlab, etc. functionality
Two of the big barriers to people converting from commercial environments to SciPy are 1) their dependence on specific packages in those languages (e.g., the fuzzy logic package in Matlab that was recently mentioned, or the IDL Wavelet Toolkit), and 2) their dependence on their own code, developed over the course of their career. There are legal issues we need to be aware of in coming up with solutions.

There are two problems. First, to use IDL (and, I assume, Matlab), you have to agree to a license that says you will not reverse engineer the product. The packages include a lot of routines in source form. Most users who are likely to write compatible packages in SciPy have looked at these routines. They thus have "tainted brains": the commercial entity can claim that they have seen proprietary information, and that they had to use this information to produce the Python package.

Second, Cornell's legal staff has informed me that there are two court precedents that state that if you have *ever* used a product, and you try to make something that works the same way, it's reverse engineering. I know that sounds ludicrous to everyone (it does to me), but you may have to fight at the Federal Appeals level or above to be sure. Again, since users of IDL and Matlab have agreed to licenses that prohibit reverse engineering, this would mean that any code produced by such people would be tainted and we could be sued. "We" means us as individuals, our institutions (who employ us, pay us, and might be construed as the licensees of IDL and Matlab), Enthought for distributing it, etc. Regardless of how crazy this interpretation might sound, I doubt that we as a community have the financial resources to mount an appeal. There is also no shortage of crazy legal interpretations that stick in current computer law, so there is no guarantee that we'd win. One of the original plaintiffs was Sony, which would surely weigh in with amicus briefs and perhaps financial help to the plaintiff, since their IP would be at stake if their precedent were overturned on our appeal.

I think there is a technical way out, however. I have only agreed to an IDL license. Nothing says I can't implement something that works like Matlab. Reverse engineering is legal, at least in the great state of New York, unless you sign that right away. So, it makes some sense to set up some trades: I can write a Matlab -> Python converter if you'll do it for IDL. All that's needed is for each side to provide examples of code in the proprietary language that they have written themselves, along with descriptions of what it does and sample inputs and outputs. That plus commercial books on the languages should provide all the information we need.

For specific packages, it makes more sense for people to do them from scratch or by wrapping existing open-source code. For one thing, the efforts of the commercial packages are usually old and almost always procedural. Much is to be gained by implementing a good OO design based on the method descriptions in the literature and the particular strengths of scipy.base. Then, someone else who has not signed a license can do workalike interfaces for the commercial packages. The benefit here is that IDL and Matlab are pretty backward languages, whereas Python is quite modern. Matlab users may benefit from encountering a Matlab-compatible fuzzy logic interface, but IDL users certainly don't. It would be much better for them to learn the OO interface. The reverse is of course true for, say, IDL wavelets.

So, before anyone (else) goes diving into implementing something from IDL, Matlab, etc., please give consideration to these issues.

--jh--
On 5/2/05, Joe Harrington <jh@oobleck.astro.cornell.edu> wrote:

whereas Python is quite modern. Matlab users may benefit from encountering a Matlab-compatible fuzzy logic interface, but IDL users certainly don't. It would be much better for them to learn the OO

Point well taken. Over the last week or so, I've found deficiencies in the Matlab fuzzy logic toolkit. Let's say I'll start off with building a FuzzyLogic API that provides the necessary functionality, and then worry about creating an interface that Matlab users would find intuitive (and yet different enough that we don't run into any legal risks).

--Prasan
_______________________________________________ Scipy-dev mailing list Scipy-dev@scipy.net http://www.scipy.net/mailman/listinfo/scipy-dev
"Joe" == Joe Harrington <jh@oobleck.astro.cornell.edu> writes:
Joe> sign that right away. So, it makes some sense to set up some
Joe> trades: I can write a Matlab -> Python converter if you'll do
Joe> it for IDL. All that's needed is for each side to provide
Joe> examples of code in the proprietary language that they have
Joe> written themselves, along with descriptions of what it does
Joe> and sample inputs and outputs. That plus commercial books on
Joe> the languages should provide all the information we need.

Personally, I don't think this is a good use of manpower. matlab and IDL are both pretty crappy languages. That's why people don't want to use them. octave already provides a matlab clone (eg can run m-files) but a lot of people don't use it because it never works as well as the original but has all the faults of the original (except cost). Octave is always several versions behind -- an open source project simply can't keep up with the manpower behind matlab or IDL in terms of a feature-by-feature implementation. Users then get frustrated when they try to run their scripts and, although a few scripts work, many don't.

Look, we don't even have full matlab matfile support, much less the capability to clone matlab or IDL (and good luck with the matlab/IDL API, MEX files etc, on which many matlab extensions depend, eg wavelab). Providing a translator essentially creates false expectations and dissatisfied users.

I think development effort should be put into making the python platform as compelling as possible, not in trying to emulate a suboptimal language in python. People can convert -- I used matlab for 8 years and had an enormous code base and just walked away. I never tried to run my mfiles in python, I just implemented the missing functionality in python.

Granted, this means some users will never switch from IDL or matlab to python but that's fine, many will, especially if we spend our effort making better tools (eg a wavelet library for scipy) and providing documentation like "python for IDL users" and "python for matlab users". Give them things that they can easily do in python that are painful in matlab/IDL and most of them will bite the bullet.

JDH
Personally, I don't think this is a good use of manpower. matlab and IDL are both pretty crappy languages. That's why people don't want to use them.
I'll appeal to my extensive, if informal, surveys of the astronomy community. A lot of us have a lot more than just 8 years of code we depend on, and simply cannot switch until we can convert our most important codes. For example, several of the packages I now depend on for my research took me and about half a dozen students years of full-time work to write. Most of those students are now gone. I use these packages every day. I'd have to get new grants specifically to hire and supervise more students just to redo them so that I could do my daily work. That isn't going to happen, and I'm far from a unique case.

You can tell us to go jump in a lake if you like (which you basically just did). Instead, we're going to take care of our own problem, and hopefully that of a lot of others.

My goal is not to make IDL-in-python (gnudl tries, as does octave for matlab, both badly, as you note). It's to facilitate the conversion of code to Python. We can do an 80% solution without a great deal of effort. The idea is to create a code converter that would handle all of the procedural syntax, and which would flag in loud comments the things it couldn't do. Then the user would hand-code the remainder. We would not have a function compatibility library, though we could rig a way of describing the most common functions so that it could convert the calls for them. Nobody could possibly have an expectation that the code produced this way would run unmodified, so I don't think we'll make unsatisfied users. Rather, I think just the opposite: this package will be a great relief to many, as will its Matlab equivalent.

My main concern is not for coding labor but rather for numerical accuracy. There are enough subtle differences between Python and IDL that hand-converting large amounts of numerical code would be error-prone. For example, in Python the end of an array slice is one element beyond the end of an IDL slice. An automatic converter would just add one to every slice's ending index. A human wouldn't necessarily remember to do that every time, producing subtle bugs that could be very hard to find in some cases.

--jh--
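To make the slice adjustment and the "loud comment" idea concrete, here is a rough sketch of what one line-level pass of such a converter might look like. It is only an illustration under stated assumptions: `convert_line`, `KNOWN_CALLS`, and the `my_filter` routine are hypothetical names invented for this example, the two-entry function table is a stand-in for a real description of common functions, and the regular expressions only handle simple one-dimensional subscript ranges.

```python
import re

# Hypothetical table of IDL calls this sketch knows how to rename.
KNOWN_CALLS = {"n_elements": "len", "total": "sum"}

# A one-dimensional subscript range such as [2:5], [0:n-1] or [3:*].
_RANGE = re.compile(r"\[\s*([^\[\]:,]+?)\s*:\s*([^\[\]:,]+?)\s*\]")
# An identifier immediately followed by an opening parenthesis.
_CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def _fix_range(match):
    """Turn an inclusive IDL range into an exclusive Python slice."""
    start, stop = match.group(1), match.group(2)
    if stop == "*":                      # IDL "to the end of the array"
        return f"[{start}:]"
    if stop.isdigit():                   # literal end index: add one directly
        return f"[{start}:{int(stop) + 1}]"
    return f"[{start}:({stop}) + 1]"     # expression: add one symbolically


def convert_line(line):
    """Convert one line, flagging calls the table does not cover."""
    out = _RANGE.sub(_fix_range, line)
    unknown = []

    def rename(match):
        name = match.group(1)
        if name.lower() in KNOWN_CALLS:
            return KNOWN_CALLS[name.lower()] + "("
        unknown.append(name)
        return match.group(0)

    out = _CALL.sub(rename, out)
    if unknown:
        out += "  # !!! TODO: hand-convert " + ", ".join(unknown)
    return out


print(convert_line("b = a[2:5]"))                # b = a[2:6]
print(convert_line("n = n_elements(a[0:k-1])"))  # n = len(a[0:(k-1) + 1])
print(convert_line("w = my_filter(a[0:9])"))     # flagged for hand conversion
```

The point of the sketch is that the off-by-one fix is applied mechanically to every range, while anything outside the table is left in place and marked, so the remaining hand work is easy to find.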
"Joe" == Joe Harrington <jh@oobleck.astro.cornell.edu> writes:

Joe> octave for matlab, both badly, as you note). It's to
Joe> facilitate the conversion of code to Python. We can do an
Joe> 80% solution without a great deal of effort. The idea is to
Joe> create a code converter that would handle all of the
Joe> procedural syntax, and which would flag in loud comments the
Joe> things it couldn't do. Then the user would hand-code the
Joe> remainder. We would not have a function compatibility

Is it clear that the EULA forbids translating a limited part of a user's IDL file into another syntax? That sounds like something different than reverse engineering. Do you have a copy of the IDL EULA? I've never seen it and couldn't find it via quick searches on google or on the RSI site.

JDH
Is it clear that the EULA forbids translating a limited part of a user's IDL file into another syntax?
No, a user's code is clearly not RSI's property. However, our lawyers made the point that a code converter could easily be configured as a front-end to python, and for any program it converted 100% successfully, it would be a reimplementation of IDL and thus would be prohibited by the reverse engineering clause. They felt that making even part of a partial reimplementation could be argued as being against the reverse engineering clause of the license, based on these (bogus, in my opinion) court precedents. At least, it wouldn't be grounds for immediate dismissal of a lawsuit.

The only relevant statement in the license is:

"You may not, however, reverse engineer, decompile or disassemble the Software."

The full license may be read here: ftp://ftp.rsinc.com/pub/idl_6.1/info/license.txt

Can someone post a pointer to Matlab's license?

--jh--
Joe Harrington wrote:
Is it clear that the EULA forbids translating a limited part of a user's IDL file into another syntax?
No, a user's code is clearly not RSI's property. However, our lawyers made the point that a code converter could easily be configured as a front-end to python, and for any program it converted 100% successfully, it would be a reimplementation of IDL and thus would be prohibited by the reverse engineering clause. They felt that making even part of a partial reimplementation could be argued as being against the reverse engineering clause of the license, based on these (bogus, in my opinion) court precedents. At least, it wouldn't be grounds for immediate dismissal of a lawsuit.
Could you ask your lawyers for the actual case citations?

In any case, I believe that the usual way that professionals do this (legally, to my knowledge, but IANAL) is to work in two teams. One team actually looks at the software they are trying to emulate. They do not write any code whatsoever; they only write a specification. The other team does not touch the software-to-be-emulated in any way; they only implement from the specification written by the other team.

If you're serious about doing these kinds of activities, consider this strategy and talk about it with your lawyers.

--
Robert Kern
rkern@ucsd.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter
Joe Harrington wrote:
My main concern is not for coding labor but rather for numerical accuracy. There are enough subtle differences between Python and IDL that hand-converting large amounts of numerical code would be error-prone. For example, in Python the end of an array slice is one element beyond the end of an IDL slice. An automatic converter would just add one to every slice's ending index. A human wouldn't necessarily remember to do that every time, producing subtle bugs that could be very hard to find in some cases.
This is my concern too. In the case of IDL, most algorithms are implemented in single precision floating point. The Python implementation by default would use double precision, unless we explicitly direct it to do otherwise. This problem alone can cause much grief, because the IDL version is presumed to be the correct one, until demonstrated otherwise. (I know this from personal experience.) So in addition to the language being crappy, do we also want to propagate crappy (i.e. unstable) algorithms too?

-- Paul

--
Paul Barrett, PhD
Space Telescope Science Institute
ESS/Science Software Branch
Baltimore, MD 21218
Phone: 410-338-4475
FAX: 410-338-4767
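As a small illustration of why single and double precision versions of the same algorithm can disagree, the sketch below accumulates the same values twice, once in Python's native doubles and once while rounding every intermediate result to IEEE single precision. It is an assumption-laden toy, not anyone's production code: `to_single`, `sum_double`, and `sum_single` are hypothetical helper names, and the single precision arithmetic is emulated with the standard `struct` module rather than with an array library.

```python
import struct


def to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_double(values):
    """Naive running sum carried in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_single(values):
    """The same running sum, rounded to single precision at every step."""
    total = 0.0
    for v in values:
        total = to_single(total + to_single(v))
    return total


values = [0.1] * 1_000_000          # exact answer would be 100000
double_sum = sum_double(values)
single_sum = sum_single(values)

print("double:", double_sum)
print("single:", single_sum)
print("difference:", single_sum - double_sum)
```

The double precision sum lands very close to 100000, while the single precision sum drifts visibly, because once the running total is large each added 0.1 is rounded to the coarser spacing of single precision values. A translated code that silently switches precision can therefore produce different numbers from the original even when the algorithm is unchanged.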
In the case of IDL, most algorithms are implemented in single precision floating point. The Python implementation by default would use double precision, unless we explicitly direct it to do otherwise. This problem alone can cause much grief, because the IDL version is presumed to be the correct one, until demonstrated otherwise. (I know this from personal experience.) So in addition to the language being crappy, do we also want to propagate crappy (i.e. unstable) algorithms too?
Paul, that argument involves both guilt by association and an implicit assumption that those scientists who *are* guilty will write better code the second time around. My experience and what I've been told directly by dozens of researchers is that they will not write code a second time, period. Their main reasons to switch are cost and freedom, not any unhappiness with IDL. The balance in their minds tips to IDL if the cost of rewriting is included. It tips back to Python if they have a tool that converts most of their code.

As you point out, Python defaults to double, so that may improve things a little for some algorithms. I doubt it will make a big difference for most codes, as astronomical data tends to be uncertain no later than the fourth decimal place, and frequently in the first.

As for the guilt by association, writing in IDL does not make an algorithm or its implementation bad, nor does writing in Python automatically make them any better. Quick-and-dirty, procedural scientists will write quick-and-dirty, procedural code in Python as well as in IDL.

On the flip side, I have an algorithm for optimal spectrum extraction that I can demonstrate is the best available (paper with the first-ever comparison tests in prep). It's 4300 lines and took well over 1000 billable programmer-hours to write. If I can do an "80%" translation to Python automatically, spend a few weeks filling in the external function calls, and run my verification tests to see where the problems are and fix them, we will have a valuable algorithm that will attract users to Python. I don't have the resources to do it by hand, and I doubt anyone else will do it for me (would STScI?).

In preparing our paper, we discovered either algorithmic deficiencies (your "crap") or serious bugs in every other code we tested, including popular ones like Piskunov's REDUCE and IRTF's SpeXtool. So, either we do a converter and you get this and many other tools that will attract scientists to Python, or we don't, and until Python becomes the dominant tool (if ever), people will be home-brewing their own codes for difficult algorithms like optimal spectrum extraction. And then you will have real "crap", as you put it, but only necessarily so in Python.

This is just one of many examples. It's great that you at STScI are redoing the astron library from the ground up, but I doubt that you can commit to doing even all of that work (including astron/contrib). You certainly can't do everyone's personal codes, whereas a converter can make a big dent in that. Hopefully, that will be a big enough dent to induce people to switch.

--jh--
Just a few comments.

Joe Harrington wrote:

In the case of IDL, most algorithms are implemented in single precision floating point. The Python implementation by default would use double precision, unless we explicitly direct it to do otherwise.

I personally vote for single precision algorithms to stay single precision if possible. The Fortran side of the numerical computation community has always felt that C's promotion of everything to double is one of its most ill-considered design decisions. Given that NumPy has single and double arrays, and Numeric 24 (like numarray) doesn't promote a single array to a double when it interacts with a double scalar, I'd think we could keep single precision in many contexts.
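For readers who want to check this behaviour themselves, the following sketch shows the same idea with the present-day `numpy` package, which is an assumption on my part and not the Numeric 24 or numarray releases discussed in this thread. A single precision array combined with a plain Python float scalar keeps its single precision type, while combining it with a double precision array promotes the result.

```python
import numpy as np

# A single precision array, as a translated IDL routine might use.
single = np.ones(4, dtype=np.float32)

# Scaling by a plain Python float scalar leaves the array type alone.
scaled = single * 2.0

# Mixing with a double precision array promotes the result to double.
mixed = single + np.ones(4, dtype=np.float64)

# Promotion to double can always be requested explicitly.
explicit = single.astype(np.float64)

print(scaled.dtype)    # float32
print(mixed.dtype)     # float64
print(explicit.dtype)  # float64
```

So a converted code can stay in single precision throughout, provided its arrays are created as single precision and are not mixed with double precision arrays along the way.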
I doubt [single vs. double] will make a big difference for most codes, as astronomical data tends to be uncertain no later than the fourth decimal place, and frequently in the first.
pi**2=10 for most astronomers.

I second the rest of Joe's comments. (That sounds like an impressive spectrum extraction code, by the way, Joe.)

Still, I'd hope that a more feature-ful language would help the conversion as well. No one wrote a Fortran to C translator, at least not one which produced human-readable C, but I think more scientists are writing C than Fortran nowadays, and the reason is probably the big problems in Fortran 77. Fortran 90 fixed most of the problems but came along too late. (We at universities also all found our students were no longer being taught Fortran by our colleagues over in computer science.)

So, maybe one effective approach would be to get some IDL and MATLAB users to list their biggest annoyances with the language and then see that they don't face the same in SciPy.

Now if we could just get the lawyers out of our way so we could actually do some work.
participants (6)
- Joe Harrington
- John Hunter
- Paul Barrett
- Prasan Samtani
- Robert Kern
- Stephen Walton